Age | Commit message | Author |
|
* ggml : Q4_2 ARM
* ggml : add ggml_is_quantized()
* llama : update llama_type_name() with Q4_2 entry
* ggml : speed-up q4_2
  - 4 threads: ~100ms -> ~90ms
  - 8 threads: ~55ms -> ~50ms
* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
|
* Revert 7e53955 (#542)
Still needs to be fixed properly
* Fix linking on mingw32
|
* Be more strict about converting float to double
* Test equivalence of round, SILU implementations
Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.
* Fix softmax in perplexity.cpp
* all : prefer float over double where appropriate
* perplexity : add <cmath>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
* Introduce structs for the q4 data blocks
* ggml : rename quant struct variables + fix ARM_NEON
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"
Hope I didn't break anything!
|