Age | Commit message (Collapse) | Author |
|
* Generalize quantize_fns for simpler FP16 handling
* Remove call to ggml_cuda_mul_mat_get_wsize
* ci : disable FMA for mac os actions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* fix test quantize perf
* avoid the global state
|
|
|
|
|
|
* Unit test for quantization functions
Use the ggml_internal_get_quantize_fn function to loop through all
quantization formats and run a sanity check on the result.
Also add a microbenchmark that times these functions directly without
running the rest of the GGML graph.
* test-quantize-fns: CI fixes
Fix issues uncovered in CI
- need to use sizes divisible by 32*8 for loop unrolling
- use intrinsic header that should work on Mac
* test-quantize: remove
Per PR comment, subsumed by test-quantize-fns
* test-quantize: fix for q8_0 intermediates
|