path: root/llama.h
author: unbounded <haakon@likedan.net> 2023-04-25 19:20:46 +0200
committer: GitHub <noreply@github.com> 2023-04-25 20:20:46 +0300
commit: dd0eabc049fb1efc631cab8eb0a646808d704e18 (patch)
tree: 23a35354481ec346c4501937b95612a19fff9d21 /llama.h
parent: 54bb60e26858be251a0eb3cb70f80322aff804a0 (diff)

ggml : use full range for Q4_0 and Q4_2 quantization (#729)

* Use full range for q4_0 quantization

  By keeping the sign of the highest magnitude, we can make sure the
  highest value maps to -8, which is currently unused.
  This is a bit of a freebie since it is fully backwards compatible with
  the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

  Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

  Untested

* Use full range for q4_2 quantization

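
The idea described in the first item can be sketched in a few lines of C. This is a minimal illustration, not the code from the patch: the function names, the `BLOCK` size of 32, and the choice to store each code as an unpacked signed byte in [-8, 7] (rather than packing two 4-bit values per byte) are all assumptions made for readability. The scale is derived from the signed value with the largest magnitude, divided by -8, so that value always lands on code -8.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 32 /* assumed number of values per block (hypothetical name) */

/* Round to nearest integer, halves away from zero. */
static int round_nearest(float v) {
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/*
 * Quantize one block of BLOCK floats to signed 4-bit codes in [-8, 7].
 * Returns the scale d; a value is reconstructed as d * q[i].
 * Hypothetical helper, written for illustration only.
 */
static float quantize_block_full_range(const float *x, int8_t *q) {
    float amax = 0.0f; /* largest magnitude seen so far */
    float max = 0.0f;  /* the signed value that has that magnitude */
    for (int i = 0; i < BLOCK; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) {
            amax = a;
            max = x[i];
        }
    }

    /* Dividing by -8 makes the largest-magnitude value map to code -8. */
    const float d = max / -8.0f;
    const float id = (d != 0.0f) ? 1.0f / d : 0.0f;

    for (int i = 0; i < BLOCK; i++) {
        int v = round_nearest(x[i] * id);
        if (v > 7) v = 7;   /* a value of equal magnitude, opposite sign */
        if (v < -8) v = -8; /* defensive clamp */
        q[i] = (int8_t)v;
    }
    return d;
}

/* Reconstruct one value from its code and the block scale. */
static float dequantize_value(float d, int8_t q) {
    return d * (float)q;
}
```

Because the scale simply takes on whichever sign is needed, a decoder that computes `d * q` needs no changes, which is consistent with the backwards compatibility the commit message describes. In this sketch, the only value that loses precision at the extreme is one with the same magnitude as the maximum but the opposite sign, which clamps to code 7.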
Diffstat (limited to 'llama.h')
0 files changed, 0 insertions, 0 deletions