llama.cpp.git - llama.cpp

author	Johannes Gäßler <johannesg@5d6.de>	2023-07-29 23:04:44 +0200
committer	GitHub <noreply@github.com>	2023-07-29 23:04:44 +0200
commit	11f3ca06b8c66b0427aab0a472479da22553b472 (patch)
tree	8e934ff0d93a78447d996b00561f7ff826c3533f /media/llama0-logo.png
parent	9baf9ef304f330009d5a93b7390280a0fd27c9a1 (diff)

CUDA: Quantized matrix matrix multiplication (#2160)

* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds

Diffstat (limited to 'media/llama0-logo.png')

0 files changed, 0 insertions, 0 deletions