author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2023-07-24 00:19:47 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-07-24 00:19:47 +0300 |
commit | 2f9cf974a066ac0e03fbb235d834b01b0164d743 (patch) | |
tree | 1c0c1b42ef5d1f8013d9641d778225e98b59d134 /media/llama0-logo.png | |
parent | 4f06592cc6b83979e4b442e8cb97b3948c857188 (diff) |
Some more Q4_K and Q5_K speedup on CUDA (#2346)
* Faster Q5_K on CUDA
* Small Q5_K improvement on older GPUs
* Speed up Q4_K on CUDA
GTX1660: 29.5 ms/t -> 25.6 ms/t
RTX4080: 8.40 ms/t -> 8.25 ms/t
* Speed up Q4_K on CUDA
GTX1660: 36.7 ms/t -> 35.6 ms/t
RTX4080: 9.8 ms/t -> 9.5 ms/t
* Address PR comments
* Add some comments to satisfy PR reviewer
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'media/llama0-logo.png')
0 files changed, 0 insertions, 0 deletions