author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2023-06-08 22:28:21 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-06-08 22:28:21 +0300 |
commit | 72ff5282bf0388c60821f504c4c8cc2b1f491aa6 (patch) | |
tree | 19d6971bdd6934b72a000694f2b1791dadd9f7dc /.ecrc | |
parent | 0bf7cf1b296fc9fca05411b37afdf08a531487d2 (diff) |
metal : add Q2_K implementation (#1762)
* metal : add Q2_K implementation
27.1 ms/token on M2 Max 30-core GPU, so about the same speed as Q4_0.
Memory throughput is ~156 GB/s.
The access pattern used in the Q2_K CUDA implementation resulted in
significantly lower performance (~31 ms/token).
* Fixing merge conflicts
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to '.ecrc')
0 files changed, 0 insertions, 0 deletions