author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2023-06-08 10:08:23 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-06-08 10:08:23 +0300 |
commit | 4161bdc04debb70bf5f275492b4d89fd9330087c (patch) | |
tree | 9b0c6325e720b101d67ec2415bc0d69e4fd89379 /.clang-tidy | |
parent | 0035858273ebe0694926bf4414d279f3e1cd109d (diff) |
metal : add Q4_K implementation (#1733)
* Metal implementation for Q4_K
Very slow for now:
42 ms/token, whereas Q4_0 runs in 28 ms/token on my
30-core M2 Max GPU.
* Optimizing Q4_K on metal
The first token always takes longer, I guess because
the Metal kernel is being JIT-compiled.
So, using n = 128 to measure time.
At this point Q4_K takes 29.5 ms / token
compared to 27.2 ms / token for Q4_0.
Quite a bit better than the initial attempt,
but still not good enough.
* Optimizing q4_K metal dot some more
For n = 256 it is now 28.1 ms/token compared to
27 ms/token for q4_0.
* Fix after merge with master
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
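
The timings quoted in the message above are easier to compare as a relative gap between Q4_K and Q4_0. The following is only a minimal sketch that does this arithmetic on the numbers from the commit message. The stage labels and the helper name `overhead_percent` are illustrative assumptions, not part of the commit.

```python
# Timings in ms/token as quoted in the commit message: (Q4_K, Q4_0).
# The stage labels are illustrative; they do not appear in the source.
timings = {
    "initial": (42.0, 28.0),
    "n = 128": (29.5, 27.2),
    "n = 256": (28.1, 27.0),
}


def overhead_percent(q4_k_ms, q4_0_ms):
    """Return how much slower Q4_K is than Q4_0, in percent."""
    return (q4_k_ms / q4_0_ms - 1.0) * 100.0


for stage, (q4_k, q4_0) in timings.items():
    gap = overhead_percent(q4_k, q4_0)
    print(f"{stage}: Q4_K is {gap:.1f}% slower than Q4_0")
```

With these numbers the gap shrinks from 50.0% for the initial version to about 8.5% and then about 4.1% after the two optimization steps.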
Diffstat (limited to '.clang-tidy')
-rw-r--r-- | .clang-tidy | 18 |
1 files changed, 0 insertions, 18 deletions
diff --git a/.clang-tidy b/.clang-tidy
deleted file mode 100644
index 1a42b9a..0000000
--- a/.clang-tidy
+++ /dev/null
@@ -1,18 +0,0 @@
----
-Checks: >
-    bugprone-*,
-    -bugprone-easily-swappable-parameters,
-    -bugprone-implicit-widening-of-multiplication-result,
-    -bugprone-narrowing-conversions,
-    readability-*,
-    -readability-avoid-unconditional-preprocessor-if,
-    -readability-function-cognitive-complexity,
-    -readability-identifier-length,
-    -readability-implicit-bool-conversion,
-    -readability-magic-numbers,
-    -readability-uppercase-literal-suffix,
-    clang-analyzer-*,
-    -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
-    performance-*,
-    portability-*,
-FormatStyle: none