metal : add Q4_K implementation (#1733)

* Metal implementation for Q4_K

  Very slow for now: 42 ms / token, Q4_0 runs in 28 ms/token
  on my 30-core M2 Max GPU.

* Optimizing Q4_K on metal

  The first token always takes longer, I guess because the metal
  kernel is being jit-compiled. So, using n = 128 to measure time.

  At this point Q4_K takes 29.5 ms / token compared to
  27.2 ms / token for Q4_0. Quite a bit better than the initial
  attempt, but still not good enough.

* Optimizing q4_K metal dot some more

  For n = 256 it is now 28.1 ms/token compared to 27 ms/token
  for q4_0.

* Fix after merge with master

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
author: Kawrakow <48489457+ikawrakow@users.noreply.github.com> 2023-06-08 10:08:23 +0300
committer: GitHub <noreply@github.com> 2023-06-08 10:08:23 +0300
commit: 4161bdc04debb70bf5f275492b4d89fd9330087c (patch)
tree: 9b0c6325e720b101d67ec2415bc0d69e4fd89379 /.clang-tidy
parent: 0035858273ebe0694926bf4414d279f3e1cd109d (diff)
1 files changed, 0 insertions, 18 deletions
diff --git a/.clang-tidy b/.clang-tidy
deleted file mode 100644
index 1a42b9a..0000000
--- a/.clang-tidy
+++ /dev/null
@@ -1,18 +0,0 @@
----
-Checks: >
-    bugprone-*,
-    -bugprone-easily-swappable-parameters,
-    -bugprone-implicit-widening-of-multiplication-result,
-    -bugprone-narrowing-conversions,
-    readability-*,
-    -readability-avoid-unconditional-preprocessor-if,
-    -readability-function-cognitive-complexity,
-    -readability-identifier-length,
-    -readability-implicit-bool-conversion,
-    -readability-magic-numbers,
-    -readability-uppercase-literal-suffix,
-    clang-analyzer-*,
-    -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
-    performance-*,
-    portability-*,
-FormatStyle: none