CUDA: tuned mul_mat_q kernels (#2546)

author: Johannes Gäßler <johannesg@5d6.de> 2023-08-09 09:42:34 +0200
committer: GitHub <noreply@github.com> 2023-08-09 09:42:34 +0200
commit: 25d43e0eb578b6e73046d9d6644a3a14d460600d (patch)
tree: fddb8e9a044ce7eda09024e345a871cdada4cac8 /README.md
parent: f5bfea0580e417f99850d5456ca541d871a3e48c (diff)
1 files changed, 0 insertions, 1 deletions
diff --git a/README.md b/README.md
index 2ece294..6900b11 100644
--- a/README.md
+++ b/README.md
@@ -406,7 +406,6 @@ Building the program with BLAS support may lead to some performance improvements
 --->
   | Option                  | Legal values           | Default | Description |
   |-------------------------|------------------------|---------|-------------|
-  | LLAMA_CUDA_MMQ_Y        | Positive integer >= 32 |      64 | Tile size in y direction when using the custom CUDA kernels for prompt processing. Higher values can be faster depending on the amount of shared memory available. Power of 2 heavily recommended. |
   | LLAMA_CUDA_FORCE_DMMV   | Boolean                |   false | Force the use of dequantization + matrix vector multiplication kernels instead of using kernels that do matrix vector multiplication on quantized data. By default the decision is made based on compute capability (MMVQ for 6.1/Pascal/GTX 1000 or higher). Does not affect k-quants. |
   | LLAMA_CUDA_DMMV_X       | Positive integer >= 32 |      32 | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
   | LLAMA_CUDA_MMV_Y        | Positive integer       |       1 | Block size in y direction for the CUDA mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |