aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorRahul Vivek Nair <68507071+RahulVivekNair@users.noreply.github.com>2023-06-22 03:18:43 +0530
committerGitHub <noreply@github.com>2023-06-21 23:48:43 +0200
commitfb98254f99d769fcbbf20966ef386abdb48ef601 (patch)
tree056e0d652ec7673c692e86888835222473f39ff8
parent049aa16b8c5c6d086246e4e6b9feb18de4fbd663 (diff)
Fix typo in README.md (#1961)
-rw-r--r--README.md2
1 files changed, 1 insertions, 1 deletions
diff --git a/README.md b/README.md
index 67012ad..ace5886 100644
--- a/README.md
+++ b/README.md
@@ -340,7 +340,7 @@ Building the program with BLAS support may lead to some performance improvements
| LLAMA_CUDA_DMMV_X | Positive integer >= 32 | 32 | Number of values in x direction processed by the CUDA dequantization + matrix vector multiplication kernel per iteration. Increasing this value can improve performance on fast GPUs. Power of 2 heavily recommended. Does not affect k-quants. |
| LLAMA_CUDA_DMMV_Y | Positive integer | 1 | Block size in y direction for the CUDA dequantization + mul mat vec kernels. Increasing this value can improve performance on fast GPUs. Power of 2 recommended. Does not affect k-quants. |
| LLAMA_CUDA_DMMV_F16 | Boolean | false | If enabled, use half-precision floating point arithmetic for the CUDA dequantization + mul mat vec kernels. Can improve performance on relatively recent GPUs. |
- | LLAMA_CUDA_KQUANTS_ITER | 1 or 2 | 2 | Number of values processed per iteration and per CUDA thread for Q2_K and Q6_K quantization formats. Setting this value 2 1 can improve performance for slow GPUs. |
+ | LLAMA_CUDA_KQUANTS_ITER | 1 or 2 | 2 | Number of values processed per iteration and per CUDA thread for Q2_K and Q6_K quantization formats. Setting this value to 1 can improve performance for slow GPUs. |
- #### CLBlast