Diffstat (limited to 'README.md')
-rw-r--r--  README.md  34
1 files changed, 3 insertions, 31 deletions
diff --git a/README.md b/README.md
index 3a6d757..65be1a6 100644
--- a/README.md
+++ b/README.md
@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
python3 convert-pth-to-ggml.py models/7B/ 1
# quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
```
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
When running the larger models, make sure you have enough disk space to store all the intermediate files.
+TODO: add model disk/mem requirements
+
### Interactive mode
If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
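For reference, the per-shard quantization commands removed above can be folded into a small wrapper, which is presumably what the new `./quantize.sh 7B` call does. The following is a minimal sketch under that assumption; the shard naming and the `./quantize <f16> <q4_0> 2` invocation are taken from the removed README text, while the argument handling and variable names are illustrative only and may differ from the actual script:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a quantize.sh wrapper (not the repo's actual script).
# Quantizes every FP16 shard of the given model (e.g. 7B, 13B) to 4-bit q4_0,
# mirroring the per-shard commands this diff removes from the README.
set -e

model="$1"  # e.g. 7B, 13B, 30B, 65B

for f16 in ./models/"$model"/ggml-model-f16.bin*; do
    # ggml-model-f16.bin   -> ggml-model-q4_0.bin
    # ggml-model-f16.bin.1 -> ggml-model-q4_0.bin.1, and so on
    q4="${f16/f16/q4_0}"
    ./quantize "$f16" "$q4" 2
done
```

With a wrapper like this, `./quantize.sh 13B` would handle both 13B shards in one call, which is why the multi-file instructions could be dropped from the README.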