author | klosax <131523366+klosax@users.noreply.github.com> | 2023-08-07 19:07:19 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-08-07 19:07:19 +0200 |
commit | f3c3b4b1672d860800639c87d3b5d17564692469 (patch) | |
tree | ebdc8a40a9e374eb4713da9c6233c8c499cd768a /examples/main/README.md | |
parent | 93356bdb7a324a8f6570f99d02af392cd4c45796 (diff) | |
Add --rope-scale parameter (#2544)
* common.cpp : Add --rope-scale parameter
* README.md : Add info about using linear rope scaling
Diffstat (limited to 'examples/main/README.md')
-rw-r--r-- | examples/main/README.md | 6 |
1 file changed, 6 insertions, 0 deletions
```diff
diff --git a/examples/main/README.md b/examples/main/README.md
index 014112e..55c1609 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -140,6 +140,12 @@ The `--ctx-size` option allows you to set the size of the prompt context used by
 
 - `-c N, --ctx-size N`: Set the size of the prompt context (default: 512). The LLaMA models were built with a context of 2048, which will yield the best results on longer input/inference. However, increasing the context size beyond 2048 may lead to unpredictable results.
 
+### Extended Context Size
+
+Some fine-tuned models have extended the context length by scaling RoPE. For example, if the original pre-trained model has a context length (max sequence length) of 4096 (4k) and the fine-tuned model has 32k, that is a scaling factor of 8, and it should work by setting the above `--ctx-size` to 32768 (32k) and `--rope-scale` to 8.
+
+- `--rope-scale N`: Where N is the linear scaling factor used by the fine-tuned model.
+
 ### Keep Prompt
 
 The `--keep` option allows users to retain the original prompt when the model runs out of context, ensuring a connection to the initial instruction or conversation topic is maintained.
```
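The added section reduces to one arithmetic step (extended context ÷ original context = scaling factor) plus two flags. A minimal usage sketch under that assumption follows; the model path and prompt are placeholders for illustration and are not part of the commit:

```sh
# Hypothetical example: a model fine-tuned to a 32k context from a 4k base.
# Scaling factor = 32768 / 4096 = 8, so pass --ctx-size 32768 and --rope-scale 8.
./main -m models/your-32k-finetune.bin \
  --ctx-size 32768 \
  --rope-scale 8 \
  -p "Building a website can be done in 10 simple steps:"
```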