author    Howard Su <howard0su@gmail.com>    2023-07-13 21:58:25 +0800
committer GitHub <noreply@github.com>    2023-07-13 21:58:25 +0800
commit    32c54116318929c90fd7ae814cf9b5232cd44c36 (patch)
tree      3b9126e3fb387ef1aa53d7461f9a41e1ce2965ed /examples/main
parent    ff5d58faecf1f02b05bd015bdfc6a394cf2bc9ba (diff)
Revert "Support using mmap when applying LoRA (#2095)" (#2206)
It causes a performance regression when mlock is used. This reverts commit 2347463201a9f4159ae95b737e1544dd300569c8.
Diffstat (limited to 'examples/main')
-rw-r--r--  examples/main/README.md  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/examples/main/README.md b/examples/main/README.md
index 04b8d54..3753861 100644
--- a/examples/main/README.md
+++ b/examples/main/README.md
@@ -293,5 +293,5 @@ These options provide extra functionality and customization when running the LLa
- `-mg i, --main-gpu i`: When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used. Requires cuBLAS.
- `-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance. Requires cuBLAS.
- `-lv, --low-vram`: Do not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires cuBLAS.
-- `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model. This allows you to adapt the pretrained model to specific tasks or domains.
+- `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains.
- `--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation.
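For context, the flags documented in the diff above are combined on the `main` command line. A typical invocation using `--lora` together with `--lora-base` might look like the following sketch; the model and adapter file paths are hypothetical placeholders, not files shipped with the repository:

```shell
# Hypothetical paths; substitute your own model and adapter files.
# Per the README change above, --lora implies --no-mmap.
./main \
  -m models/7B/ggml-model-q4_0.bin \
  --lora lora/ggml-adapter-model.bin \
  --lora-base models/7B/ggml-model-f16.bin \
  -p "Hello"
```

Passing a higher-precision model via `--lora-base` lets the LoRA deltas be applied against less-quantized base layers, while `-m` still selects the quantized model actually run.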