From 32c54116318929c90fd7ae814cf9b5232cd44c36 Mon Sep 17 00:00:00 2001 From: Howard Su Date: Thu, 13 Jul 2023 21:58:25 +0800 Subject: Revert "Support using mmap when applying LoRA (#2095)" (#2206) Has perf regression when mlock is used. This reverts commit 2347463201a9f4159ae95b737e1544dd300569c8. --- examples/main/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'examples/main') diff --git a/examples/main/README.md b/examples/main/README.md index 04b8d54..3753861 100644 --- a/examples/main/README.md +++ b/examples/main/README.md @@ -293,5 +293,5 @@ These options provide extra functionality and customization when running the LLa - `-mg i, --main-gpu i`: When using multiple GPUs this option controls which GPU is used for small tensors for which the overhead of splitting the computation across all GPUs is not worthwhile. The GPU in question will use slightly more VRAM to store a scratch buffer for temporary results. By default GPU 0 is used. Requires cuBLAS. - `-ts SPLIT, --tensor-split SPLIT`: When using multiple GPUs this option controls how large tensors should be split across all GPUs. `SPLIT` is a comma-separated list of non-negative values that assigns the proportion of data that each GPU should get in order. For example, "3,2" will assign 60% of the data to GPU 0 and 40% to GPU 1. By default the data is split in proportion to VRAM but this may not be optimal for performance. Requires cuBLAS. - `-lv, --low-vram`: Do not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires cuBLAS. -- `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model. This allows you to adapt the pretrained model to specific tasks or domains. +- `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies --no-mmap). This allows you to adapt the pretrained model to specific tasks or domains. - `--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with the `--lora` flag, and specifies the base model for the adaptation. -- cgit v1.2.3