path: root/convert-lora-to-ggml.py
authorKawrakow <48489457+ikawrakow@users.noreply.github.com>2023-07-21 17:27:51 +0300
committerGitHub <noreply@github.com>2023-07-21 17:27:51 +0300
commitd924522a46c5ef097af4a88087d91673e8e87e4d (patch)
treea78782f11a57de0633bed5e505666bef50a80901 /convert-lora-to-ggml.py
parent4d76a5f49b9b5382dba5d13d92edb9159536c225 (diff)
Custom RoPE + better memory management for CUDA (#2295)
* Custom RoPE + better memory management for CUDA
* Adjusted look-ahead in ggml_cuda_pool_malloc to 5%. This seems to be sufficient: we end up using about 200 MB less VRAM that way when running the 13B model with a context of 8192.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
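The 5% look-ahead mentioned above refers to over-allocating pooled CUDA buffers so that slightly larger follow-up requests can reuse an existing buffer instead of triggering a fresh cudaMalloc. Below is a minimal sketch of that idea, under stated assumptions: the names (pool_entry, pool_malloc, pool_free, POOL_SIZE) and the layout are illustrative, not the actual ggml_cuda_pool_malloc implementation from this commit.

```cpp
// Hypothetical sketch of a CUDA buffer pool with a 5% allocation
// look-ahead. Not the real ggml implementation.
#include <cuda_runtime.h>
#include <cstddef>

#define POOL_SIZE 256

struct pool_entry {
    void * ptr  = nullptr; // device pointer, nullptr if slot is empty
    size_t size = 0;       // capacity of the buffer held in this slot
};

static pool_entry g_pool[POOL_SIZE];

static void * pool_malloc(size_t size, size_t * actual_size) {
    // First, try to reuse a pooled buffer that is already big enough.
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_entry & e = g_pool[i];
        if (e.ptr != nullptr && e.size >= size) {
            void * ptr   = e.ptr;
            *actual_size = e.size;
            e.ptr  = nullptr;
            e.size = 0;
            return ptr;
        }
    }
    // No fit: allocate with a 5% look-ahead so the buffer can absorb
    // slightly larger requests later without another cudaMalloc.
    const size_t look_ahead = (size_t)(1.05 * (double)size);
    void * ptr = nullptr;
    cudaMalloc(&ptr, look_ahead);
    *actual_size = look_ahead;
    return ptr;
}

static void pool_free(void * ptr, size_t size) {
    // Return the buffer to the first empty slot; if the pool is full,
    // release the memory back to the driver.
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_entry & e = g_pool[i];
        if (e.ptr == nullptr) {
            e.ptr  = ptr;
            e.size = size;
            return;
        }
    }
    cudaFree(ptr);
}
```

The trade-off is a small amount of deliberate over-allocation in exchange for fewer distinct buffer sizes in the pool, which is what reduces total VRAM use for workloads like long-context inference where request sizes grow gradually.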
Diffstat (limited to 'convert-lora-to-ggml.py')
0 files changed, 0 insertions, 0 deletions