path: root/convert-lora-to-ggml.py
authorKawrakow <48489457+ikawrakow@users.noreply.github.com>2023-07-21 17:27:51 +0300
committerGitHub <noreply@github.com>2023-07-21 17:27:51 +0300
commitd924522a46c5ef097af4a88087d91673e8e87e4d (patch)
treea78782f11a57de0633bed5e505666bef50a80901 /convert-lora-to-ggml.py
parent4d76a5f49b9b5382dba5d13d92edb9159536c225 (diff)
Custom RoPE + better memory management for CUDA (#2295)
* Custom RoPE + better memory management for CUDA
* Adjusted look-ahead in ggml_cuda_pool_malloc to 5%. This seems to be sufficient: we end up using about 200 MB less VRAM that way when running the 13B model with a context of 8192.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
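The 5% look-ahead mentioned above refers to over-allocating pooled CUDA buffers so that slightly larger follow-up requests can reuse an existing buffer instead of triggering a fresh cudaMalloc. Below is a minimal sketch of that idea, under stated assumptions: the names (pool_entry, pool_malloc, pool_free, POOL_SIZE) and the layout are illustrative, not the actual ggml_cuda_pool_malloc implementation from this commit.

```cpp
// Hypothetical sketch of a CUDA buffer pool with a 5% allocation
// look-ahead. Not the real ggml implementation.
#include <cuda_runtime.h>
#include <cstddef>

#define POOL_SIZE 256

struct pool_entry {
    void * ptr  = nullptr; // device pointer, nullptr if slot is empty
    size_t size = 0;       // capacity of the buffer held in this slot
};

static pool_entry g_pool[POOL_SIZE];

static void * pool_malloc(size_t size, size_t * actual_size) {
    // First, try to reuse a pooled buffer that is already big enough.
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_entry & e = g_pool[i];
        if (e.ptr != nullptr && e.size >= size) {
            void * ptr   = e.ptr;
            *actual_size = e.size;
            e.ptr  = nullptr;
            e.size = 0;
            return ptr;
        }
    }
    // No fit: allocate with a 5% look-ahead so the buffer can absorb
    // slightly larger requests later without another cudaMalloc.
    const size_t look_ahead = (size_t)(1.05 * (double)size);
    void * ptr = nullptr;
    cudaMalloc(&ptr, look_ahead);
    *actual_size = look_ahead;
    return ptr;
}

static void pool_free(void * ptr, size_t size) {
    // Return the buffer to the first empty slot; if the pool is full,
    // release the memory back to the driver.
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_entry & e = g_pool[i];
        if (e.ptr == nullptr) {
            e.ptr  = ptr;
            e.size = size;
            return;
        }
    }
    cudaFree(ptr);
}
```

The trade-off is a small amount of deliberate over-allocation in exchange for fewer distinct buffer sizes in the pool, which is what reduces total VRAM use for workloads like long-context inference where request sizes grow gradually.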
Diffstat (limited to 'convert-lora-to-ggml.py')
0 files changed, 0 insertions, 0 deletions