path: root/ggml-cuda.h
authorKawrakow <48489457+ikawrakow@users.noreply.github.com>2023-07-21 17:27:51 +0300
committerGitHub <noreply@github.com>2023-07-21 17:27:51 +0300
commitd924522a46c5ef097af4a88087d91673e8e87e4d (patch)
treea78782f11a57de0633bed5e505666bef50a80901 /ggml-cuda.h
parent4d76a5f49b9b5382dba5d13d92edb9159536c225 (diff)
Custom RoPE + better memory management for CUDA (#2295)
* Custom RoPE + better memory management for CUDA

* Adjusted look ahead in ggml_cuda_pool_malloc to 5%

  This seems sufficient: we end up using about 200 MB less VRAM that way when running the 13B model with context 8192.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
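The 5% look-ahead mentioned above governs how the CUDA buffer pool decides whether an existing buffer can be reused for a new request. Below is a minimal sketch of that idea; the function names `pool_malloc_sketch` / `pool_free_sketch` and the exact reuse condition are illustrative assumptions, not the actual `ggml-cuda.cu` implementation (only `ggml_cuda_pool_malloc` is named in the commit message).

```cpp
// Hypothetical, simplified sketch of a CUDA buffer pool with 5% look-ahead.
// The real ggml_cuda_pool_malloc in ggml-cuda.cu may differ in detail.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define POOL_SIZE 256

struct pool_buffer {
    void * ptr  = nullptr;  // device pointer, nullptr if slot unused
    size_t size = 0;        // allocated size of this buffer
};

static pool_buffer g_pool[POOL_SIZE];

// Return a pooled buffer of at least `size` bytes. A free buffer is reused
// only if it is no more than ~5% larger than the request, so large buffers
// are not wasted on small allocations.
static void * pool_malloc_sketch(size_t size, size_t * actual_size) {
    const size_t look_ahead = size + size / 20;  // request + 5%
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_buffer & b = g_pool[i];
        if (b.ptr != nullptr && b.size >= size && b.size <= look_ahead) {
            void * ptr = b.ptr;
            *actual_size = b.size;
            b.ptr  = nullptr;
            b.size = 0;
            return ptr;
        }
    }
    // No suitable buffer in the pool: allocate a fresh one with 5% headroom.
    void * ptr = nullptr;
    if (cudaMalloc(&ptr, look_ahead) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc of %zu bytes failed\n", look_ahead);
        exit(1);
    }
    *actual_size = look_ahead;
    return ptr;
}

// Return a buffer to the pool, or free it outright if the pool is full.
static void pool_free_sketch(void * ptr, size_t size) {
    for (int i = 0; i < POOL_SIZE; ++i) {
        pool_buffer & b = g_pool[i];
        if (b.ptr == nullptr) {
            b.ptr  = ptr;
            b.size = size;
            return;
        }
    }
    cudaFree(ptr);
}
```

Keeping the headroom small (5% rather than a larger margin) is what the commit credits with the roughly 200 MB VRAM saving at context 8192 on the 13B model: oversized pool buffers stop being handed out for much smaller requests.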
Diffstat (limited to 'ggml-cuda.h')
0 files changed, 0 insertions, 0 deletions