author | Kawrakow <48489457+ikawrakow@users.noreply.github.com> | 2023-07-21 17:27:51 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-07-21 17:27:51 +0300 |
commit | d924522a46c5ef097af4a88087d91673e8e87e4d (patch) | |
tree | a78782f11a57de0633bed5e505666bef50a80901 /ggml-cuda.h | |
parent | 4d76a5f49b9b5382dba5d13d92edb9159536c225 (diff) |
Custom RoPE + better memory management for CUDA (#2295)
* Custom RoPE + better memory management for CUDA
* Adjusted look ahead in ggml_cuda_pool_malloc to 5%
This seems to be sufficient.
We end up using about 200 MB less VRAM that way when running
the 13B model with context 8192.
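
The 5% look-ahead can be illustrated with a small sketch. The function below is a hypothetical helper, not the code from this commit: it assumes the pool pads each new allocation by about 5% so that a slightly larger later request can reuse the same buffer. The rounding to a 256-byte multiple is also an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not the actual ggml-cuda code): compute how many
 * bytes to allocate for a request of `size` bytes when using roughly 5%
 * look-ahead. Rounding up to a multiple of 256 bytes is an assumption. */
static size_t pool_lookahead_size(size_t size) {
    size_t padded = size + size / 20;        /* add about 5% of head room */
    return 256 * ((padded + 255) / 256);     /* round up to 256-byte multiple */
}

/* A pooled buffer can serve a request if it is at least as large. */
static int pool_buffer_fits(size_t buffer_size, size_t requested) {
    return buffer_size >= requested;
}
```

With this sketch, a buffer allocated for a 1000-byte request would also serve a later 1040-byte request without a new allocation. A smaller look-ahead factor wastes less VRAM per buffer, which would be consistent with the saving the message reports.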
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Diffstat (limited to 'ggml-cuda.h')
0 files changed, 0 insertions, 0 deletions