author | Georgi Gerganov <ggerganov@gmail.com> | 2023-04-05 22:07:33 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-04-05 22:07:33 +0300 |
commit | 986b6ce9f99503c51ec5afd8a10baa32359434c6 | |
tree | f4655b45b130b908729eb1407ca9e016c05f21a4 /flake.lock | |
parent | 34162989297fdfe3ab7305451ce55bc87e3f4c9c | |
ggml, llama : avoid heavy V transpose + improvements (#775)
ggml :
- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned
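
To illustrate what strided views and a stride-aware copy mean in general, here is a minimal self-contained sketch. It does not use ggml: the `view3d` struct and the `make_view3d`, `view_get`, `view_set` and `view_copy` helpers are hypothetical names invented for this example, and the real `ggml_view_3d()` and `ggml_cpy()` signatures are not shown here. Elements are accessed through `memcpy`, which is one common way to avoid requiring aligned data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 3-D float view (NOT ggml's tensor struct).
 * ne[i] = number of elements in dimension i
 * nb[i] = stride in bytes between consecutive elements of dimension i */
typedef struct {
    char  *data;
    size_t ne[3];
    size_t nb[3];
} view3d;

/* Describe existing memory as a 3-D view; no data is copied. */
static view3d make_view3d(void *base, size_t offset_bytes,
                          size_t ne0, size_t ne1, size_t ne2,
                          size_t nb0, size_t nb1, size_t nb2) {
    view3d v;
    v.data  = (char *) base + offset_bytes;
    v.ne[0] = ne0; v.ne[1] = ne1; v.ne[2] = ne2;
    v.nb[0] = nb0; v.nb[1] = nb1; v.nb[2] = nb2;
    return v;
}

/* Read one element; memcpy keeps this valid for unaligned data. */
static float view_get(const view3d *v, size_t i0, size_t i1, size_t i2) {
    float x;
    memcpy(&x, v->data + i0 * v->nb[0] + i1 * v->nb[1] + i2 * v->nb[2], sizeof x);
    return x;
}

/* Write one element at the position given by the view's strides. */
static void view_set(const view3d *v, size_t i0, size_t i1, size_t i2, float x) {
    memcpy(v->data + i0 * v->nb[0] + i1 * v->nb[1] + i2 * v->nb[2], &x, sizeof x);
}

/* Copy src into dst element by element, honoring the strides of BOTH
 * views, so dst may have a different memory layout than src. */
static void view_copy(const view3d *src, const view3d *dst) {
    assert(src->ne[0] == dst->ne[0]);
    assert(src->ne[1] == dst->ne[1]);
    assert(src->ne[2] == dst->ne[2]);
    for (size_t i2 = 0; i2 < src->ne[2]; ++i2)
        for (size_t i1 = 0; i1 < src->ne[1]; ++i1)
            for (size_t i0 = 0; i0 < src->ne[0]; ++i0)
                view_set(dst, i0, i1, i2, view_get(src, i0, i1, i2));
}
```

If the destination view is given swapped strides (for example `nb0 = 2 * sizeof(float)` and `nb1 = sizeof(float)` for a 3 x 2 source), the same copy routine writes the data out transposed, without a separate transpose pass.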
llama :
- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
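
As a rough illustration of why caching V in transposed form can help, the sketch below stores each new token's V vector into a cache laid out as `[dimension][token]`. The attention-weighted sum for each output dimension then becomes a dot product over one contiguous row, and no transpose of the whole cache is needed on every evaluation. This is a simplified single-head example under assumed layouts, not the llama.cpp code; `kv_store_v_transposed` and `attn_weighted_sum` are hypothetical names.

```c
#include <stddef.h>

/* Assumed cache layout for this sketch: cache_vt[d * n_ctx + t],
 * i.e. one contiguous row of n_ctx token slots per embedding dimension d. */

/* Store one token's V vector (n_embd floats) at token position pos.
 * This is a strided write: consecutive dimensions are n_ctx floats apart. */
static void kv_store_v_transposed(float *cache_vt, size_t n_ctx, size_t n_embd,
                                  size_t pos, const float *v) {
    for (size_t d = 0; d < n_embd; ++d) {
        cache_vt[d * n_ctx + pos] = v[d];
    }
}

/* Compute out[d] = sum over t of p[t] * V[t][d] for the first n_tokens
 * tokens. Because V is stored transposed, the inner loop reads memory
 * contiguously. */
static void attn_weighted_sum(const float *cache_vt, size_t n_ctx, size_t n_embd,
                              const float *p, size_t n_tokens, float *out) {
    for (size_t d = 0; d < n_embd; ++d) {
        const float *row = cache_vt + d * n_ctx;
        float acc = 0.0f;
        for (size_t t = 0; t < n_tokens; ++t) {
            acc += p[t] * row[t];
        }
        out[d] = acc;
    }
}
```

The cost of the transposition is paid once per token, as a strided write when the token enters the cache, instead of once per evaluation over the whole cached V.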