Date | Commit message | Author |
2023-06-12 | Leverage mmap for offloading tensors to GPU (#1597) | Howard Su |
2023-06-10 | llama : support requantizing models instead of only allowing quantization fro... | Kerfuffle |
2023-06-09 | OpenCL: Add release memory (#1741) | Robert Sung-wook Shin |
2023-06-06 | llama : fix vram_scratch var | Georgi Gerganov |
2023-06-06 | llama : fix compile warnings | Georgi Gerganov |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-06 | metal : add f16 support | Georgi Gerganov |
2023-06-06 | llama : temporary disable Q6_K output quantization (#1711) | Georgi Gerganov |
2023-06-06 | metal : add checks for buffer size (#1706) | Spencer Sutton |
2023-06-05 | llama : consistently catch and throw only exceptions deriving from std::excep... | mgroeber9110 |
2023-06-05 | metal : use shared buffers between CPU and GPU (#1696) | kiltyj |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-05 | Increase 3B scratch buffers. (#1698) | Henri Vasserman |
2023-06-05 | llama : fix Metal KV cache sync (close #1695) | Georgi Gerganov |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-06-04 | OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) | 0cc4m |
2023-05-30 | OpenLLaMA 3B support (#1588) | Henri Vasserman |
2023-05-23 | OpenCL Token Generation Acceleration (#1459) | 0cc4m |
2023-05-20 | llama : define magic numbers as integer constants (#1518) (#1520) | Juuso Alasuutari |
2023-05-20 | cuda : loading models directly into VRAM, norm calculation on GPU, broadcasti... | Johannes Gäßler |
2023-05-20 | llama : add llama_init_backend() API (close #1527) | Georgi Gerganov |
2023-05-20 | llama : fix name shadowing and C4146 (#1526) | Maxime |
2023-05-20 | llama : fix compile warnings in llama_set_state_data() | Georgi Gerganov |
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov |
2023-05-19 | minor : fix compile warnings | Georgi Gerganov |
2023-05-18 | make kv_f16 the default for api users (#1517) | Erik Scholz |
2023-05-17 | Remove unused n_parts parameter (#1509) | Stephan Walter |
2023-05-13 | llama : fix unused warning | Georgi Gerganov |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes |
2023-05-13 | llama : fix various warnings | Georgi Gerganov |
2023-05-13 | llama : free ggml context in set / copy state data (close #1425) | Georgi Gerganov |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-08 | llama : fix hparams shadow (#1367) | Pavol Rusnak |
2023-05-08 | llama : require first token to be BOS (#1303) | Georgi Gerganov |
2023-05-06 | Remove default arguments from sampling functions (#1343) | Jed Fox |
2023-05-02 | llama : only copy used KV cache in get / set state (#1272) | Evan Jones |
2023-05-02 | llama : fix compile warnings | Georgi Gerganov |
2023-05-02 | llama : allow 0 as a seed number. (#1275) | Robert Brisita |
2023-05-02 | ggml: add names to tensors (#1268) | slaren |
2023-05-01 | llama : fix session load / save (#1263) | Georgi Gerganov |
2023-05-01 | cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) | slaren |
2023-05-01 | llama : let context be const when accessing const data (#1261) | Alex Klinkhamer |
2023-04-29 | ggml : adjust mul_mat_f16 work memory (#1226) | Georgi Gerganov |
2023-04-29 | examples : fix save-load-state + rename llama-util.h | Georgi Gerganov |
2023-04-29 | llama : new sampling algorithms (#1126) | Ivan Stepanov |
2023-04-29 | cuBLAS: use host pinned memory and dequantize while copying (#1207) | slaren |
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter |
2023-04-28 | llama : add session file format and saved sessions in main (#1169) | Evan Jones |
2023-04-28 | ggml : add CLBlast support (#1164) | 0cc4m |