Age | Commit message | Author |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-05-28 | Only show -ngl option when relevant + other doc/arg handling updates (#1625) | Kerfuffle |
2023-05-20 | llama : define magic numbers as integer constants (#1518) (#1520) | Juuso Alasuutari |
2023-05-20 | llama : add llama_init_backend() API (close #1527) | Georgi Gerganov |
2023-05-20 | llama : fix compile warnings in llama_set_state_data() | Georgi Gerganov |
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov |
2023-05-17 | Remove unused n_parts parameter (#1509) | Stephan Walter |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | llama : free ggml context in set / copy state data (close #1425) | Georgi Gerganov |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-06 | Remove default arguments from sampling functions (#1343) | Jed Fox |
2023-05-02 | llama : only copy used KV cache in get / set state (#1272) | Evan Jones |
2023-05-02 | llama : fix compile warnings | Georgi Gerganov |
2023-05-02 | llama : allow 0 as a seed number. (#1275) | Robert Brisita |
2023-05-01 | llama : fix session load / save (#1263) | Georgi Gerganov |
2023-05-01 | llama : let context be const when accessing const data (#1261) | Alex Klinkhamer |
2023-04-29 | llama : new sampling algorithms (#1126) | Ivan Stepanov |
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter |
2023-04-28 | llama : add session file format and saved sessions in main (#1169) | Evan Jones |
2023-04-26 | ggml : add Q5_0 and Q5_1 quantization (#1187) | Georgi Gerganov |
2023-04-26 | Allow setting the rng seed after initialization. (#1184) | Ásgeir Bjarni Ingvarsson |
2023-04-25 | ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (... | Georgi Gerganov |
2023-04-24 | llama : refactor get / set state + remove redundant kv cache API (#1143) | Georgi Gerganov |
2023-04-22 | llama : add api for getting/setting the complete state: rng, logits, embeddin... | xaedes |
2023-04-20 | llama : multi-threaded quantization (#1075) | Kawrakow |
2023-04-20 | ggml : add Q4_3 quantization (#1082) | Georgi Gerganov |
2023-04-18 | ggml : add new Q4_2 quantization (ARM only) (#1046) | Georgi Gerganov |
2023-04-17 | Add LoRA support (#820) | slaren |
2023-04-13 | llama : merge llama_internal.h into llama.h | Georgi Gerganov |
2023-04-12 | Don't crash on ftype (formerly f16) == 4 (#917) | Stephan Walter |
2023-04-11 | Add enum llama_ftype, sync ggml_type to model files (#709) | Stephan Walter |
2023-04-10 | Rewrite loading code to try to satisfy everyone: | comex |
2023-04-08 | Add quantize-stats command for testing quantization (#728) | unbounded |
2023-04-02 | Added api for getting/setting the kv_cache (#685) | Christian Falch |
2023-03-30 | Make loading weights 10-100x faster | Justine Tunney |
2023-03-29 | Fix typo in llama.h (#593) | anzz1 |
2023-03-28 | llama : fix linkage with mingw (#551) | anzz1 |
2023-03-28 | all : be more strict about converting float to double (#458) | Stephan Walter |
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356) | Stephan Walter |
2023-03-25 | Cleanup STL headers + fix embedding examples + minor stuff | Georgi Gerganov |
2023-03-25 | Add support for file load progress reporting callbacks (#434) | Jed Fox |
2023-03-25 | Add missing struct annotation (#483) | Doomsdayrs |
2023-03-24 | Support calling mlock() on loaded model data on Linux and macOS (#453) | comex |
2023-03-24 | Add embedding mode with arg flag. Currently working (#282) | Luciano |
2023-03-22 | Introduce C-style API (#370) | Georgi Gerganov |