Age | Commit message | Author |
2023-06-18 | Fixed incorrectly applying RMS norm twice (#1925) | Johannes Gäßler |
2023-06-18 | llama : prevent usage of k-quants when tensor size is not a multiple of 256 (... | Kawrakow |
2023-06-18 | metal : handle buffers larger than device's maxBufferLength (#1826) | Georgi Gerganov |
2023-06-17 | llama : fix kv_cache `n` init (close #1903) | Georgi Gerganov |
2023-06-17 | ggml : fix warnings under MSVC (#1908) | Howard Su |
2023-06-16 | llama : fix embd when offloading non-repeating layers (#1891) | Johannes Gäßler |
2023-06-16 | build : fix and ignore MSVC warnings (#1889) | Borislav Stanimirov |
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827) | Johannes Gäßler |
2023-06-13 | train : improved training-from-scratch example (#1652) | xaedes |
2023-06-13 | Allow "quantizing" to f16 and f32 (#1787) | Kerfuffle |
2023-06-12 | Metal implementation for all k_quants (#1807) | Kawrakow |
2023-06-12 | Leverage mmap for offloading tensors to GPU (#1597) | Howard Su |
2023-06-10 | llama : support requantizing models instead of only allowing quantization fro... | Kerfuffle |
2023-06-09 | OpenCL: Add release memory (#1741) | Robert Sung-wook Shin |
2023-06-06 | llama : fix vram_scratch var | Georgi Gerganov |
2023-06-06 | llama : fix compile warnings | Georgi Gerganov |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-06 | metal : add f16 support | Georgi Gerganov |
2023-06-06 | llama : temporary disable Q6_K output quantization (#1711) | Georgi Gerganov |
2023-06-06 | metal : add checks for buffer size (#1706) | Spencer Sutton |
2023-06-05 | llama : consistently catch and throw only exceptions deriving from std::excep... | mgroeber9110 |
2023-06-05 | metal : use shared buffers between CPU and GPU (#1696) | kiltyj |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-05 | Increase 3B scratch buffers. (#1698) | Henri Vasserman |
2023-06-05 | llama : fix Metal KV cache sync (close #1695) | Georgi Gerganov |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-06-04 | OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) | 0cc4m |
2023-05-30 | OpenLLaMA 3B support (#1588) | Henri Vasserman |
2023-05-23 | OpenCL Token Generation Acceleration (#1459) | 0cc4m |
2023-05-20 | llama : define magic numbers as integer constants (#1518) (#1520) | Juuso Alasuutari |
2023-05-20 | cuda : loading models directly into VRAM, norm calculation on GPU, broadcasti... | Johannes Gäßler |
2023-05-20 | llama : add llama_init_backend() API (close #1527) | Georgi Gerganov |
2023-05-20 | llama : fix name shadowing and C4146 (#1526) | Maxime |
2023-05-20 | llama : fix compile warnings in llama_set_state_data() | Georgi Gerganov |
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov |
2023-05-19 | minor : fix compile warnings | Georgi Gerganov |
2023-05-18 | make kv_f16 the default for api users (#1517) | Erik Scholz |
2023-05-17 | Remove unused n_parts parameter (#1509) | Stephan Walter |
2023-05-13 | llama : fix unused warning | Georgi Gerganov |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes |
2023-05-13 | llama : fix various warnings | Georgi Gerganov |
2023-05-13 | llama : free ggml context in set / copy state data (close #1425) | Georgi Gerganov |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-08 | llama : fix hparams shadow (#1367) | Pavol Rusnak |
2023-05-08 | llama : require first token to be BOS (#1303) | Georgi Gerganov |
2023-05-06 | Remove default arguments from sampling functions (#1343) | Jed Fox |
2023-05-02 | llama : only copy used KV cache in get / set state (#1272) | Evan Jones |
2023-05-02 | llama : fix compile warnings | Georgi Gerganov |
2023-05-02 | llama : allow 0 as a seed number. (#1275) | Robert Brisita |