Date | Commit message | Author |
2023-06-18 | examples : fix examples/metal (#1920) | Kawrakow |
2023-06-17 | minor : warning fixes | Georgi Gerganov |
2023-06-17 | Only one CUDA stream per device for async compute (#1898) | Johannes Gäßler |
2023-06-17 | llama : fix kv_cache `n` init (close #1903) | Georgi Gerganov |
2023-06-17 | Server Example Refactor and Improvements (#1570) | Randall Fitzgerald |
2023-06-17 | hooks : setting up flake8 and pre-commit hooks (#1681) | Jiří Podivín |
2023-06-17 | train : get raw text instead of page with html (#1905) | David Yang |
2023-06-16 | examples : add "simple" (#1840) | SuperUserNameMan |
2023-06-16 | Fixed possible macro redefinition (#1892) | FrankHB |
2023-06-16 | build : fix and ignore MSVC warnings (#1889) | Borislav Stanimirov |
2023-06-15 | examples : add chat-vicuna.sh (#1854) | yangli2 |
2023-06-15 | readme : server compile flag (#1874) | Srinivas Billa |
2023-06-15 | Better error when using both LoRA + GPU layers (#1861) | Johannes Gäßler |
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827) | Johannes Gäßler |
2023-06-13 | baby-llama : fix operator!= (#1821) | 0xspringtime |
2023-06-13 | train : improved training-from-scratch example (#1652) | xaedes |
2023-06-13 | llama : do a warm-up eval at start for better timings (#1824) | Georgi Gerganov |
2023-06-13 | Allow "quantizing" to f16 and f32 (#1787) | Kerfuffle |
2023-06-11 | Fix issue where interactive mode crashes when input exceeds ctx size (#1789) | Kerfuffle |
2023-06-10 | llama : support requantizing models instead of only allowing quantization fro... | Kerfuffle |
2023-06-06 | main: add the possibility to open the prompt cache read-only (#1640) | Willy Tarreau |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-06-03 | Fix prompt cache saving and chat-persistent rollover (#1678) | Evan Jones |
2023-05-29 | Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) | DannyDaemonic |
2023-05-28 | Only show -ngl option when relevant + other doc/arg handling updates (#1625) | Kerfuffle |
2023-05-28 | examples : add --alias option to gpt_params to set use friendly model name (#... | Vladimir Zorin |
2023-05-27 | Include server in releases + other build system cleanups (#1610) | Kerfuffle |
2023-05-25 | Some improvements to loading the session with --prompt-cache (#1550) | Kerfuffle |
2023-05-24 | chat-persistent.sh : use bracket expressions in grep (#1564) | Senemu |
2023-05-21 | examples : add server example with REST API (#1443) | Steward Garcia |
2023-05-20 | llama : add llama_init_backend() API (close #1527) | Georgi Gerganov |
2023-05-20 | Fix for mingw (#1462) | DannyDaemonic |
2023-05-19 | examples : add persistent chat (#1495) | Evan Jones |
2023-05-19 | main : make reverse prompt option act as a stop token in non-interactive mode... | Jason McCartney |
2023-05-19 | minor : fix compile warnings | Georgi Gerganov |
2023-05-18 | Fixes #1511 lambda issue for w64devkit (mingw) (#1513) | DannyDaemonic |
2023-05-17 | Remove unused n_parts parameter (#1509) | Stephan Walter |
2023-05-17 | benchmark-matmul: Print the average of the test results (#1490) | rankaiyx |
2023-05-16 | define default model path once, sync path with readme (#1366) | András Salamon |
2023-05-15 | fix get_num_physical_cores() (#1436) | zrm |
2023-05-14 | benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458) | slaren |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes |
2023-05-13 | embedding : remove unused code (#1426) | Rinne |
2023-05-12 | llama : fix --mtest option (close #1414) | Georgi Gerganov |
2023-05-12 | CLI args use - instead of _, backwards compatible (#1416) | Johannes Gäßler |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-10 | main : add option to save full output to session (#1338) | Evan Jones |