path: root/examples
Age        | Commit message                                                                    | Author
2023-06-15 | Better error when using both LoRA + GPU layers (#1861)                            | Johannes Gäßler
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827)                              | Johannes Gäßler
2023-06-13 | baby-llama : fix operator!= (#1821)                                               | 0xspringtime
2023-06-13 | train : improved training-from-scratch example (#1652)                            | xaedes
2023-06-13 | llama : do a warm-up eval at start for better timings (#1824)                     | Georgi Gerganov
2023-06-13 | Allow "quantizing" to f16 and f32 (#1787)                                         | Kerfuffle
2023-06-11 | Fix issue where interactive mode crashes when input exceeds ctx size (#1789)      | Kerfuffle
2023-06-10 | llama : support requantizing models instead of only allowing quantization fro... | Kerfuffle
2023-06-06 | main: add the possibility to open the prompt cache read-only (#1640)              | Willy Tarreau
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)                     | Johannes Gäßler
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)                             | Kawrakow
2023-06-04 | llama : Metal inference (#1642)                                                   | Georgi Gerganov
2023-06-03 | Fix prompt cache saving and chat-persistent rollover (#1678)                      | Evan Jones
2023-05-29 | Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)      | DannyDaemonic
2023-05-28 | Only show -ngl option when relevant + other doc/arg handling updates (#1625)      | Kerfuffle
2023-05-28 | examples : add --alias option to gpt_params to set use friendly model name (#... | Vladimir Zorin
2023-05-27 | Include server in releases + other build system cleanups (#1610)                  | Kerfuffle
2023-05-25 | Some improvements to loading the session with --prompt-cache (#1550)              | Kerfuffle
2023-05-24 | chat-persistent.sh : use bracket expressions in grep (#1564)                      | Senemu
2023-05-21 | examples : add server example with REST API (#1443)                               | Steward Garcia
2023-05-20 | llama : add llama_init_backend() API (close #1527)                                | Georgi Gerganov
2023-05-20 | Fix for mingw (#1462)                                                             | DannyDaemonic
2023-05-19 | examples : add persistent chat (#1495)                                            | Evan Jones
2023-05-19 | main : make reverse prompt option act as a stop token in non-interactive mode... | Jason McCartney
2023-05-19 | minor : fix compile warnings                                                      | Georgi Gerganov
2023-05-18 | Fixes #1511 lambda issue for w64devkit (mingw) (#1513)                            | DannyDaemonic
2023-05-17 | Remove unused n_parts parameter (#1509)                                           | Stephan Walter
2023-05-17 | benchmark-matmul: Print the average of the test results (#1490)                   | rankaiyx
2023-05-16 | define default model path once, sync path with readme (#1366)                     | András Salamon
2023-05-15 | fix get_num_physical_cores() (#1436)                                              | zrm
2023-05-14 | benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458)         | slaren
2023-05-13 | ggml : GPU-accelerated token generation (#1412)                                   | Johannes Gäßler
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes
2023-05-13 | embedding : remove unused code (#1426)                                            | Rinne
2023-05-12 | llama : fix --mtest option (close #1414)                                          | Georgi Gerganov
2023-05-12 | CLI args use - instead of _, backwards compatible (#1416)                         | Johannes Gäßler
2023-05-12 | ggml : remove bit shuffling (#1405)                                               | Georgi Gerganov
2023-05-10 | main : add option to save full output to session (#1338)                          | Evan Jones
2023-05-09 | Locale fix for Windows (#1379)                                                    | DannyDaemonic
2023-05-08 | Interface improvements and `--multiline-input` (previously `--author-mode`) (... | DannyDaemonic
2023-05-08 | llama : require first token to be BOS (#1303)                                     | Georgi Gerganov
2023-05-08 | Documented CUDA reproducibility, added warning (#1346)                            | Johannes Gäßler
2023-05-06 | Remove default arguments from sampling functions (#1343)                          | Jed Fox
2023-05-05 | quantize: make output filename optional, default to ggml-model-<ftype>.bin (#... | slaren
2023-05-04 | main : add --in-suffix option (#1318)                                             | 44670
2023-05-04 | Only escape prompts when used with `-e` (#1311)                                   | DannyDaemonic
2023-05-04 | Update main's README.md with new features (#1296)                                 | DannyDaemonic
2023-05-04 | fix #1224 reverse prompt and multi line (#1297)                                   | Tomas
2023-05-03 | examples : read chat prompts from a template file (#1196)                         | khimaros
2023-05-03 | examples : various prompt and example fixes (#1298)                               | CRD716