llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-06-15	Better error when using both LoRA + GPU layers (#1861)	Johannes Gäßler
2023-06-14	CUDA full GPU acceleration, KV cache in VRAM (#1827)	Johannes Gäßler
2023-06-13	baby-llama : fix operator!= (#1821)	0xspringtime
2023-06-13	train : improved training-from-scratch example (#1652)	xaedes
2023-06-13	llama : do a warm-up eval at start for better timings (#1824)	Georgi Gerganov
2023-06-13	Allow "quantizing" to f16 and f32 (#1787)	Kerfuffle
2023-06-11	Fix issue where interactive mode crashes when input exceeds ctx size (#1789)	Kerfuffle
2023-06-10	llama : support requantizing models instead of only allowing quantization fro...	Kerfuffle
2023-06-06	main: add the possibility to open the prompt cache read-only (#1640)	Willy Tarreau
2023-06-06	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)	Johannes Gäßler
2023-06-05	ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)	Kawrakow
2023-06-04	llama : Metal inference (#1642)	Georgi Gerganov
2023-06-03	Fix prompt cache saving and chat-persistent rollover (#1678)	Evan Jones
2023-05-29	Work around for recalculating logits in cached prompts (Fixes #1585) (#1609)	DannyDaemonic
2023-05-28	Only show -ngl option when relevant + other doc/arg handling updates (#1625)	Kerfuffle
2023-05-28	examples : add --alias option to gpt_params to set use friendly model name (#...	Vladimir Zorin
2023-05-27	Include server in releases + other build system cleanups (#1610)	Kerfuffle
2023-05-25	Some improvements to loading the session with --prompt-cache (#1550)	Kerfuffle
2023-05-24	chat-persistent.sh : use bracket expressions in grep (#1564)	Senemu
2023-05-21	examples : add server example with REST API (#1443)	Steward Garcia
2023-05-20	llama : add llama_init_backend() API (close #1527)	Georgi Gerganov
2023-05-20	Fix for mingw (#1462)	DannyDaemonic
2023-05-19	examples : add persistent chat (#1495)	Evan Jones
2023-05-19	main : make reverse prompt option act as a stop token in non-interactive mode...	Jason McCartney
2023-05-19	minor : fix compile warnings	Georgi Gerganov
2023-05-18	Fixes #1511 lambda issue for w64devkit (mingw) (#1513)	DannyDaemonic
2023-05-17	Remove unused n_parts parameter (#1509)	Stephan Walter
2023-05-17	benchmark-matmul: Print the average of the test results (#1490)	rankaiyx
2023-05-16	define default model path once, sync path with readme (#1366)	András Salamon
2023-05-15	fix get_num_physical_cores() (#1436)	zrm
2023-05-14	benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458)	slaren
2023-05-13	ggml : GPU-accelerated token generation (#1412)	Johannes Gäßler
2023-05-13	ggml : implement backward pass for llama + small training-llama-from-scratch ...	xaedes
2023-05-13	embedding : remove unused code (#1426)	Rinne
2023-05-12	llama : fix --mtest option (close #1414)	Georgi Gerganov
2023-05-12	CLI args use - instead of _, backwards compatible (#1416)	Johannes Gäßler
2023-05-12	ggml : remove bit shuffling (#1405)	Georgi Gerganov
2023-05-10	main : add option to save full output to session (#1338)	Evan Jones
2023-05-09	Locale fix for Windows (#1379)	DannyDaemonic
2023-05-08	Interface improvements and `--multiline-input` (previously `--author-mode`) (...	DannyDaemonic
2023-05-08	llama : require first token to be BOS (#1303)	Georgi Gerganov
2023-05-08	Documented CUDA reproducibility, added warning (#1346)	Johannes Gäßler
2023-05-06	Remove default arguments from sampling functions (#1343)	Jed Fox
2023-05-05	quantize: make output filename optional, default to ggml-model-<ftype>.bin (#...	slaren
2023-05-04	main : add --in-suffix option (#1318)	44670
2023-05-04	Only escape prompts when used with `-e` (#1311)	DannyDaemonic
2023-05-04	Update main's README.md with new features (#1296)	DannyDaemonic
2023-05-04	fix #1224 reverse prompt and multi line (#1297)	Tomas
2023-05-03	examples : read chat prompts from a template file (#1196)	khimaros
2023-05-03	examples : various prompt and example fixes (#1298)	CRD716