path: root/llama.cpp
Age        | Commit message                                                              | Author
2023-04-10 | Print model version.                                                        | comex
2023-04-10 | Rewrite loading code to try to satisfy everyone:                            | comex
2023-04-08 | Add quantize-stats command for testing quantization (#728)                  | unbounded
2023-04-07 | llama : always sort logits before nucleus sampling (#812)                   | Ivan Stepanov
2023-04-05 | ggml, llama : avoid heavy V transpose + improvements (#775)                 | Georgi Gerganov
2023-04-05 | llama : define non-positive top_k; top_k range check (#779)                 | Ivan Stepanov
2023-04-03 | Define non-positive temperature behavior (#720)                             | Ivan Stepanov
2023-04-02 | Added api for getting/setting the kv_cache (#685)                           | Christian Falch
2023-04-02 | ggml : change ne to int64_t (#626)                                          | Marian Cepok
2023-04-02 | llama : do not allocate KV cache for "vocab_only == true" (#682)            | Stephan Walter
2023-03-30 | Introduce GGML migration tool for new file format                           | Justine Tunney
2023-03-30 | Ensure --mlock works properly with mmap() support                           | Justine Tunney
2023-03-30 | Make loading weights 10-100x faster                                         | Justine Tunney
2023-03-30 | Initial windows support (untested)                                          | Slaren
2023-03-30 | Always initialize mm_addr and mm_length in llama_model                      | Slaren
2023-03-30 | Unmap the file in llama_free                                                | Slaren
2023-03-30 | Make mmap_file static                                                       | Slaren
2023-03-30 | Fix ggml_init_params in quantize                                            | Slaren
2023-03-30 | Add mmap support for model files                                            | Slaren
2023-03-29 | llama : fix compile warnings when reading the vocab                         | Georgi Gerganov
2023-03-29 | llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) | Maël Kerbiriou
2023-03-28 | py : add temporary script to convert old ggml files to newer version (#539) | thement
2023-03-28 | all : be more strict about converting float to double (#458)                | Stephan Walter
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356)                      | Stephan Walter
2023-03-25 | Cleanup STL headers + fix embedding examples + minor stuff                  | Georgi Gerganov
2023-03-25 | Don't interefe with BLAS for large prompts by running only 1 thread         | Georgi Gerganov
2023-03-25 | Add timings for the prompt evaluation (#478)                                | slaren
2023-03-25 | Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS       | Georgi Gerganov
2023-03-25 | Add support for file load progress reporting callbacks (#434)               | Jed Fox
2023-03-25 | Fix crash for 65B model with pre-allocated memory (#485)                    | Chris Kuehl
2023-03-24 | Reduce memory usage and allocate enough memory for largest context (#473)   | Georgi Gerganov
2023-03-24 | Temporary bump the memory buffer size - hopefully fix issues from 483bab2e  | Georgi Gerganov
2023-03-24 | Properly free llama_context on failure                                      | Georgi Gerganov
2023-03-24 | Support calling mlock() on loaded model data on Linux and macOS (#453)      | comex
2023-03-24 | Add embedding mode with arg flag. Currently working (#282)                  | Luciano
2023-03-24 | Revert "Fix memory allocation issues and seg faults"                        | Georgi Gerganov
2023-03-24 | Fix memory allocation issues and seg faults                                 | Georgi Gerganov
2023-03-23 | Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439) | Georgi Gerganov
2023-03-22 | Add missing header for memcpy (#386)                                        | Yusuf Kağan Hanoğlu
2023-03-22 | Init llama_context_params properly from CLI (#370)                          | Georgi Gerganov
2023-03-22 | Introduce C-style API (#370)                                                | Georgi Gerganov