llama.cpp.git - llama.cpp

Date	Commit message	Author
2023-04-25	ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (...	Georgi Gerganov
2023-04-24	llama : increase scratch buffer size for 65B (ref #1152)	Georgi Gerganov
2023-04-24	llama : refactor get / set state + remove redundant kv cache API (#1143)	Georgi Gerganov
2023-04-23	ggml : better PERF prints + support "LLAMA_PERF=1 make"	Georgi Gerganov
2023-04-22	Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)	Stephan Walter
2023-04-22	ggml : fix AVX build + update to new Q8_0 format	Georgi Gerganov
2023-04-22	llama : add api for getting/setting the complete state: rng, logits, embeddin...	xaedes
2023-04-21	llama : remember and restore kv cache data pointers (#1104)	xaedes
2023-04-21	llama : fix comment for "output.weight" tensor	Georgi Gerganov
2023-04-20	ggml : sync ggml (add GPT-NeoX RoPE implementation)	Georgi Gerganov
2023-04-20	llama : multi-threaded quantization (#1075)	Kawrakow
2023-04-20	ggml : add Q4_3 quantization (#1082)	Georgi Gerganov
2023-04-19	Add NVIDIA cuBLAS support (#1044)	slaren
2023-04-18	ggml : add new Q4_2 quantization (ARM only) (#1046)	Georgi Gerganov
2023-04-17	Add LoRA support (#820)	slaren
2023-04-17	llama : well-defined static initialization of complex objects (#927)	Arik Poznanski
2023-04-17	Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933)	Ivan Komarov
2023-04-16	stdout : vertical align outputs for better readability	Georgi Gerganov
2023-04-16	Fix msys2 build error and warnings (#1009)	nanahi
2023-04-14	Expose type name from ggml (#970)	Pavol Rusnak
2023-04-13	llama : merge llama_internal.h into llama.h	Georgi Gerganov
2023-04-12	Don't crash on ftype (formerly f16) == 4 (#917)	Stephan Walter
2023-04-11	Add enum llama_ftype, sync ggml_type to model files (#709)	Stephan Walter
2023-04-11	Windows fixes (#890)	comex
2023-04-10	Print model version.	comex
2023-04-10	Rewrite loading code to try to satisfy everyone:	comex
2023-04-08	Add quantize-stats command for testing quantization (#728)	unbounded
2023-04-07	llama : always sort logits before nucleus sampling (#812)	Ivan Stepanov
2023-04-05	ggml, llama : avoid heavy V transpose + improvements (#775)	Georgi Gerganov
2023-04-05	llama : define non-positive top_k; top_k range check (#779)	Ivan Stepanov
2023-04-03	Define non-positive temperature behavior (#720)	Ivan Stepanov
2023-04-02	Added api for getting/setting the kv_cache (#685)	Christian Falch
2023-04-02	ggml : change ne to int64_t (#626)	Marian Cepok
2023-04-02	llama : do not allocate KV cache for "vocab_only == true" (#682)	Stephan Walter
2023-03-30	Introduce GGML migration tool for new file format	Justine Tunney
2023-03-30	Ensure --mlock works properly with mmap() support	Justine Tunney
2023-03-30	Make loading weights 10-100x faster	Justine Tunney
2023-03-30	Initial windows support (untested)	Slaren
2023-03-30	Always initialize mm_addr and mm_length in llama_model	Slaren
2023-03-30	Unmap the file in llama_free	Slaren
2023-03-30	Make mmap_file static	Slaren
2023-03-30	Fix ggml_init_params in quantize	Slaren
2023-03-30	Add mmap support for model files	Slaren
2023-03-29	llama : fix compile warnings when reading the vocab	Georgi Gerganov
2023-03-29	llama : use the same threshold for OpenBLAS and ggml thread limiting (#577)	Maël Kerbiriou
2023-03-28	py : add temporary script to convert old ggml files to newer version (#539)	thement
2023-03-28	all : be more strict about converting float to double (#458)	Stephan Walter
2023-03-28	ggml : introduce structs for the q4 data blocks (#356)	Stephan Walter
2023-03-25	Cleanup STL headers + fix embedding examples + minor stuff	Georgi Gerganov
2023-03-25	Don't interfere with BLAS for large prompts by running only 1 thread	Georgi Gerganov