llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-06-25	ggml : sync latest ggml (custom operators)	Georgi Gerganov
2023-06-24	ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978)	slaren
2023-06-19	ggml : sync latest ggml repo (#1924)	Georgi Gerganov
2023-06-18	metal : handle buffers larger than device's maxBufferLength (#1826)	Georgi Gerganov
2023-06-14	CUDA full GPU acceleration, KV cache in VRAM (#1827)	Johannes Gäßler
2023-06-13	train : improved training-from-scratch example (#1652)	xaedes
2023-06-06	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)	Johannes Gäßler
2023-06-05	ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)	Kawrakow
2023-06-04	llama : Metal inference (#1642)	Georgi Gerganov
2023-05-29	ggml : sync cgraph import / export API	Georgi Gerganov
2023-05-27	ggml : add ggml_tensor_overhead()	Georgi Gerganov
2023-05-27	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	Georgi Gerganov
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
2023-05-20	ggml : add ggml_clamp() (#1539)	Georgi Gerganov
2023-05-19	ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)	Georgi Gerganov
2023-05-14	ggml : various fixes (#1450)	Georgi Gerganov
2023-05-14	ggml : add GGML_QNT_VERSION to track quantization format changes	Georgi Gerganov
2023-05-13	ggml : GPU-accelerated token generation (#1412)	Johannes Gäßler
2023-05-13	ggml : implement backward pass for llama + small training-llama-from-scratch ...	xaedes
2023-05-12	ggml : remove bit shuffling (#1405)	Georgi Gerganov
2023-05-02	ggml: add names to tensors (#1268)	slaren
2023-05-01	cuBLAS: refactor and optimize f16 mat mul performance (#1259)	slaren
2023-04-30	ggml : add Q5 WASM SIMD + GGML_FTYPE	Georgi Gerganov
2023-04-29	ggml : fix visibility and unused warnings	Georgi Gerganov
2023-04-28	Remove Q4_3 which is no better than Q5 (#1218)	Stephan Walter
2023-04-28	ggml : sync ggml (ggml_alibi)	Georgi Gerganov
2023-04-28	ggml : add CLBlast support (#1164)	0cc4m
2023-04-26	ggml : add Q5_0 and Q5_1 quantization (#1187)	Georgi Gerganov
2023-04-25	ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (...	Georgi Gerganov
2023-04-24	ggml : export symbols (#1155)	Georgi Gerganov
2023-04-20	ggml : sync ggml (add GPT-NeoX RoPE implementation)	Georgi Gerganov
2023-04-20	llama : multi-threaded quantization (#1075)	Kawrakow
2023-04-20	ggml : add Q4_3 quantization (#1082)	Georgi Gerganov
2023-04-19	Add NVIDIA cuBLAS support (#1044)	slaren
2023-04-18	ggml : add new Q4_2 quantization (ARM only) (#1046)	Georgi Gerganov
2023-04-17	Add LoRA support (#820)	slaren
2023-04-17	Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933)	Ivan Komarov
2023-04-15	ggml : add Q8_0 quantization for intermediate results (#951)	Georgi Gerganov
2023-04-14	Expose type name from ggml (#970)	Pavol Rusnak
2023-04-14	ggml : add unary and binary map operations (#874)	Kerfuffle
2023-04-13	ggml : add GGML_DEFAULT_N_THREADS	Georgi Gerganov
2023-04-11	Add enum llama_ftype, sync ggml_type to model files (#709)	Stephan Walter
2023-04-10	ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst	Georgi Gerganov
2023-04-10	Rewrite loading code to try to satisfy everyone:	comex
2023-04-08	Add quantize-stats command for testing quantization (#728)	unbounded
2023-04-05	ggml, llama : avoid heavy V transpose + improvements (#775)	Georgi Gerganov
2023-04-02	ggml : change ne to int64_t (#626)	Marian Cepok
2023-03-30	Ensure --mlock works properly with mmap() support	Justine Tunney
2023-03-30	Add mmap support for model files	Slaren
2023-03-28	ggml : introduce structs for the q4 data blocks (#356)	Stephan Walter