llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-07-24	make rms_norm_eps a parameter (#2374)	slaren
2023-07-24	ggml : sync (unary ops refactor, static-correctness) (#2370)	Georgi Gerganov
2023-07-23	ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)	slaren
2023-07-21	ggml : fix rope args order + assert (#2054)	Georgi Gerganov
2023-07-15	llama : add custom RoPE (#2054)	Xiao-Yong Jin
2023-07-12	ggml : add ggml_pool_1d and ggml_pool_2d	Georgi Gerganov
2023-07-11	ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183)	Georgi Gerganov
2023-07-11	ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)	Spencer Sutton
2023-07-07	ggml : change ggml_graph_compute() API to not require context (#1999)	Qingyou Meng
2023-07-06	ggml : fix restrict usage	Georgi Gerganov
2023-07-05	ggml : generalize `quantize_fns` for simpler FP16 handling (#1237)	Stephan Walter
2023-07-04	ggml : sync latest (new ops, macros, refactoring) (#2106)	Georgi Gerganov
2023-07-01	ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)	Qingyou Meng
2023-06-27	ggml : add support for ChatGLM RoPE	Georgi Gerganov
2023-06-26	ggml : increase max tensor name + clean up compiler warnings in train-text (#...	David Yang
2023-06-26	ggml : add NUMA support (#1556)	zrm
2023-06-25	ggml : sync latest ggml (custom operators)	Georgi Gerganov
2023-06-24	ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978)	slaren
2023-06-19	ggml : sync latest ggml repo (#1924)	Georgi Gerganov
2023-06-18	metal : handle buffers larger than device's maxBufferLength (#1826)	Georgi Gerganov
2023-06-14	CUDA full GPU acceleration, KV cache in VRAM (#1827)	Johannes Gäßler
2023-06-13	train : improved training-from-scratch example (#1652)	xaedes
2023-06-06	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)	Johannes Gäßler
2023-06-05	ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)	Kawrakow
2023-06-04	llama : Metal inference (#1642)	Georgi Gerganov
2023-05-29	ggml : sync cgraph import / export API	Georgi Gerganov
2023-05-27	ggml : add ggml_tensor_overhead()	Georgi Gerganov
2023-05-27	ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name())	Georgi Gerganov
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
2023-05-20	ggml : add ggml_clamp() (#1539)	Georgi Gerganov
2023-05-19	ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)	Georgi Gerganov
2023-05-14	ggml : various fixes (#1450)	Georgi Gerganov
2023-05-14	ggml : add GGML_QNT_VERSION to track quantization format changes	Georgi Gerganov
2023-05-13	ggml : GPU-accelerated token generation (#1412)	Johannes Gäßler
2023-05-13	ggml : implement backward pass for llama + small training-llama-from-scratch ...	xaedes
2023-05-12	ggml : remove bit shuffling (#1405)	Georgi Gerganov
2023-05-02	ggml: add names to tensors (#1268)	slaren
2023-05-01	cuBLAS: refactor and optimize f16 mat mul performance (#1259)	slaren
2023-04-30	ggml : add Q5 WASM SIMD + GGML_FTYPE	Georgi Gerganov
2023-04-29	ggml : fix visibility and unused warnings	Georgi Gerganov
2023-04-28	Remove Q4_3 which is no better than Q5 (#1218)	Stephan Walter
2023-04-28	ggml : sync ggml (ggml_alibi)	Georgi Gerganov
2023-04-28	ggml : add CLBlast support (#1164)	0cc4m
2023-04-26	ggml : add Q5_0 and Q5_1 quantization (#1187)	Georgi Gerganov
2023-04-25	ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (...	Georgi Gerganov
2023-04-24	ggml : export symbols (#1155)	Georgi Gerganov
2023-04-20	ggml : sync ggml (add GPT-NeoX RoPE implementation)	Georgi Gerganov
2023-04-20	llama : multi-threaded quantization (#1075)	Kawrakow
2023-04-20	ggml : add Q4_3 quantization (#1082)	Georgi Gerganov
2023-04-19	Add NVIDIA cuBLAS support (#1044)	slaren