llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
2023-05-21	ggml : output 3d sizes in ggml_graph_dump_dot()	Georgi Gerganov
2023-05-20	ggml : update WASM SIMD	Georgi Gerganov
2023-05-20	ggml : add ggml_clamp() (#1539)	Georgi Gerganov
2023-05-20	cuda : loading models directly into VRAM, norm calculation on GPU, broadcasti...	Johannes Gäßler
2023-05-20	llama : fix name shadowing and C4146 (#1526)	Maxime
2023-05-20	ggml : fix scalar implementation of Q4_1 dot	Georgi Gerganov
2023-05-19	ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)	Georgi Gerganov
2023-05-16	~7% faster Q5_1 AVX2 code (#1477)	Ilya Kurdyukov
2023-05-14	ggml : alternative fix for race condition bug in non-inplace ggml_compute_for...	xaedes
2023-05-14	ggml : various fixes (#1450)	Georgi Gerganov
2023-05-14	ggml : add AVX support based on AVX2 code (#1430)	katsu560
2023-05-13	ggml : multi-thread mul and diag_mask ops (#1428)	Georgi Gerganov
2023-05-13	ggml : GPU-accelerated token generation (#1412)	Johannes Gäßler
2023-05-13	ggml : implement backward pass for llama + small training-llama-from-scratch ...	xaedes
2023-05-13	ggml : sync alibi fix from ggml repo	Georgi Gerganov
2023-05-13	Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413)	3ooabkhxtn
2023-05-12	ggml : remove bit shuffling (#1405)	Georgi Gerganov
2023-05-09	use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314)	Sami Farin
2023-05-06	ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336)	swittk
2023-05-04	ggml : change immintrin.h to intrin.h for compatibility (#1307)	Ron Jailall
2023-05-03	ggml : vectorize Q8_0 quantization	Georgi Gerganov
2023-05-02	ggml : fix 32-bit ARM	Georgi Gerganov
2023-05-02	ggml : fix ppc64le build error and make cmake detect Power processors (#1284)	Marvin Gießing
2023-05-02	ggml: add names to tensors (#1268)	slaren
2023-05-01	cuBLAS: refactor and optimize f16 mat mul performance (#1259)	slaren
2023-05-01	ggml : fix ggml_used_mem() (#1264)	Kerfuffle
2023-04-30	ggml : fix UB (int << 31)	Georgi Gerganov
2023-04-30	ggml : add Q5 WASM SIMD + GGML_FTYPE	Georgi Gerganov
2023-04-30	ggml : fix labels for GGML_OP_ALIBI	Georgi Gerganov
2023-04-29	ggml : fix 32-bit ARM NEON	Georgi Gerganov
2023-04-29	ggml : use vzip instead of vuzp for consistency	Georgi Gerganov
2023-04-29	ggml : fix visibility and unused warnings	Georgi Gerganov
2023-04-29	ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229)	Georgi Gerganov
2023-04-29	ggml : adjust mul_mat_f16 work memory (#1226)	Georgi Gerganov
2023-04-29	cuBLAS: use host pinned memory and dequantize while copying (#1207)	slaren
2023-04-29	cuBLAS: non-contiguous tensor support (#1215)	Henri Vasserman
2023-04-28	Remove Q4_3 which is no better than Q5 (#1218)	Stephan Walter
2023-04-28	ggml : sync ggml (ggml_alibi)	Georgi Gerganov
2023-04-28	ggml : add helper debug printf in soft_max	Georgi Gerganov
2023-04-28	ggml : add CLBlast support (#1164)	0cc4m
2023-04-28	add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211)	Yann Follet
2023-04-26	ggml : slightly faster AVX2 implementation for Q5 (#1197)	Stephan Walter
2023-04-26	ggml : add Q5_0 and Q5_1 quantization (#1187)	Georgi Gerganov
2023-04-25	ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (...	Georgi Gerganov
2023-04-25	ggml : use full range for Q4_0 and Q4_2 quantization (#729)	unbounded
2023-04-24	ggml : fix bug in ggml_compute_forward_sum_f32 (#1162)	xaedes
2023-04-24	Fix build for gcc 8 and test in CI (#1154)	Stephan Walter
2023-04-23	ggml : do not print perf ops that have not been used at all	Georgi Gerganov
2023-04-23	ggml : better PERF prints + support "LLAMA_PERF=1 make"	Georgi Gerganov