llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-08-02	CUDA: faster non k-quant mul_mat_q kernels (#2483)	Johannes Gäßler
2023-08-02	CUDA: Fix models with output size != 32000 (#2480)	Johannes Gäßler
2023-07-31	CUDA: mmq CLI option, fixed mmq build issues (#2453)	Johannes Gäßler
2023-07-31	CUDA: Implemented row flattening for non-glm RoPE (#2468)	Johannes Gäßler
2023-07-31	CUDA: fewer memory bank conflicts for mul_mat_q (#2458)	Johannes Gäßler
2023-07-29	CUDA: Quantized matrix matrix multiplication (#2160)	Johannes Gäßler
2023-07-29	CUDA: faster multi GPU synchronization (#2448)	Johannes Gäßler
2023-07-25	Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)	Kawrakow
2023-07-24	make rms_norm_eps a parameter (#2374)	slaren
2023-07-24	ggml : sync (unary ops refactor, static-correctness) (#2370)	Georgi Gerganov
2023-07-24	Some more Q4_K and Q5_K speedup on CUDA (#2346)	Kawrakow
2023-07-23	ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)	slaren
2023-07-23	llama : grouped-query attention + LLaMAv2 70B support (#2276)	Georgi Gerganov
2023-07-23	Speed up Q4_K (#2322)	Kawrakow
2023-07-22	CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)	Johannes Gäßler
2023-07-21	Custom RoPE + better memory management for CUDA (#2295)	Kawrakow
2023-07-21	llama : make tensor_split ptr instead of array (#2272)	Georgi Gerganov
2023-07-17	Support dup & cont ops on CUDA (#2242)	Jiahao Li
2023-07-14	cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer ...	Bach Le
2023-07-14	cuda : support broadcast add & mul (#2192)	Jiahao Li
2023-07-14	CUDA: mul_mat_vec_q kernels for k-quants (#2203)	Johannes Gäßler
2023-07-14	ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope)	Georgi Gerganov
2023-07-13	Fix compile error on Windows CUDA (#2207)	Howard Su
2023-07-12	cuda : add gelu support	Georgi Gerganov
2023-07-12	Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189)	Johannes Gäßler
2023-07-12	ggml : revert CUDA broadcast changes from #2183 (#2191)	Georgi Gerganov
2023-07-11	ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183)	Georgi Gerganov
2023-07-11	ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)	Spencer Sutton
2023-07-08	Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144)	Johannes Gäßler
2023-07-08	CUDA: add __restrict__ to mul mat vec kernels (#2140)	Johannes Gäßler
2023-07-05	Quantized dot products for CUDA mul mat vec (#2067)	Johannes Gäßler
2023-07-03	Fix crash of test-tokenizer-0 under Debug build (#2064)	Howard Su
2023-07-01	Better CUDA synchronization logic (#2057)	Johannes Gäßler
2023-06-28	cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)	Salvador E. Tropea
2023-06-28	cuda : fix missing const qualifier in casts (#2027)	Salvador E. Tropea
2023-06-28	CUDA GPU acceleration for LoRAs + f16 models (#1970)	Johannes Gäßler
2023-06-26	k-quants : support for super-block size of 64 (#2001)	Kawrakow
2023-06-26	Fix assert when free invalid cuda pointer (#2005)	Howard Su
2023-06-24	#1869 Fix null reference errors when training from scratch with CUDA (#1907)	Robyn
2023-06-19	cuda : faster k-quants on older GPUs (#1930)	Kawrakow
2023-06-19	Convert vector to f16 for dequantize mul mat vec (#1913)	Johannes Gäßler
2023-06-17	Only one CUDA stream per device for async compute (#1898)	Johannes Gäßler
2023-06-17	ggml : fix warnings under MSVC (#1908)	Howard Su
2023-06-16	CUDA : faster k-quant dot kernels (#1862)	Kawrakow
2023-06-15	Fixed CUDA runtime version check (#1879)	Johannes Gäßler
2023-06-15	Fix the validation of main device (#1872)	Howard Su
2023-06-14	CUDA full GPU acceleration, KV cache in VRAM (#1827)	Johannes Gäßler
2023-06-12	Leverage mmap for offloading tensors to GPU (#1597)	Howard Su
2023-06-11	Fixed WSL cuda's OOM error (#1594)	Kyle Liang
2023-06-09	Windows nvcc workaround (#1753)	Johannes Gäßler