llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-07-14	ggml : fix static_assert with older compilers #2024 (#2218)	Evan Miller
2023-07-14	llama : add functions that work directly on model (#2197)	Bach Le
2023-07-14	build.zig : install config header (#2216)	Ali Chraghi
2023-07-14	examples : fixed path typos in embd-input (#2214)	Shangning Xu
2023-07-14	cuda : support broadcast add & mul (#2192)	Jiahao Li
2023-07-14	CUDA: mul_mat_vec_q kernels for k-quants (#2203)	Johannes Gäßler
2023-07-14	make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208)	James Reynolds
2023-07-14	ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope)	Georgi Gerganov
2023-07-14	Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)	Kawrakow
2023-07-13	Revert "Support using mmap when applying LoRA (#2095)" (#2206)	Howard Su
2023-07-13	Fix compile error on Windows CUDA (#2207)	Howard Su
2023-07-13	devops : add missing quotes to bash script (#2193)	Bodo Graumann
2023-07-12	metal : new q4_0 matrix-vector kernel (#2188)	Shouzheng Liu
2023-07-12	ggml : broadcast mul_mat + conv batch support (#2199)	Georgi Gerganov
2023-07-12	ggml : add ggml_pool_1d and ggml_pool_2d	Georgi Gerganov
2023-07-12	cuda : add gelu support	Georgi Gerganov
2023-07-12	FP16 is supported in CM=6.0 (#2177)	Howard Su
2023-07-12	Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189)	Johannes Gäßler
2023-07-12	ggml : revert CUDA broadcast changes from #2183 (#2191)	Georgi Gerganov
2023-07-11	ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183)	Georgi Gerganov
2023-07-11	ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)	Spencer Sutton
2023-07-11	llama : add classifier-free guidance (#2135)	Bach Le
2023-07-11	docker : add '--server' option (#2174)	Jinwoo Jeong
2023-07-11	readme : fix zig build instructions (#2171)	Chad Brewbaker
2023-07-11	Support using mmap when applying LoRA (#2095)	Howard Su
2023-07-11	Possible solution to allow K-quants on models with n_vocab!=32000 (#2148)	LostRuins
2023-07-10	mpi : add support for distributed inference via MPI (#2099)	Evan Miller
2023-07-09	llama : remove "first token must be BOS" restriction (#2153)	oobabooga
2023-07-09	main : escape prompt prefix/suffix (#2151)	Nigel Bosch
2023-07-09	readme : update Termux instructions (#2147)	JackJollimore
2023-07-09	ggml : fix building with Intel MKL but ask for "cblas.h" issue (#2104) (#2115)	clyang
2023-07-09	readme : add more docs indexes (#2127)	rankaiyx
2023-07-08	Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144)	Johannes Gäßler
2023-07-08	CUDA: add __restrict__ to mul mat vec kernels (#2140)	Johannes Gäßler
2023-07-07	docker : add support for CUDA in docker (#1461)	dylan
2023-07-07	ci : switch threads to 1 (#2138)	Georgi Gerganov
2023-07-07	ggml : change ggml_graph_compute() API to not require context (#1999)	Qingyou Meng
2023-07-07	ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134)	Georgi Gerganov
2023-07-07	convert.py: add mapping for safetensors bf16 (#1598)	Aarni Koskela
2023-07-07	Fix opencl by wrap #if-else-endif with \n (#2086)	Howard Su
2023-07-06	ggml : fix restrict usage	Georgi Gerganov
2023-07-06	convert : update for baichuan (#2081)	Judd
2023-07-06	alpaca.sh : update model file name (#2074)	tslmy
2023-07-05	Expose generation timings from server & update completions.js (#2116)	Tobias Lütke
2023-07-05	Update Server Instructions (#2113)	Jesse Jojo Johnson
2023-07-05	ggml : fix bug introduced in #1237	Georgi Gerganov
2023-07-05	tests : fix test-grad0	Georgi Gerganov
2023-07-05	ggml : generalize `quantize_fns` for simpler FP16 handling (#1237)	Stephan Walter
2023-07-05	Update server instructions for web front end (#2103)	Jesse Jojo Johnson
2023-07-05	Quantized dot products for CUDA mul mat vec (#2067)	Johannes Gäßler