llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-08-09	CUDA: tuned mul_mat_q kernels (#2546)	Johannes Gäßler
2023-08-08	Allow passing grammar to completion endpoint (#2532)	Martin Krasser
2023-08-07	[Makefile] Move ARM CFLAGS before compilation (#2536)	GiviMAD
2023-08-04	Add --simple-io option for subprocesses and break out console.h and cpp (#1558)	DannyDaemonic
2023-08-02	tests : Fix compilation warnings (Linux/GCC) (#2451)	Eve
2023-07-31	CUDA: fixed LLAMA_FAST compilation option (#2473)	Johannes Gäßler
2023-07-31	CUDA: mmq CLI option, fixed mmq build issues (#2453)	Johannes Gäßler
2023-07-30	ggml : add graph tensor allocator (#2411)	slaren
2023-07-29	CUDA: Quantized matrix matrix multiplication (#2160)	Johannes Gäßler
2023-07-26	make : build with -Wmissing-prototypes (#2394)	Cebtenzzre
2023-07-24	Chat UI extras (#2366)	Aarni Koskela
2023-07-23	llama : add grammar-based sampling (#1773)	Evan Jones
2023-07-23	make : fix CLBLAST compile support in FreeBSD (#2331)	Jose Maldonado
2023-07-21	gitignore : changes for Poetry users + chat examples (#2284)	Jose Maldonado
2023-07-21	make : fix indentation	Georgi Gerganov
2023-07-21	make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275)	Sky Yan
2023-07-21	make : add new target for test binaries (#2244)	Jiří Podivín
2023-07-21	make : fix embdinput library and server examples building on MSYS2 (#2235)	Przemysław Pawełczyk
2023-07-14	make : use pkg-config for OpenBLAS (#2222)	wzy
2023-07-14	make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208)	James Reynolds
2023-07-10	mpi : add support for distributed inference via MPI (#2099)	Evan Miller
2023-07-07	docker : add support for CUDA in docker (#1461)	dylan
2023-07-05	Quantized dot products for CUDA mul mat vec (#2067)	Johannes Gäßler
2023-07-04	Allow old Make to build server. (#2098)	Henri Vasserman
2023-07-04	Update Makefile: clean simple (#2097)	ZhouYuChen
2023-06-28	llama : support input embeddings directly (#1910)	ningshanwutuobang
2023-06-26	k-quants : support for super-block size of 64 (#2001)	Kawrakow
2023-06-19	Convert vector to f16 for dequantize mul mat vec (#1913)	Johannes Gäßler
2023-06-18	metal : handle buffers larger than device's maxBufferLength (#1826)	Georgi Gerganov
2023-06-17	make : do not print help for simple example	Georgi Gerganov
2023-06-17	make : update for latest Arch (#1701)	DaniAndTheWeb
2023-06-17	Server Example Refactor and Improvements (#1570)	Randall Fitzgerald
2023-06-16	examples : add "simple" (#1840)	SuperUserNameMan
2023-06-16	CUDA : faster k-quant dot kernels (#1862)	Kawrakow
2023-06-15	make : add train-text-from-scratch (#1850)	daboe01
2023-06-15	make : clean *.so files (#1857)	sandyiscool
2023-06-13	Allow "quantizing" to f16 and f32 (#1787)	Kerfuffle
2023-06-10	make : add SSSE3 compilation use case (#1659)	rankaiyx
2023-06-07	k-quants : allow to optionally disable at compile time (#1734)	Georgi Gerganov
2023-06-06	ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710)	Georgi Gerganov
2023-06-05	ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684)	Kawrakow
2023-06-04	llama : Metal inference (#1642)	Georgi Gerganov
2023-05-28	LLAMA_DEBUG adds debug symbols (#1617)	Johannes Gäßler
2023-05-27	Include server in releases + other build system cleanups (#1610)	Kerfuffle
2023-05-26	cuda : performance optimizations (#1530)	Johannes Gäßler
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
2023-05-21	make : .PHONY clean (#1553)	Stefan Sydow
2023-05-20	feature : support blis and other blas implementation (#1536)	Zenix
2023-05-20	Revert "feature : add blis and other BLAS implementation support (#1502)"	Georgi Gerganov
2023-05-20	feature : add blis and other BLAS implementation support (#1502)	Zenix