llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-03-22	Deduplicate q4 quantization functions (#383)	Stephan Walter
	* Deduplicate q4 quantization functions
	* Use const; add basic test
	* Re-enable quantization test
	* Disable AVX2 flags in CI
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-22	fix: add POSIX functionality for Linux compilation (#51)	Valentyn Bezshapkin
	* fix: add POSIX functionality for Linux compilation
	* fix: older standard for compatibility
2023-03-22	Introduce C-style API (#370)	Georgi Gerganov
	* Major refactoring - introduce C-style API
	* Clean up
	* Add <cassert>
	* Add <iterator>
	* Add <algorithm>
	....
	* Fix timing reporting and accumulation
	* Measure eval time only for single-token calls
	* Change llama_tokenize return meaning
2023-03-21	Add OpenBSD support (#314)	Kevin Lo

2023-03-21	Add initial AVX512 support for dot product on Linux (#320)	Casey Primozic
	* Update Makefile to detect AVX512 support and add compiler flags if it's available
	* Based on existing AVX2 implementation, dot product on one 32-value block of 4-bit quantized ints at a time
	* Perform 8 bit -> 16 bit sign extension and multiply+add on 32 values at a time instead of 16
	* Use built-in AVX512 horizontal reduce add to get sum at the end
	* Manual unrolling on inner dot product loop to reduce loop counter overhead
2023-03-19	Change RMSNorm eps to 1e-6 (#173)	Georgi Gerganov
	I think this is what is used in the Python code
2023-03-17	Don't tell users to use a bad number of threads (#243)	Stephan Walter
	The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.
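The approach described above (reading /proc/cpuinfo to pick a default thread count) can be sketched roughly as follows. This is not the code from the commit: the helper names `count_processors` and `default_thread_count` and the fallback value of 4 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the commit): count the "processor"
   entries in a /proc/cpuinfo-style stream. */
static int count_processors(FILE *f) {
    assert(f != NULL);
    char buf[256];
    int count = 0;
    int at_line_start = 1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (at_line_start && strncmp(buf, "processor", 9) == 0) {
            count++;
        }
        /* fgets may return only part of a long line; the next chunk
           starts a new line only if this one ended with '\n'. */
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return count;
}

/* Hypothetical wrapper: use the count from /proc/cpuinfo when available,
   otherwise fall back to an arbitrary default (4 is an assumption). */
static int default_thread_count(void) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) {
        return 4;
    }
    int n = count_processors(f);
    fclose(f);
    return n > 0 ? n : 4;
}
```

On systems without /proc/cpuinfo the sketch simply returns the fallback, so a caller never starts more threads than intended because of a parse failure.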
2023-03-17	Q4_1 quantization (#193)	Matvey Soloviev
	* Add AVX2 version of ggml_vec_dot_q4_1
	* Small optimisations to q4_1 dot product (@Const-me)
	* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)
	* Fix ggml_vec_mad_q4_1 too
	* Fix non-vectorised q4_1 vec mul
2023-03-15	Fix RMS norm in GGML (#191)	Nebula

2023-03-16	Add RMS norm and use it (#187)	hoangmit
	* add ggml_rms_norm
	* update op num
2023-03-15	inline -> static inline for "bytesFromNibbles" (#161)	hoangmit
	Without "static" prefix, it fails to compile in clang
2023-03-14	Don't use vdotq_s32 if it's not available (#139)	Ronsor
	* Don't use vdotq_s32 if it's not available
	  `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.
	* Update ggml.c
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
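The compile-time check described above can be illustrated with a minimal sketch. It is not the code from ggml.c: the function name `dot_i8` is hypothetical, and the fallback here is a plain scalar loop rather than the NEON code the commit reintroduces. The extra `__aarch64__` guard is an assumption of this sketch, added because it uses `vaddvq_s32` for the final reduction.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: dot product of two int8 vectors.
   vdotq_s32 is used only when the compiler reports the dotprod
   extension via __ARM_FEATURE_DOTPROD; otherwise a scalar loop runs. */
static int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    size_t i = 0;

#if defined(__aarch64__) && defined(__ARM_NEON) && defined(__ARM_FEATURE_DOTPROD)
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        /* 16 int8 products accumulated into 4 int32 lanes */
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vaddvq_s32(acc); /* horizontal add of the 4 lanes */
#endif

    /* Scalar path: handles everything when dotprod is unavailable,
       and the tail elements otherwise. */
    for (; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Because the check happens at compile time, a binary built with dotprod enabled will still fail on a CPU without it; the guard only protects builds whose target flags do not advertise the extension.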
2023-03-13	Add NetBSD support. (#90)	Thomas Klausner

2023-03-13	Use vdotq_s32 to improve performance (#67)	Georgi Gerganov
	* 10% performance boost on ARM
	* Back to original change
2023-03-13	Revert "10% performance boost on ARM"	Georgi Gerganov
	This reverts commit 113a9e83ebc0f788f861394437087bf3ca0e019b. There are some reports of illegal instruction errors. Moved this change to the vdotq_s32 branch until resolved.
2023-03-13	Check for vdotq_s32 availability	Georgi Gerganov

2023-03-13	Ammend to previous commit - forgot to update non-QRDMX branch	Georgi Gerganov

2023-03-13	10% performance boost on ARM	Georgi Gerganov

2023-03-12	Windows fixes (#31)	Sebastián A
	* Apply fixes suggested to build on windows
	  Issue: https://github.com/ggerganov/llama.cpp/issues/22
	* Remove unsupported VLAs
	* MSVC: Remove features that are only available on MSVC C++20.
	* Fix zero initialization of the other fields.
	* Change the use of vector for stack allocations.
2023-03-11	Add AVX2 support for x86 architectures thanks to @Const-me !	Georgi Gerganov

2023-03-11	Support all LLaMA models + change Q4_0 quantization storage	Georgi Gerganov

2023-03-10	Initial release	Georgi Gerganov