Age | Commit message | Author
|
* feat: dockerize llamacpp
* feat: split build & runtime stages
* split dockerfile into main & tools
* add quantize into tool docker image
* Update .devops/tools.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add docker action pipeline
* change CI to publish to the GitHub Docker registry
* fix runs-on name: macOS-latest should be macos-latest (lowercase)
* include docker versioned images
* fix github action docker
* fix docker.yml
* feat: include all-in-one command tool & update readme.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
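For context, the Q4_1 dot product that the AVX2 kernel accelerates can be written as a plain scalar loop. Below is a sketch of that reference, assuming the early block layout (per-block scale d, minimum m, and 32 values packed as 4-bit nibbles); the struct and function names are illustrative, not the actual ggml.c code.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed Q4_1 block layout: 32 weights per block, each stored as a 4-bit
// index q and reconstructed as d*q + m.
struct block_q4_1 {
    float   d;        // scale
    float   m;        // minimum
    uint8_t qs[16];   // 32 x 4-bit quants, two per byte
};

// Scalar reference dot product of two Q4_1-quantized rows of n values
// (n must be a multiple of 32). The AVX2 kernel computes the same sum with SIMD.
static float vec_dot_q4_1_ref(std::size_t n, const block_q4_1 * x, const block_q4_1 * y) {
    float sum = 0.0f;
    const std::size_t nb = n / 32;
    for (std::size_t b = 0; b < nb; ++b) {
        for (std::size_t i = 0; i < 16; ++i) {
            // low and high nibbles hold two consecutive quants
            const int x0 = x[b].qs[i] & 0x0F, x1 = x[b].qs[i] >> 4;
            const int y0 = y[b].qs[i] & 0x0F, y1 = y[b].qs[i] >> 4;
            sum += (x[b].d * x0 + x[b].m) * (y[b].d * y0 + y[b].m);
            sum += (x[b].d * x1 + x[b].m) * (y[b].d * y1 + y[b].m);
        }
    }
    return sum;
}
```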
|
* add ggml_rms_norm
* update op num
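ggml_rms_norm normalizes a vector by its root mean square rather than by mean and variance. A minimal scalar sketch of the operation follows; the epsilon value and function name are illustrative, not the actual ggml implementation.

```cpp
#include <cmath>
#include <cstddef>

// RMS-normalize a vector in place: x[i] <- x[i] / sqrt(mean(x^2) + eps).
// The eps value is illustrative, not necessarily what ggml uses.
static void rms_norm_ref(float * x, std::size_t n, float eps = 1e-6f) {
    float ss = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        ss += x[i] * x[i];
    }
    const float scale = 1.0f / std::sqrt(ss / n + eps);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= scale;
    }
}
```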
|
* add SIGINT support for _WIN32 environments
* perhaps more consistent
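A rough sketch of the kind of Ctrl+C handling this adds; the flag and handler names are placeholders rather than the real main.cpp code. The Windows CRT maps console Ctrl+C events to SIGINT, so the same std::signal call covers _WIN32 builds.

```cpp
#include <csignal>
#include <cstdio>

// Hypothetical flag polled by the generation loop; sig_atomic_t keeps the
// handler's write visible to the main loop.
static volatile std::sig_atomic_t g_interrupted = 0;

static void sigint_handler(int /*signo*/) {
    g_interrupted = 1;
}

int main() {
    // std::signal(SIGINT, ...) catches Ctrl+C on POSIX systems, and the
    // Windows CRT raises SIGINT for console Ctrl+C events as well, so the
    // same call works in _WIN32 builds.
    std::signal(SIGINT, sigint_handler);

    while (!g_interrupted) {
        // ... generate the next token, print it, repeat ...
    }
    std::fprintf(stderr, "\ninterrupted by user\n");
    return 0;
}
```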
|
* added ctx_size parameter
* added it in more places
* Apply suggestions from code review
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
* fixed color reset on exit
* added sigint handler for ansi_color_reset
* Update main.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
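The fix amounts to restoring the terminal's default colors on every exit path, including Ctrl+C. A simplified sketch of that pattern (function names here are illustrative, not the actual main.cpp code):

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>

#define ANSI_COLOR_RESET "\x1b[0m"

// Restore the terminal's default colors; safe to call more than once.
static void ansi_color_reset() {
    std::printf(ANSI_COLOR_RESET);
    std::fflush(stdout);
}

// On Ctrl+C, reset colors before terminating (printf here is not strictly
// async-signal-safe, but is acceptable for a sketch).
static void sigint_reset_handler(int signo) {
    ansi_color_reset();
    std::_Exit(128 + signo);
}

int main() {
    std::atexit(ansi_color_reset);              // normal exit path
    std::signal(SIGINT, sigint_reset_handler);  // Ctrl+C path
    std::printf("\x1b[32mcolored output...\n"); // example: green text
    return 0;                                   // atexit restores the default color
}
```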
|
* Update README.md
* Update README.md
remove facebook
|
convert-pth-to-ggml.py (#142)
There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
|
Without "static" prefix, it fails to compile in clang
|
* Don't use vdotq_s32 if it's not available
`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.
Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
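The guard is a compile-time check: use the vdotq_s32 intrinsic only when the compiler advertises the dotprod extension, and otherwise fall back to widening multiplies. A simplified sketch of the pattern, not the exact ggml.c code:

```cpp
#include <arm_neon.h>

// Accumulate the products of two int8x16 vectors into an int32x4 accumulator.
static inline int32x4_t dot_i8x16(int32x4_t acc, int8x16_t a, int8x16_t b) {
#if defined(__ARM_FEATURE_DOTPROD)
    // Fast path: a single sdot instruction on CPUs with the dotprod extension.
    return vdotq_s32(acc, a, b);
#else
    // Fallback for CPUs without dotprod (e.g. Raspberry Pi 4): widen to 16 bit,
    // multiply, then pairwise-add into 32-bit lanes. The per-lane grouping
    // differs from sdot, but the horizontal sum of the accumulator is the same,
    // which is all a dot-product kernel needs.
    const int16x8_t lo = vmull_s8(vget_low_s8(a),  vget_low_s8(b));
    const int16x8_t hi = vmull_s8(vget_high_s8(a), vget_high_s8(b));
    return vaddq_s32(acc, vaddq_s32(vpaddlq_s16(lo), vpaddlq_s16(hi)));
#endif
}
```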
|
keep printf only for printing model output
One can now use `./main ... 2>/dev/null` to suppress any diagnostic output
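This works because diagnostics are written to stderr while generated text goes to stdout, so the shell can separate the two streams. A minimal illustration with made-up messages:

```cpp
#include <cstdio>

int main() {
    std::fprintf(stderr, "loading model...\n");       // diagnostic -> stderr
    std::printf("Once upon a time");                  // model output -> stdout
    std::fprintf(stderr, "\ndone, sampling stats\n"); // diagnostic -> stderr
    return 0;
}
// Running the program with `2>/dev/null` prints only the model output.
```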
|
* 10% performance boost on ARM
* Back to original change
|
* Use buffering
* Use vector
* Minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
* Add quantize script for batch quantization
* Indentation
* README for new quantize.sh
* Fix script name
* Fix file list on Mac OS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
This reverts commit 113a9e83ebc0f788f861394437087bf3ca0e019b.
There are some reports of illegal instruction errors.
Moved this code to the vdotq_s32 branch until resolved.
|
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
|
* Initial work on interactive mode.
* Improve interactive mode. Make rev. prompt optional.
* Update README to explain interactive mode.
* Fix OS X build
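Interactive mode hinges on the reverse prompt: after each generated token, the tail of the output is compared against a user-supplied string, and a match pauses generation and returns control to the user. A rough sketch of that check; the helper is hypothetical, not the actual main.cpp logic.

```cpp
#include <string>

// Returns true when the generated text currently ends with the reverse prompt,
// i.e. the model has produced the marker that should hand control back to the user.
static bool hit_reverse_prompt(const std::string & output, const std::string & reverse_prompt) {
    if (reverse_prompt.empty() || output.size() < reverse_prompt.size()) {
        return false;
    }
    return output.compare(output.size() - reverse_prompt.size(),
                          reverse_prompt.size(), reverse_prompt) == 0;
}

// Inside the generation loop one would do roughly:
//   output += next_token_text;
//   if (hit_reverse_prompt(output, "User:")) { /* read the next user line */ }
```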
|
* Add back top_k
* Update utils.cpp
* Update utils.h
---------
Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
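Top-k sampling keeps only the k most likely logits and samples among those. A compact sketch of the filtering step, simplified relative to utils.cpp (which also applies top-p and temperature):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Mask (set to -inf) every logit outside the k largest so that softmax +
// sampling only ever picks from the top k tokens. Ties at the cutoff may let
// slightly more than k survive, which is harmless for sampling.
static void top_k_filter(std::vector<float> & logits, std::size_t k) {
    if (k == 0 || k >= logits.size()) {
        return; // treat k == 0 as "keep everything"
    }
    std::vector<float> sorted = logits;
    std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                     std::greater<float>());
    const float cutoff = sorted[k - 1]; // value of the k-th largest logit
    for (float & l : logits) {
        if (l < cutoff) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```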
|
* Apply suggested fixes to build on Windows
Issue: https://github.com/ggerganov/llama.cpp/issues/22
* Remove unsupported VLAs
* MSVC: Remove features that are only available on MSVC C++20.
* Fix zero initialization of the other fields.
* Use vector in place of stack (VLA) allocations.
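MSVC does not support C-style variable-length arrays, so the portable replacement is std::vector (or a fixed-size buffer). A before/after sketch of the pattern, with an illustrative function:

```cpp
#include <cstddef>
#include <vector>

// Illustrative function: sum n scratch values using a temporary buffer.
float sum_scratch(std::size_t n) {
    // Not portable: `float buf[n];` is a VLA, a GCC/Clang extension that MSVC rejects.
    // Portable replacement: a heap-backed, zero-initialized std::vector.
    std::vector<float> buf(n);

    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<float>(i);
        sum += buf[i];
    }
    return sum;
}
```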
|
* Adding repeat penalization
* Update utils.h
* Update utils.cpp
* Numeric fix
Should probably still scale by temp even if penalized
* Update comments, apply the penalty more properly
The logit values can go negative, so this applies a fix from a referenced commit
* Minor formatting
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
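The penalty divides a recently used token's logit when it is positive and multiplies it when it is negative, so repeats always become less likely; a plain division would backfire on negative logits, which is the "numbers can go negative" fix. A simplified sketch with illustrative parameter names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Penalize the logits of tokens that appeared in the recent context window.
// Dividing a positive logit and multiplying a negative one both lower the
// token's probability; dividing a negative logit would instead raise it.
static void apply_repeat_penalty(std::vector<float> & logits,
                                 const std::vector<int32_t> & last_tokens,
                                 float repeat_penalty /* e.g. 1.3 */) {
    for (const int32_t tok : last_tokens) {
        if (tok < 0 || static_cast<std::size_t>(tok) >= logits.size()) {
            continue; // skip out-of-range ids
        }
        if (logits[tok] > 0.0f) {
            logits[tok] /= repeat_penalty;
        } else {
            logits[tok] *= repeat_penalty;
        }
    }
}
```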
|
This prevents maliciously crafted weight files from executing arbitrary code by limiting the unpickler to loading only tensors, primitive types, and dictionaries.
|