path: root/convert-pth-to-ggml.py
2023-03-31  py : cleanup the code  (Pavol Rusnak)
    - use f-strings where possible
    - drop the first param of encode/decode functions, since "utf-8" is the default
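    A minimal before/after sketch of the two cleanups (variable names are illustrative):

        n_vocab = 32000
        data = b"hello"

        # before: "n_vocab = " + str(n_vocab)  and  data.decode("utf-8")
        print(f"n_vocab = {n_vocab}")  # f-string instead of concatenation
        text = data.decode()           # "utf-8" is already the default
        raw = text.encode()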
2023-03-30  Introduce GGML migration tool for new file format  (Justine Tunney)
    If you deleted your old Meta LLaMA .pth files, then the migrate-ggml-2023-03-30-pr613.py script will let you convert your old ggml files into the new mmap()'able format. See #613.
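    The invocation is roughly the following (paths are illustrative; check the script's usage text):

        python migrate-ggml-2023-03-30-pr613.py models/7B/ggml-model-f16.bin models/7B/ggml-model-f16-new.bin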
2023-03-30  Make loading weights 10-100x faster  (Justine Tunney)
    This is a breaking change that brings three benefits:

    1. Your inference commands should load 100x faster
    2. You may be able to safely load models 2x larger
    3. You can run many concurrent inference processes

    This was accomplished by changing the file format so that weights can be mmap()'d directly into memory, without read()ing or copying them. That ensures the kernel can make its file cache pages directly accessible to our inference processes; it also makes those pages much less likely to be evicted (which would force loads to hit disk), because they no longer compete with memory pages needlessly created by gigabytes of standard i/o.

    The new file format supports single-file models like LLaMA 7B, and it also supports multi-file models like LLaMA 13B. Our Python tool now merges the foo.1, foo.2, etc. files back into a single file, so the C++ code which maps it doesn't need to reshape data every time. That has made llama.cpp much simpler; much of its load code has now been deleted.

    Furthermore, this change ensures that tensors are aligned on a 32-byte boundary. That opens the door to additional performance gains on some microprocessors, by using ops that require memory alignment.

    Lastly, note that both POSIX and Windows are supported.

    Fixes #91
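    A rough Python sketch of the idea (not the actual C++ loader; file name, offset, and shape are illustrative): mapping the file shares the kernel's cache pages instead of copying bytes through read().

        import mmap

        import numpy as np

        with open("ggml-model-f16.bin", "rb") as f:
            # Pages are faulted in lazily and shared with the kernel's file
            # cache rather than copied into process-private buffers.
            buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # With tensor data aligned to a 32-byte boundary, a tensor can be
        # viewed in place without any further copying.
        offset = 4096  # illustrative; a real loader reads this from the header
        assert offset % 32 == 0
        tensor = np.frombuffer(buf, dtype=np.float16, count=4096 * 4096, offset=offset)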
2023-03-28  py : removed unused `model` variable (#547)  (DooWoong Lee (David))
    Verified that the code functions correctly with the `vocab_only` setting, and confirmed that it works as expected, with reduced memory usage, after deleting the no-longer-needed variable.
2023-03-25  Clarify console output in convert-pth-to-ggml.py (#512)  (jp-x-g)
    "Processing part 1 of 3" instead of "Processing part 0"
2023-03-22  Introduce C-style API (#370)  (Georgi Gerganov)
    * Major refactoring - introduce C-style API
    * Clean up
    * Add <cassert>
    * Add <iterator>
    * Add <algorithm>
    ...
    * Fix timing reporting and accumulation
    * Measure eval time only for single-token calls
    * Change llama_tokenize return meaning
2023-03-21  Fix convert script, warnings, alpaca instructions, default params  (Georgi Gerganov)
2023-03-21  fix typo in comment (#318)  (Mack Straight)
2023-03-21  Add tokenizer test + revert to C++11 (#355)  (Georgi Gerganov)
    * Add test-tokenizer-0 to do a few tokenizations - feel free to expand
    * Added option to convert-pth-to-ggml.py script to dump just the vocabulary
    * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
    * Added utility to load vocabulary file from previous point (temporary implementation)
    * Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
    * Rename gpt_vocab -> llama_vocab
    * All CMake binaries go into ./bin/ now
2023-03-20  Fixed tokenizer.model not found error when model dir is symlink (#325)  (Qingyou Meng)
2023-03-20  sentencepiece bpe compatible tokenizer (#252)  (Mack Straight)
    * potential out of bounds read
    * fix quantize
    * style
    * Update convert-pth-to-ggml.py
    * mild cleanup
    * don't need the space-prefixing here rn since main.cpp already does it
    * new file magic + version header field
    * readme notice
    * missing newlines

    Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
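    A hedged sketch of what writing a magic + version header looks like from Python; the constants here are illustrative, not necessarily the exact values the format uses:

        import struct

        MAGIC = 0x67676d66   # illustrative 4-byte magic ("ggmf" in ASCII)
        VERSION = 1          # bumped when the layout changes

        with open("model.bin", "wb") as fout:
            # A reader can now reject files with an unknown magic or
            # version instead of silently misparsing them.
            fout.write(struct.pack("i", MAGIC))
            fout.write(struct.pack("i", VERSION))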
2023-03-19  Fix python stuff (#109)  (Georgi Gerganov)
2023-03-19  Refactoring `convert-pth-to-ggml.py`: more concise and readable (#109)  (qunash)
    * Refactor get_n_parts function to simplify code and improve readability
    * Use f-strings instead of concatenation
    * Refactoring: more concise and readable
    * modularize

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-17  🚀 Dockerize llamacpp (#132)  (Bernat Vadell)
    * feat: dockerize llamacpp
    * feat: split build & runtime stages
    * split dockerfile into main & tools
    * add quantize into tool docker image
    * Update .devops/tools.sh
    * add docker action pipeline
    * change CI to publish at github docker registry
    * fix name runs-on macOS-latest is macos-latest (lowercase)
    * include docker versioned images
    * fix github action docker
    * fix docker.yml
    * feat: include all-in-one command tool & update readme.md

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15  Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)  (Ronsor)
    There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
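    With the sentencepiece API the count can be read from the model itself; a minimal sketch (path illustrative):

        from sentencepiece import SentencePieceProcessor

        sp = SentencePieceProcessor(model_file="models/tokenizer.model")
        n_vocab = sp.vocab_size()  # instead of assuming 32000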
2023-03-13  Fix UTF-8 handling (including colors) (#79)  (Val Kharitonov)
2023-03-12  Revert "weights_only" arg - it was causing more trouble than help  (Georgi Gerganov)
2023-03-12  python/pytorch compat notes (#44)  (Oleksandr Nikitin)
2023-03-12  use weights_only in conversion script (#32)  (deepdiffuser)
    This prevents malicious weights from executing arbitrary code by restricting the unpickler to loading only tensors, primitive types, and dictionaries.
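    A minimal sketch of the safer call (checkpoint path illustrative):

        import torch

        # weights_only=True restricts the unpickler to tensors, primitive
        # types, and containers, so a crafted checkpoint can't run code.
        model = torch.load("consolidated.00.pth", map_location="cpu", weights_only=True)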
2023-03-11  Support all LLaMA models + change Q4_0 quantization storage  (Georgi Gerganov)
2023-03-10  Fix a bug in the rope calculation  (Georgi Gerganov)
2023-03-10  Initial release  (Georgi Gerganov)