path: root/convert-pth-to-ggml.py
2023-03-31  py : cleanup the code  (Pavol Rusnak)
    - use f-strings where possible
    - drop the first param of encode/decode functions, since "utf-8" is the default
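    A minimal before/after sketch of the two cleanups (variable names are illustrative):

        n_vocab = 32000
        data = b"hello"

        # before: "n_vocab = " + str(n_vocab)  and  data.decode("utf-8")
        print(f"n_vocab = {n_vocab}")  # f-string instead of concatenation
        text = data.decode()           # "utf-8" is already the default
        raw = text.encode()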
2023-03-30  Introduce GGML migration tool for new file format  (Justine Tunney)
    If you deleted your old Meta LLaMA .pth files, then the migrate-ggml-2023-03-30-pr613.py script will let you convert your old ggml files into the new mmap()'able format. See #613.
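    The invocation is roughly the following (paths are illustrative; check the script's usage text):

        python migrate-ggml-2023-03-30-pr613.py models/7B/ggml-model-f16.bin models/7B/ggml-model-f16-new.bin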
2023-03-30  Make loading weights 10-100x faster  (Justine Tunney)
    This is a breaking change that brings three benefits:

    1. Your inference commands should load 100x faster
    2. You may be able to safely load models 2x larger
    3. You can run many concurrent inference processes

    This was accomplished by changing the file format so that weights can be mmap()'d directly into memory, without read()ing or copying them. That ensures the kernel can make its file cache pages directly accessible to our inference processes; it also makes those pages much less likely to be evicted (which would force loads to hit disk), because they no longer compete with memory pages needlessly created by gigabytes of standard i/o.

    The new file format supports single-file models like LLaMA 7B, and it also supports multi-file models like LLaMA 13B. Our Python tool now merges the foo.1, foo.2, etc. files back into a single file, so the C++ code which maps it doesn't need to reshape data every time. That has made llama.cpp much simpler; much of its load code has now been deleted.

    Furthermore, this change ensures that tensors are aligned on a 32-byte boundary. That opens the door to additional performance gains on some microprocessors, by using ops that require memory alignment.

    Lastly, note that both POSIX and Windows are supported.

    Fixes #91
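    A rough Python sketch of the idea (not the actual C++ loader; file name, offset, and shape are illustrative): mapping the file shares the kernel's cache pages instead of copying bytes through read().

        import mmap

        import numpy as np

        with open("ggml-model-f16.bin", "rb") as f:
            # Pages are faulted in lazily and shared with the kernel's file
            # cache rather than copied into process-private buffers.
            buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # With tensor data aligned to a 32-byte boundary, a tensor can be
        # viewed in place without any further copying.
        offset = 4096  # illustrative; a real loader reads this from the header
        assert offset % 32 == 0
        tensor = np.frombuffer(buf, dtype=np.float16, count=4096 * 4096, offset=offset)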
2023-03-28  py : removed unused `model` variable (#547)  (DooWoong Lee (David))
    Verified that the code functions correctly with the `vocab_only` setting, and confirmed that it works as expected, with reduced memory usage, after deleting the no-longer-needed variable.
2023-03-25  Clarify console output in convert-pth-to-ggml.py (#512)  (jp-x-g)
    "Processing part 1 of 3" instead of "Processing part 0"
2023-03-22  Introduce C-style API (#370)  (Georgi Gerganov)
    * Major refactoring - introduce C-style API
    * Clean up
    * Add <cassert>
    * Add <iterator>
    * Add <algorithm>
    ...
    * Fix timing reporting and accumulation
    * Measure eval time only for single-token calls
    * Change llama_tokenize return meaning
2023-03-21  Fix convert script, warnings, alpaca instructions, default params  (Georgi Gerganov)
2023-03-21  fix typo in comment (#318)  (Mack Straight)
2023-03-21  Add tokenizer test + revert to C++11 (#355)  (Georgi Gerganov)
    * Add test-tokenizer-0 to do a few tokenizations - feel free to expand
    * Added option to convert-pth-to-ggml.py script to dump just the vocabulary
    * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
    * Added utility to load vocabulary file from previous point (temporary implementation)
    * Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
    * Rename gpt_vocab -> llama_vocab
    * All CMake binaries go into ./bin/ now
2023-03-20  Fixed tokenizer.model not found error when model dir is symlink (#325)  (Qingyou Meng)
2023-03-20  sentencepiece bpe compatible tokenizer (#252)  (Mack Straight)
    * potential out of bounds read
    * fix quantize
    * style
    * Update convert-pth-to-ggml.py
    * mild cleanup
    * don't need the space-prefixing here rn since main.cpp already does it
    * new file magic + version header field
    * readme notice
    * missing newlines

    Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
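    A hedged sketch of what writing a magic + version header looks like from Python; the constants here are illustrative, not necessarily the exact values the format uses:

        import struct

        MAGIC = 0x67676d66   # illustrative 4-byte magic ("ggmf" in ASCII)
        VERSION = 1          # bumped when the layout changes

        with open("model.bin", "wb") as fout:
            # A reader can now reject files with an unknown magic or
            # version instead of silently misparsing them.
            fout.write(struct.pack("i", MAGIC))
            fout.write(struct.pack("i", VERSION))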
2023-03-19  Fix python stuff (#109)  (Georgi Gerganov)
2023-03-19  Refactoring `convert-pth-to-ggml.py`: more concise and readable (#109)  (qunash)
    * Refactor get_n_parts function to simplify code and improve readability
    * Use f-strings instead of concatenation
    * Refactoring: more concise and readable
    * modularize

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-17  🚀 Dockerize llamacpp (#132)  (Bernat Vadell)
    * feat: dockerize llamacpp
    * feat: split build & runtime stages
    * split dockerfile into main & tools
    * add quantize into tool docker image
    * Update .devops/tools.sh
    * add docker action pipeline
    * change CI to publish at github docker registry
    * fix name runs-on macOS-latest is macos-latest (lowercase)
    * include docker versioned images
    * fix github action docker
    * fix docker.yml
    * feat: include all-in-one command tool & update readme.md

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15  Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)  (Ronsor)
    There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
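    With the sentencepiece API the count can be read from the model itself; a minimal sketch (path illustrative):

        from sentencepiece import SentencePieceProcessor

        sp = SentencePieceProcessor(model_file="models/tokenizer.model")
        n_vocab = sp.vocab_size()  # instead of assuming 32000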
2023-03-13  Fix UTF-8 handling (including colors) (#79)  (Val Kharitonov)
2023-03-12  Revert "weights_only" arg - it was causing more trouble than help  (Georgi Gerganov)
2023-03-12  python/pytorch compat notes (#44)  (Oleksandr Nikitin)
2023-03-12  use weights_only in conversion script (#32)  (deepdiffuser)
    This prevents malicious weights from executing arbitrary code by restricting the unpickler to loading only tensors, primitive types, and dictionaries.
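    A minimal sketch of the safer call (checkpoint path illustrative):

        import torch

        # weights_only=True restricts the unpickler to tensors, primitive
        # types, and containers, so a crafted checkpoint can't run code.
        model = torch.load("consolidated.00.pth", map_location="cpu", weights_only=True)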
2023-03-11  Support all LLaMA models + change Q4_0 quantization storage  (Georgi Gerganov)
2023-03-10  Fix a bug in the rope calculation  (Georgi Gerganov)
2023-03-10  Initial release  (Georgi Gerganov)