path: root/utils.cpp
Age         Commit message                                              Author
2023-03-23  Command line args bounds checking (#424)  [anzz1]
  * command line args bounds checking
  * unknown and invalid param exit codes 0 -> 1
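
A minimal sketch of the kind of bounds check described above; the flag names, messages, and parsing loop are illustrative, not the actual utils.cpp argument parser:

```cpp
// Hypothetical sketch: before consuming the value that follows a flag, verify
// that it exists, and exit with a non-zero code on unknown or incomplete params.
#include <cstdio>
#include <cstdlib>
#include <string>

int main(int argc, char ** argv) {
    int n_threads = 4; // illustrative default
    for (int i = 1; i < argc; i++) {
        std::string arg = argv[i];
        if (arg == "-t" || arg == "--threads") {
            if (++i >= argc) {                       // bounds check: flag given without a value
                fprintf(stderr, "error: missing value for %s\n", arg.c_str());
                exit(1);                             // invalid param -> exit code 1, not 0
            }
            n_threads = std::stoi(argv[i]);
        } else {
            fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
            exit(1);                                 // unknown param -> exit code 1
        }
    }
    printf("n_threads = %d\n", n_threads);
    return 0;
}
```
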
2023-03-22  Don't force immediate interactive without `-i` (#354)  [tjohnman]
  * Don't force immediate interactive without -i. Sometimes we might want to use a reverse prompt but we want to let the model generate tokens right after the initial prompt. So we don't force user input mode if the -i flag wasn't specified and instead let it run until we encounter the reverse prompt. This gives us some more flexibility, since it doesn't force the user to enter a newline if they want to let the model generate text right after the initial prompt and only be asked for input if the reverse prompt is encountered. The `--interactive-first` flag is reintroduced to force the old behavior. `-r` behaves like `-i` plus introduces a reverse prompt (it can be specified more than once).
  * Update help output.
  Co-authored-by: Johnman <tjohnman@github>
2023-03-22  fix perplexity after c-api refactor (#390)  [Erik Scholz]
  * preallocate a buffer of fitting size for tokenization (utils.cpp)
  * don't create a new std::string (especially here, where it's usually large)
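
A hedged sketch of the preallocation pattern this fix describes: size the output buffer once from the text length, fill it, then shrink to the actual count. The token type alias and the toy whitespace tokenizer below are stand-ins, not the llama.cpp implementation:

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

using llama_token = int; // assumed token id type

// Placeholder tokenizer standing in for the real one; the buffer pattern is the point.
static int toy_tokenize(const std::string & text, llama_token * out, int n_max) {
    int n = 0;
    for (size_t i = 0; i < text.size() && n < n_max; ) {
        while (i < text.size() && isspace((unsigned char) text[i])) i++;
        if (i >= text.size()) break;
        size_t j = i;
        while (j < text.size() && !isspace((unsigned char) text[j])) j++;
        out[n++] = (llama_token) i; // placeholder "id": start offset of the word
        i = j;
    }
    return n;
}

int main() {
    const std::string text = "preallocate the buffer once";
    std::vector<llama_token> tokens(text.size() + 1); // upper bound, allocated once
    const int n = toy_tokenize(text, tokens.data(), (int) tokens.size());
    tokens.resize(n);                                  // shrink to what was actually produced
    printf("%d tokens\n", n);
    return 0;
}
```
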
2023-03-22  When seed <= 0 - use the clock to generate one  [Georgi Gerganov]
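
A minimal sketch of this behaviour, with illustrative variable names: a non-positive seed is replaced with a clock-derived one so every run differs.

```cpp
#include <cstdio>
#include <ctime>

int main() {
    int seed = -1; // e.g. taken from a --seed argument
    if (seed <= 0) {
        seed = static_cast<int>(time(nullptr)); // fall back to the clock
    }
    printf("seed = %d\n", seed);
    return 0;
}
```
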
2023-03-22  Introduce C-style API (#370)  [Georgi Gerganov]
  * Major refactoring - introduce C-style API
  * Clean up
  * Add <cassert>
  * Add <iterator>
  * Add <algorithm> ....
  * Fix timing reporting and accumulation
  * Measure eval time only for single-token calls
  * Change llama_tokenize return meaning
2023-03-21  We could use std::unordered_map over std::map (#305)  [Fabio R. Sluzala]
  * Improve performance by changing std::map to std::unordered_map and std::map<id, token> id_to_token; to std::vector<token> id_to_token;
  * fix last commit on gpt_vocab_init: add vocab.id_to_token.resize(vocab.token_to_id.size());
  * Removed include <map>
  * Nest struct token score inside gpt_vocab
  * renamed token to tok
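
A rough sketch of the vocab layout after this change; the struct and member names below are assumptions for illustration, not the exact gpt_vocab/llama_vocab definition. Token-to-id lookups go through a hash map, and id-to-token lookups become a plain vector index since ids are dense integers starting at 0.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

struct vocab_sketch {
    using id    = int32_t;
    using token = std::string;

    std::unordered_map<token, id> token_to_id; // O(1) average lookup vs O(log n) for std::map
    std::vector<token>            id_to_token; // indexed directly by id

    void add(const token & tok) {
        const id i = (id) id_to_token.size();
        token_to_id[tok] = i;
        id_to_token.push_back(tok);
    }
};

int main() {
    vocab_sketch vocab;
    vocab.add("hello");
    vocab.add("world");
    printf("%d %s\n", vocab.token_to_id["world"], vocab.id_to_token[1].c_str());
    return 0;
}
```
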
2023-03-21  Compute perplexity over prompt (#270)  [Gary Linscott]
  * Compute perplexity over prompt
  * More accurate perplexity calculation - over all logits in the context window (so 512x more tokens!)
  * Output all perplexities
  * Add timing/ETA
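
For context, perplexity here is exp of the mean negative log-likelihood of the actual next token at each evaluated position in the context window. A toy sketch (the probability values are made up, and the lookup of each token's probability from the logits is assumed):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// perplexity = exp( (1/N) * sum_i -log p_i ), where p_i is the model's
// probability for the token that actually appears at position i.
double perplexity(const std::vector<double> & probs_of_correct_token) {
    double nll = 0.0;
    for (double p : probs_of_correct_token) {
        nll += -std::log(p);
    }
    return std::exp(nll / probs_of_correct_token.size());
}

int main() {
    std::vector<double> p = {0.25, 0.50, 0.10, 0.40}; // toy example
    printf("perplexity = %.3f\n", perplexity(p));
    return 0;
}
```
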
2023-03-21  Add OpenBSD support (#314)  [Kevin Lo]
2023-03-21  cmdline option for custom amount of model parts (--n_parts N) (#348)  [anzz1]
  * cmdline option for custom amount of model parts (--n_parts N)
  * Update main.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-21  Add tokenizer test + revert to C++11 (#355)  [Georgi Gerganov]
  * Add test-tokenizer-0 to do a few tokenizations - feel free to expand
  * Added option to convert-pth-to-ggml.py script to dump just the vocabulary
  * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
  * Added utility to load vocabulary file from previous point (temporary implementation)
  * Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
  * Rename gpt_vocab -> llama_vocab
  * All CMake binaries go into ./bin/ now
2023-03-20  sentencepiece bpe compatible tokenizer (#252)  [Mack Straight]
  * potential out of bounds read
  * fix quantize
  * style
  * Update convert-pth-to-ggml.py
  * mild cleanup
  * don't need the space-prefixing here rn since main.cpp already does it
  * new file magic + version header field
  * readme notice
  * missing newlines
  Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-03-19  Support for multiple reverse prompts. (#299)  [tjohnman]
  Co-authored-by: Johnman <>
  Co-authored-by: Johnman <tjohnman@github>
2023-03-19  Make prompt randomization optional. (#300)  [tjohnman]
  Co-authored-by: Johnman <>
2023-03-19  Add --ignore-eos parameter (#181)  [slaren]
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-19  Command line switch to use F16 for memory_k and memory_v (refactor of #154) (#294)  [Erik Scholz]
  * Use F16 for memory_k and memory_v
  * add command line switch to use f16 instead of f32 for memory k+v
  Co-authored-by: Ty Everett <ty@tyweb.us>
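
A rough illustration of the trade-off behind this switch: storing the KV cache entries as 16-bit floats halves their memory footprint at a small cost in precision. The dimensions below are assumed example values for illustration, not read from any model file:

```cpp
#include <cstdio>

int main() {
    const bool   memory_f16  = true;                  // hypothetical command line switch
    const size_t bytes_per_el = memory_f16 ? 2 : 4;   // F16 vs F32 element size

    // assumed example dimensions: embedding size * layers * context length
    const size_t n_embd = 4096, n_layer = 32, n_ctx = 512;
    const size_t kv_elements = 2 /* k and v */ * n_embd * n_layer * n_ctx;

    printf("KV cache: %.1f MiB\n", kv_elements * bytes_per_el / (1024.0 * 1024.0));
    return 0;
}
```
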
2023-03-19  Drop trailing new line from file prompts (#80)  [Georgi Gerganov]
2023-03-19Add "--instruct" argument for usage with Alpaca (#240)Georgi Gerganov
Also start adding prompts in "./prompts"
2023-03-18  Fix n^2 loop in tokenization (#254)  [Gary Linscott]
  This causes long prompts to parse very slowly.
2023-03-17  Implement non-greedy tokenizer that tries to maximize token lengths (#242)  [thement]
  * Implement non-greedy tokenizer that tries to maximize token lengths
  * Insert single space in front of the prompt - this is to match original llama tokenizer behavior
  Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>
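
A hedged sketch of non-greedy segmentation in the spirit of this change: instead of taking the first match, a small dynamic program picks a segmentation with the fewest (hence longest) tokens. The toy vocabulary and the minimum-token objective are illustrative, not the real LLaMA tokenizer or its scoring:

```cpp
#include <climits>
#include <cstdio>
#include <string>
#include <unordered_set>
#include <vector>

std::vector<std::string> tokenize_minimal(const std::string & text,
                                          const std::unordered_set<std::string> & vocab) {
    const int n = (int) text.size();
    std::vector<int> best(n + 1, INT_MAX); // fewest tokens needed to cover text[0..i)
    std::vector<int> prev(n + 1, -1);      // start index of the last token in that solution
    best[0] = 0;
    for (int i = 1; i <= n; i++) {
        for (int j = 0; j < i; j++) {
            if (best[j] != INT_MAX && vocab.count(text.substr(j, i - j)) && best[j] + 1 < best[i]) {
                best[i] = best[j] + 1;
                prev[i] = j;
            }
        }
    }
    std::vector<std::string> out;
    for (int i = n; i > 0 && prev[i] >= 0; i = prev[i]) {
        out.insert(out.begin(), text.substr(prev[i], i - prev[i]));
    }
    return out;
}

int main() {
    std::unordered_set<std::string> vocab = {"t", "th", "the", "r", "e", "re", "there"};
    for (const auto & t : tokenize_minimal("there", vocab)) printf("[%s] ", t.c_str());
    printf("\n"); // a greedy longest-prefix-then-restart scheme could emit more pieces; this prints [there]
    return 0;
}
```
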
2023-03-17  Don't tell users to use a bad number of threads (#243)  [Stephan Walter]
  The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.
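
The commit derives the default from /proc/cpuinfo on Linux. As a portable illustration of the same idea (not the actual implementation), std::thread::hardware_concurrency can provide a starting point instead of a hard-coded "-t 8":

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

int main() {
    unsigned int n = std::thread::hardware_concurrency(); // may return 0 if unknown
    unsigned int n_threads = std::max(1u, n);             // never default to fewer than 1
    printf("defaulting to %u threads\n", n_threads);
    return 0;
}
```
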
2023-03-17  Q4_1 quantization (#193)  [Matvey Soloviev]
  * Add AVX2 version of ggml_vec_dot_q4_1
  * Small optimisations to q4_1 dot product (@Const-me)
  * Rearrange Q4_1 quantization to work for multipart models. (Fix #152)
  * Fix ggml_vec_mad_q4_1 too
  * Fix non-vectorised q4_1 vec mul
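
A rough sketch of the Q4_1 idea: weights are stored in blocks of 32 as 4-bit integers plus a per-block scale and minimum, and dequantized as x = q*d + m. The struct and nibble ordering below are simplified for illustration, not the exact ggml layout of the time:

```cpp
#include <cstdint>
#include <cstdio>

#define QK 32

struct block_q4_1_sketch {
    float   d;          // per-block scale
    float   m;          // per-block minimum
    uint8_t qs[QK / 2]; // 32 quantized values, two 4-bit nibbles per byte
};

void dequantize(const block_q4_1_sketch & b, float * out) {
    for (int i = 0; i < QK / 2; i++) {
        out[2*i + 0] = (b.qs[i] & 0x0F) * b.d + b.m;
        out[2*i + 1] = (b.qs[i] >>   4) * b.d + b.m;
    }
}

int main() {
    block_q4_1_sketch b = {0.1f, -0.5f, {0}};
    b.qs[0] = 0xF0;     // nibbles 0 and 15
    float x[QK];
    dequantize(b, x);
    printf("%.2f %.2f\n", x[0], x[1]); // -0.50 and 1.00
    return 0;
}
```
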
2023-03-15  added ctx_size parameter (#148)  [Justin Suess]
  * added ctx_size parameter
  * added it in more places
  * Apply suggestions from code review
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13  Add NetBSD support. (#90)  [Thomas Klausner]
2023-03-12  Add interactive mode (#61)  [Matvey Soloviev]
  * Initial work on interactive mode.
  * Improve interactive mode. Make rev. prompt optional.
  * Update README to explain interactive mode.
  * Fix OS X build
2023-03-12  Allow using prompt files (#59)  [Ben Garney]
2023-03-12  Add back top_k (#56)  [beiller]
  * Add back top_k
  * Update utils.cpp
  * Update utils.h
  Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
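
A hedged sketch of top-k sampling as reintroduced here: keep only the k most probable candidates, renormalize, and sample among them. The helper, its signature, and the toy distribution are illustrative, not the utils.cpp code:

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

int sample_top_k(std::vector<std::pair<double, int>> probs /* (prob, token id) */,
                 int k, std::mt19937 & rng) {
    const int k_eff = (int) std::min((size_t) k, probs.size());
    // keep the k highest-probability candidates at the front, drop the rest
    std::partial_sort(probs.begin(), probs.begin() + k_eff, probs.end(),
                      [](const std::pair<double, int> & a, const std::pair<double, int> & b) {
                          return a.first > b.first;
                      });
    probs.resize(k_eff);
    std::vector<double> w;
    for (const auto & p : probs) w.push_back(p.first);
    std::discrete_distribution<int> dist(w.begin(), w.end()); // renormalizes internally
    return probs[dist(rng)].second;
}

int main() {
    std::mt19937 rng(42);
    std::vector<std::pair<double, int>> probs = {{0.4, 7}, {0.3, 2}, {0.2, 9}, {0.1, 5}};
    printf("sampled token: %d\n", sample_top_k(probs, 2, rng)); // only ids 7 and 2 are possible
    return 0;
}
```
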
2023-03-12  Windows fixes (#31)  [Sebastián A]
  * Apply fixes suggested to build on windows
    Issue: https://github.com/ggerganov/llama.cpp/issues/22
  * Remove unsupported VLAs
  * MSVC: Remove features that are only available on MSVC C++20.
  * Fix zero initialization of the other fields.
  * Change the use of vector for stack allocations.
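
A minimal illustration of the VLA removal mentioned above; MSVC has no C-style variable-length arrays, so a std::vector is the usual portable replacement (the function below is a made-up example):

```cpp
#include <cstdio>
#include <vector>

void process(int n) {
    // float buf[n];              // VLA: not valid in standard C++ / MSVC
    std::vector<float> buf(n);    // portable, heap-backed replacement
    for (int i = 0; i < n; i++) buf[i] = (float) i;
    printf("last = %.1f\n", buf.back());
}

int main() { process(8); return 0; }
```
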
2023-03-12  Add repetition penalty (#20)  [beiller]
  * Adding repeat penalization
  * Update utils.h
  * Update utils.cpp
  * Numeric fix
    Should probably still scale by temp even if penalized
  * Update comments, more proper application
    I see that numbers can go negative so a fix from a referenced commit
  * Minor formatting
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
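
A hedged sketch of a repetition penalty in this spirit: logits of recently generated tokens are scaled so repeats become less likely. Dividing positive logits and multiplying negative ones (so the adjustment works on both sides of zero, as the commit notes) is a common convention; the exact utils.cpp details may differ:

```cpp
#include <cstdio>
#include <vector>

void apply_repeat_penalty(std::vector<float> & logits,
                          const std::vector<int> & last_tokens, float penalty) {
    for (int id : last_tokens) {
        if (logits[id] > 0.0f) {
            logits[id] /= penalty;   // make a favoured repeat less favoured
        } else {
            logits[id] *= penalty;   // push an already unlikely repeat further down
        }
    }
}

int main() {
    std::vector<float> logits = {2.0f, -1.0f, 0.5f};
    apply_repeat_penalty(logits, {0, 1}, 1.3f); // penalize the two most recent token ids
    printf("%.3f %.3f %.3f\n", logits[0], logits[1], logits[2]);
    return 0;
}
```
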
2023-03-11  Support all LLaMA models + change Q4_0 quantization storage  [Georgi Gerganov]
2023-03-11  Add missing headers for memcpy and assert (#3)  [Jean-Michaël Celerier]
2023-03-10  Fix a bug in the rope calculation  [Georgi Gerganov]
2023-03-10  Final touches  [Georgi Gerganov]
2023-03-10  Initial release  [Georgi Gerganov]