llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-04-11	Fix whitespace, add .editorconfig, add GitHub workflow (#883)	Pavol Rusnak

2023-04-11	Add enum llama_ftype, sync ggml_type to model files (#709)	Stephan Walter

2023-04-11	Windows fixes (#890)	comex
	Mostly for msys2 and mingw64 builds, which differ from each other and from standard Visual Studio builds. Isn't Windows fun?
	- Define _GNU_SOURCE in more files (it's already used in ggml.c for Linux's sake).
	- Don't use PrefetchVirtualMemory when not building for Windows 8 or later (mingw64 doesn't target Windows 8 or later by default), but warn the user about this situation since it's probably not intended.
	- Check whether NOMINMAX is already defined, which it is on mingw64.
	- Actually use the `increment` variable (a bug in my `pizza` PR).
	- Suppress unused-variable warnings in the fake pthread_create and pthread_join implementations for Windows.
	- (Not Windows-related) Remove the mention of `asprintf` from a comment; `asprintf` is no longer used.
	Fixes #871.
2023-04-10	Rewrite loading code to try to satisfy everyone:	comex
	- Support all three formats (ggml, ggmf, ggjt). (However, I didn't include the hack needed to support GPT4All files without conversion. Those can still be used after converting them with convert.py from my other PR.)
	- Support both mmap and read (mmap is used by default, but can be disabled with `--no-mmap`, and is automatically disabled for pre-ggjt files or on platforms where mmap is not supported).
	- Support multi-file models as before, but determine the number of parts automatically rather than requiring `--n_parts`.
	- Improve validation and error checking.
	- Stop using the per-file type field (f16) entirely, relying instead on the per-tensor type/size fields. This has no immediate benefit, but makes it easier to experiment with different formats, and should make it easier to support the new GPTQ-for-LLaMa models in the future (I have some work in progress on that front).
	- Support VirtualLock on Windows (using the same `--mlock` option as on Unix).
	- Indicate loading progress when using mmap + mlock. (This led me to the interesting observation that on my Linux machine, with a warm file cache, mlock actually takes some time, whereas mmap without mlock starts almost instantly...)
	- To help implement this, move mlock support from ggml to the loading code.
	- madvise/PrefetchVirtualMemory support (based on #740).
	- Switch from ifstream to the `fopen` family of functions to avoid unnecessary copying and, when mmap is enabled, allow reusing the same file descriptor for both metadata reads and mmap (whereas the existing implementation opens the file a second time to mmap).
	- Quantization now produces single-file output even with multi-file inputs (not really a feature so much as "it was easier this way").
	Implementation notes: I tried to factor the code into more discrete pieces than before.
	Regarding code style: I tried to follow the existing code style, but I'm naughty and used a few advanced C++ features repeatedly:
	- Destructors, to make it easier to ensure everything gets cleaned up.
	- Exceptions. I don't even usually use exceptions when writing C++, and I can remove them if desired, but here they make the loading code much more succinct while still properly handling a variety of errors, ranging from API calls failing to integer overflow and allocation failure. The exceptions are converted to error codes at the API boundary.
	Co-authored-by: Pavol Rusnak <pavol@rusnak.io> (for the bit I copied from #740)
2023-04-08	fix for windows utf-8 input (#840)	Tomáš Pazdiora
	Use UTF-16 for input on Windows, since UTF-8 input does not work and reads multibyte characters as zeros.
2023-04-08	Add quantize-stats command for testing quantization (#728)	unbounded
	Adds a command that calculates statistics over the errors introduced by quantization, such as mean squared error, max error and several percentile errors for layer weights. Should be useful for testing quantization improvements. Exposes some internal state from ggml and llama for testing.
2023-04-06	Do not crash when it has nothing to say. (#796)	Sergey Alirzaev
	Otherwise, this assertion failure is observed in interactive mode: /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/stl_vector.h:1230: reference std::vector<int>::back() [_Tp = int, _Alloc = std::allocator<int>]: Assertion '!this->empty()' failed.
2023-04-05	miku.sh : add executable bit (#780)	at8u

2023-04-05	examples : add Miku.sh (#724)	at8u
	* Add Miku.sh to examples
	* Add missing line to prompt in Miku.sh
	* Add --keep param to Miku.sh
	* Remove '[end_of_conversation]' line from Miku.sh; it is no longer necessary.
2023-04-03	Windows: reactivate SIGINT handler after each Ctrl-C (#736)	mgroeber9110

2023-04-02	examples : add gpt4all script (#658)	Leonardo Neumann

2023-04-02	fix default params for examples/main (#697)	Murilo Santana

2023-04-01	Show error message when -f fails	Slaren

2023-03-30	Fix ggml_init_params in quantize	Slaren

2023-03-29	Create chat-13B.bat (#592)	Thérence
	* Create chat-13B.bat: same script as chat-13B.sh, but for Windows users. Tested and working on Windows 10/11 version 22H2.
	* Apply suggestions from code review
	Co-authored-by: anzz1 <anzz1@live.com>
2023-03-29	add example of re-act pattern (#583)	Tobias Lütke
	* add example of re-act pattern
	* spelling...
	* fixed whitespace in reverse prompt issue
2023-03-28	llama : fix linkage with mingw (#551)	anzz1
	* Revert 7e53955 (#542); still needs to be fixed properly
	* Fix linking on mingw32
2023-03-28	all : be more strict about converting float to double (#458)	Stephan Walter
	* Be more strict about converting float to double
	* Test equivalence of round, SILU implementations. The test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes.
	* Fix softmax in perplexity.cpp
	* all : prefer float over double where appropriate
	* perplexity : add <cmath>
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28	ggml : introduce structs for the q4 data blocks (#356)	Stephan Walter
	* Introduce structs for the q4 data blocks
	* ggml : rename quant struct variables + fix ARM_NEON
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28	main.cpp fixes, refactoring (#571)	anzz1
	- main: entering an empty line passes back control without new input in interactive/instruct modes
	- instruct mode: keep prompt fix
	- instruct mode: duplicate instruct prompt fix
	- refactor: move common console code from main -> common
2023-03-27	Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542)	Marco Matthies

2023-03-26	Update README and comments for standalone perplexity tool (#525)	Stephan Walter

2023-03-26	[main] fix infinite generation (-n == -1) (#523)	anzz1

2023-03-26	Exit from interactive mode if input stream is bad (#491)	Harald Fernengel
	Also allow exiting the interactive prompt with CTRL-D on Unix and CTRL-Z on Windows.
2023-03-25	(Windows) Set console to UTF-8 on init (#420)	anzz1
	Sets the console codepage to 65001 (CP_UTF8) on startup for both input and output, which should fix problems with UTF-8 characters.
2023-03-25	Fix colors enabling on WIN32	Georgi Gerganov

2023-03-25	If n_predict == -1, generate forever	Georgi Gerganov

2023-03-25	Infinite generation via context swapping (#71)	Georgi Gerganov

2023-03-25	Cleanup STL headers + fix embedding examples + minor stuff	Georgi Gerganov

2023-03-25	Move chat scripts into "./examples"	Georgi Gerganov

2023-03-25	Overhaul the examples structure	Georgi Gerganov
	- main -> examples
	- utils -> examples (renamed to "common")
	- quantize -> examples
	- separate tools for "perplexity" and "embedding"
	Hope I didn't break anything!
2023-03-24	Immediately start processing the prompt before user input has been provided (#476)	Georgi Gerganov
2023-03-21	fix typo in chatLLaMa (#368)	Mathieu Nayrolles
	The prompt contains a typo where 'alound' is used instead of 'aloud'.
2023-03-21	Add chatLLaMa script (#198)	Jean-Christophe Hoelt
	* Add chatLLaMa script
	* Fix shellcheck errors and do some cleanup
	* Move chatLLaMa script to `examples` directory
	* Reduce chatLLaMa context size to 2048 (ref d7def1a7524f712e5ebb7cd02bab0f13aa56a7f9)
	* Set n_predict to 2048 in examples/chatLLaMa