llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-08-08	Allow passing grammar to completion endpoint (#2532)	Martin Krasser
	* Allow passing grammar to completion endpoint
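A minimal sketch of what a client request carrying a grammar might look like. The URL, the `n_predict` value, and the tiny yes/no grammar are illustrative assumptions, not taken from the commit; the request is only built here, not sent.

```python
import json
import urllib.request


def build_completion_request(prompt, grammar,
                             url="http://localhost:8080/completion"):
    """Build (but do not send) a POST request for the completion endpoint.

    The URL and the payload layout are assumptions for illustration.
    """
    payload = {
        "prompt": prompt,
        "n_predict": 16,      # assumed small generation budget
        "grammar": grammar,   # grammar text passed alongside the prompt
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical grammar that only allows the answers "yes" or "no".
YES_NO_GRAMMAR = 'root ::= "yes" | "no"'

req = build_completion_request("Is the sky blue? Answer: ", YES_NO_GRAMMAR)
```

Sending it would be a matter of calling `urllib.request.urlopen(req)` against a running server.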
2023-08-08	llm.vim : multiline autocompletion, get rid of "^@" (#2543)	chaihahaha

2023-08-08	vim : bring back simple llm.vim example	Georgi Gerganov

2023-08-08	vim : streaming and more (#2495)	AustinMroz
	* Update Vim plugin
	* Remove getbufoneline usage, Add input bind example.
	  getbufoneline() appears to be a recently added function and has been replaced with getbufline for compatibility. An additional example that explains how to add a keybind that works in insert mode was added.
2023-08-07	Add --rope-scale parameter (#2544)	klosax
	* common.cpp : Add --rope-scale parameter
	* README.md : Add info about using linear rope scaling
2023-08-06	console : fix issue related to Windows 11 PowerShell console mode persistence (#2521)	DannyDaemonic

2023-08-04	fix firefox autoscroll (#2519)	Jonas Wunderlich

2023-08-04	server: regenerate completion.js.hpp (#2515)	Cebtenzzre

2023-08-04	Add --simple-io option for subprocesses and break out console.h and cpp (#1558)	DannyDaemonic

2023-08-04	Fixing race condition in server and partial stream handling in frontend. (#2391)	Stephen Nichols
	* Fixing race condition in server.cpp and partial stream handling in completion.js
	* Reverting assert edits.
	* Adding newline to eof
2023-08-04	build : fix several cast and printf warnings (#2499)	Borislav Stanimirov

2023-08-02	examples : generate JSON according to schema (#1887)	Evan Jones
	* examples : add JSON schema grammars
	* complete JSON grammar
	* ensure primitive types can be used as root of schema
	* support integer type and adjust usage text
2023-08-02	tests : Fix compilation warnings (Linux/GCC) (#2451)	Eve
	* fix hellaswag print format, cast away warning in test-double-float
	* c++11 cannot use designated initializers
	* add static to test-grad0.c internal functions
	* use memcpy in test-double-float.c
	* port c tests to c++
	* use initializer list for ggml_init_params
2023-08-01	fix a typo in examples/server/README.md (#2478)	Bono Lv

2023-08-01	server : Support dark mode (#2414)	ebraminio
	* server : Support dark mode
	  So it respects user system light / dark settings.
	* Update index.html.hpp by running ./deps.sh
2023-07-31	CUDA: mmq CLI option, fixed mmq build issues (#2453)	Johannes Gäßler

2023-07-28	perplexity : add Hellaswag calculation (#2389)	klosax
	* common.h : add hellaswag / remove perplexity-lines
	* common.cpp : add hellaswag / remove perplexity-lines
	* perplexity.cpp : add hellaswag scores / remove perplexity-lines
	* perplexity.cpp : clean up
	* common.h : change default param value
	* common.cpp : Change default param
	* perplexity.cpp : alter wording
	* common.h : alter wording
	* common.cpp : alter wording
2023-07-28	examples : fix whitespace	Georgi Gerganov

2023-07-28	examples : server chat mode with llama2 (#2400)	nhamanasu
	* add: server chat mode with llama2
	* fix: remove the unnecessary last \n
2023-07-28	readme : fix the description of the Tail free sampling (TFS) method (#2431)	Weird Constructor

2023-07-28	llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)	Rand Xie

2023-07-25	Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384)	Kawrakow
	Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25	main : add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS (#2304)	Xiao-Yong Jin
	* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
	  The BOS precedes the string specified by `--in-prefix`. Model-generated EOS is now kept in the context. It provides a way to strictly follow the prompt format used in Llama-2-chat. The EOS handling also benefits some existing finetunes that use EOS to mark the end of turn.
	* examples/common: move input_prefix_bos to other bools
2023-07-25	server: add rms_norm_eps parameter (#2380)	slaren

2023-07-25	[Server] Escape HTML in webchat (#2368)	Henri Vasserman
	* escape HTML in webchat
	* add amp
2023-07-24	make rms_norm_eps a parameter (#2374)	slaren
	* make rms_norm_eps a parameter
	* add rms_norm_eps to command line
	* fix baby llama, test-grad0
	* use scientific notation for eps param in the help ggml-ci
2023-07-24	Chat UI extras (#2366)	Aarni Koskela
	* makefile: correct deps for server
	* server: tighten settings layout a little
	* server: expose all currently configured generation params in UI
	* server: expose remaining generation params, for the adventurous
	* server: embetter mirostat fields
2023-07-23	llama : add grammar-based sampling (#1773)	Evan Jones
	* llama, main : constrain sampling to grammar
	* allow loading grammar from file
	* fix whitespace errors
	* handle & print parser errors
	* add comments to grammar syntax and allow newlines where unambiguous
	* add missing include
	* support alternates in root rule
	* fix bugs with empty token and EOS
	* adjust JSON grammar
	* remove swp file
	* rewrite ternary expressions
	  Co-authored-by: Henri Vasserman <henv@hot.ee>
	* use struct for grammar elements and add Unicode support
	* add unicode escapes
	* add inverse char ranges
	* only sample full tokens (no peeking or truncation)
	* llama : minor style changes
	  blindly applied in online editor - hopefully I didn't break something
	* update help text
	* add warning message if EOS is disabled
	---------
	Co-authored-by: Henri Vasserman <henv@hot.ee>
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-23	Add gqa parameter support to the server (#2351)	IgnacioFDM
	* Add gqa parameter support to the server
	* Change help from stderr to stdout
2023-07-23	common : n_threads == -1 uses std::thread::hardware_concurrency() (#2347)	wzy
	* Fix #2345, fix incorrect n_threads
	* Update examples/common.cpp
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
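The behavior named in the subject line can be illustrated with a rough Python analogue. This is a sketch, not the project's code: `resolve_n_threads` is a hypothetical helper, and `os.cpu_count()` stands in for `std::thread::hardware_concurrency()`.

```python
import os


def resolve_n_threads(requested):
    """Map the sentinel -1 to the number of logical CPUs.

    Hypothetical helper: os.cpu_count() is used here as a stand-in for
    std::thread::hardware_concurrency(); it may return None, so fall
    back to 1 in that case.
    """
    if requested == -1:
        return os.cpu_count() or 1
    return requested


explicit = resolve_n_threads(4)     # an explicit value is passed through
detected = resolve_n_threads(-1)    # -1 asks for auto-detection
```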
2023-07-23	llama : grouped-query attention + LLaMAv2 70B support (#2276)	Georgi Gerganov
	* CUDA: GQA implementation
	* llama : support for GQA and LLaMAv2 70B ggml-ci
	* py : fix hparams parsing (if-else blocks) ggml-ci
	* py : oh boy .. ggml-ci
	* help : fix gqa value for 70B ggml-ci
	---------
	Co-authored-by: JohannesGaessler <johannesg@5d6.de>
2023-07-23	llama : print help to stdout (#2338)	maddes8cht

2023-07-23	examples : simplify vim plugin (#2327)	AustinMroz
	Uses builtin json_encode and json_decode functions to simplify escaping.
	Removes the need for temp files.
2023-07-22	llama : optimize memory buffers (#2325)	Georgi Gerganov

2023-07-22	Perplexity: Compute scores correlated to HellaSwag (#2312)	klosax
	* Add parameter --perplexity-lines to perplexity.cpp
2023-07-22	examples : basic VIM plugin	whoreson
	VIM plugin for server exe
2023-07-21	examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311)	Richard Roberson
	* Resync my fork with new llama.cpp commits
	* examples : rename to use dash instead of underscore
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-21	examples : fix typo in minigpt4.py (#2298)	Ikko Eltociear Ashimine
	promt -> prompt
2023-07-21	ggml : fix rope args order + assert (#2054)	Georgi Gerganov

2023-07-21	llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280)	Guillaume "Vermeille" Sanchez

2023-07-21	gitignore : changes for Poetry users + chat examples (#2284)	Jose Maldonado
	A fix in Makefile for FreeBSD users. On this platform x86_64 is amd64. This fix resolves compilation using CFLAGS and CXXFLAGS with -march=native and -mtune=native.
	Add two examples for interactive mode using Llama2 models (thx TheBloke for models).
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-21	llama : make tensor_split ptr instead of array (#2272)	Georgi Gerganov

2023-07-21	MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287)	Hatsune Miku
	* Miku.sh: Set default model to llama-2-7b-chat
	* Miku.sh: Set ctx_size to 4096
	* Miku.sh: Add in-prefix/in-suffix opts
	* Miku.sh: Switch sampler to mirostat_v2 and tiny prompt improvements
2023-07-21	make : fix embdinput library and server examples building on MSYS2 (#2235)	Przemysław Pawełczyk
	* make : fix embdinput library and server examples building on MSYS2
	* cmake : fix server example building on MSYS2
2023-07-19	cmake : install targets (#2256)	wzy
	fix #2252
2023-07-18	ci : integrate with ggml-org/ci (#2250)	Georgi Gerganov
	* ci : run ctest ggml-ci
	* ci : add open llama 3B-v2 tests ggml-ci
	* ci : disable wget progress output ggml-ci
	* ci : add open llama 3B-v2 tg tests for q4 and q5 quantizations ggml-ci
	* tests : try to fix tail free sampling test ggml-ci
	* ci : add K-quants ggml-ci
	* ci : add short perplexity tests ggml-ci
	* ci : add README.md
	* ppl : add --chunks argument to limit max number of chunks ggml-ci
	* ci : update README
2023-07-18	llama : shorten quantization descriptions	Georgi Gerganov

2023-07-15	llama : add custom RoPE (#2054)	Xiao-Yong Jin
	* Implement customizable RoPE
	  The original RoPE has pre-defined parameters
	    theta_i = 10000^(-2(i-1)/d), for i in [1, 2, ..., d/2]
	  Our customizable RoPE, ggml_rope_custom_inplace, uses
	    theta_i = scale * base^(-2(i-1)/d), for i in [1, 2, ..., d/2]
	  with defaults that match the original:
	    scale = 1.0
	    base = 10000
	  The new command line arguments
	    --rope-freq-base
	    --rope-freq-scale
	  set the two new RoPE parameters.
	  Recent research shows that changing these two parameters extends the context limit with minimal loss.
	  1. Extending Context to 8K
	     kaiokendev
	     https://kaiokendev.github.io/til#extending-context-to-8k
	  2. Extending Context Window of Large Language Models via Positional Interpolation
	     Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
	     https://arxiv.org/abs/2306.15595
	  3. NTK-Aware Scaled RoPE allows LLaMA models to have extended (8k+) context size without any fine-tuning and minimal perplexity degradation.
	     https://www.reddit.com/user/bloc97
	     https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
	  For the bold, try adding the following command line parameters to your favorite model:
	    -c 16384 --rope-freq-base 80000 --rope-freq-scale 0.5
	* ggml-metal: fix custom rope
	* common: fix argument names in help
	* llama: increase MEM_REQ_EVAL for MODEL_3B
	  It avoids crashing for quantized weights on CPU. A better way to calculate the required buffer size is still needed.
	* llama: make MEM_REQ_EVAL depend on n_ctx
	* server: use proper Content-Type in curl examples
	  Without the header Content-Type: application/json, curl will POST with Content-Type: application/x-www-form-urlencoded.
	  Though our simple server doesn't care, the httplib.h used has a limit with CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH 8192.
	  With Content-Type: application/json, we can send large json data.
	* style : minor fixes, mostly indentations
	* ggml : fix asserts
	---------
	Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
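The formula in the commit message above can be evaluated directly. The following is a small sketch of theta_i = scale * base^(-2(i-1)/d) only; `rope_frequencies` is a hypothetical helper name, the head dimension of 128 is an assumed example value, and this is not the ggml implementation.

```python
def rope_frequencies(d, base=10000.0, scale=1.0):
    """Return [theta_1, ..., theta_{d/2}] with
    theta_i = scale * base ** (-2 * (i - 1) / d).

    The defaults (base=10000, scale=1.0) reproduce the original RoPE.
    """
    if d <= 0 or d % 2 != 0:
        raise ValueError("d must be a positive even integer")
    return [scale * base ** (-2.0 * (i - 1) / d) for i in range(1, d // 2 + 1)]


# Original parameters versus the values suggested "for the bold"
# (--rope-freq-base 80000 --rope-freq-scale 0.5); d=128 is assumed.
original = rope_frequencies(d=128)
custom = rope_frequencies(d=128, base=80000.0, scale=0.5)
```

With the defaults, theta_1 is exactly 1.0; the scale factor multiplies every theta_i uniformly, while a larger base makes the frequencies decay faster across i.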
2023-07-14	examples : fixed path typos in embd-input (#2214)	Shangning Xu

2023-07-13	Revert "Support using mmap when applying LoRA (#2095)" (#2206)	Howard Su
	Has perf regression when mlock is used. This reverts commit 2347463201a9f4159ae95b737e1544dd300569c8.