llama.cpp.git - llama.cpp

Age	Commit message (Collapse)	Author
2023-07-28	readme : fix the description of the Tail free sampling (TFS) method (#2431)	Weird Constructor

2023-07-13	Revert "Support using mmap when applying LoRA (#2095)" (#2206)	Howard Su
	Has perf regression when mlock is used. This reverts commit 2347463201a9f4159ae95b737e1544dd300569c8.
2023-07-11	Support using mmap when applying LoRA (#2095)	Howard Su
	* Support using mmap when applying LoRA * Fix Linux * Update comment to reflect the support lora with mmap
2023-06-29	Use unsigned for random seed (#2006)	Howard Su
	* Use unsigned for random seed. Keep -1 as the value to use a time based seed. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-26	ggml : add NUMA support (#1556)	zrm
	* detect NUMA systems and pin work threads to nodes (linux) * disable mmap prefetch/readahead for NUMA systems * avoid sending finalize op to thread pool if it does nothing * silence robot * fix args * make --numa a param * recommendation that n_nodes evenly divide n_threads did not warrant such aggressive enforcement * lower synchronization overhead * statically allocate * move numa state to g_state * add description for --numa * ggml : minor style changes * ggml : minor style + try fix sanitizer build * llama : allow to initialize backend with NUMA support * llama : avoid ggml include in llama-util.h * ggml : style / formatting * ggml : fix handling of ops with n_threads > n_tasks > 1 * server : utilize numa parameter --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-14	CUDA full GPU acceleration, KV cache in VRAM (#1827)	Johannes Gäßler
	* Fixed CUDA RoPE * ggml_cuda_mul_mat_vec_p021 * ggml_cuda_scale * ggml_cuda_diag_mask_inf * ggml_is_permuted * ggml_cuda_cpy * flatten rows for ggml_cuda_op * Added a --low-vram option * Fixed Windows performance * Fixed LLAMA_CUDA_DMMV_Y > 1 for WizardLM
2023-06-06	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)	Johannes Gäßler
	* CUDA multi GPU + scratch ggml_cuda_compute_forward Tensor parallelism ggml_cuda_add ggml_cuda_rms_norm ggml_cuda_silu CUDA scratch buffer --main-gpu CLI option
2023-05-28	Only show -ngl option when relevant + other doc/arg handling updates (#1625)	Kerfuffle
	1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-25	Some improvements to loading the session with --prompt-cache (#1550)	Kerfuffle
	Improvements to loading the session with `--prompt-cache` in the `main` example. 1. Fix an issue where the `--seed` parameter was ignored when loading a cached prompt. 2. When loading a cached prompt, you previously had to specify the saved prompt (or a prefix of it) again. This pull changes that behavior to default to the prompt that was cached if a prompt wasn't specified by the user.
2023-05-10	main : add option to save full output to session (#1338)	Evan Jones
	* main : add option to save full output to session * split behavior into --session and --prompt-cache * restore original implementation with new names * PR comments * move the check for incompatible parameters to gpt_params_parse * Fix whitespace Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com> --------- Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
2023-05-04	main : add --in-suffix option (#1318)	44670
	* adding --in-suffix option * print input suffix before generation
2023-05-04	Only escape prompts when used with `-e` (#1311)	DannyDaemonic

2023-05-04	Update main's README.md with new features (#1296)	DannyDaemonic

2023-05-02	llama : allow 0 as a seed number. (#1275)	Robert Brisita

2023-04-24	examples/main README improvements and some light refactoring (#1131)	mgroeber9110

2023-04-23	Fix LoRA acronym (#1145)	slaren

2023-04-23	Added README.md for main with examples and explanations (#1139)	DannyDaemonic

2023-04-11	Fix whitespace, add .editorconfig, add GitHub workflow (#883)	Pavol Rusnak

2023-03-25	Overhaul the examples structure	Georgi Gerganov
	- main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !