path: root/llama.cpp
Age         Author                    Commit message
2023-04-29  slaren                    cuBLAS: use host pinned memory and dequantize while copying (#1207)
2023-04-28  Stephan Walter            Remove Q4_3 which is no better than Q5 (#1218)
2023-04-28  Evan Jones                llama : add session file format and saved sessions in main (#1169)
2023-04-28  0cc4m                     ggml : add CLBlast support (#1164)
2023-04-26  Georgi Gerganov           ggml : add Q5_0 and Q5_1 quantization (#1187)
2023-04-26  Ásgeir Bjarni Ingvarsson  Allow setting the rng seed after initialization. (#1184)
2023-04-25  Georgi Gerganov           ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (...
2023-04-24  Georgi Gerganov           llama : increase scratch buffer size for 65B (ref #1152)
2023-04-24  Georgi Gerganov           llama : refactor get / set state + remove redundant kv cache API (#1143)
2023-04-23  Georgi Gerganov           ggml : better PERF prints + support "LLAMA_PERF=1 make"
2023-04-22  Stephan Walter            Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122)
2023-04-22  Georgi Gerganov           ggml : fix AVX build + update to new Q8_0 format
2023-04-22  xaedes                    llama : add api for getting/setting the complete state: rng, logits, embeddin...
2023-04-21  xaedes                    llama : remember and restore kv cache data pointers (#1104)
2023-04-21  Georgi Gerganov           llama : fix comment for "output.weight" tensor
2023-04-20  Georgi Gerganov           ggml : sync ggml (add GPT-NeoX RoPE implementation)
2023-04-20  Kawrakow                  llama : multi-threaded quantization (#1075)
2023-04-20  Georgi Gerganov           ggml : add Q4_3 quantization (#1082)
2023-04-19  slaren                    Add NVIDIA cuBLAS support (#1044)
2023-04-18  Georgi Gerganov           ggml : add new Q4_2 quantization (ARM only) (#1046)
2023-04-17  slaren                    Add LoRA support (#820)
2023-04-17  Arik Poznanski            llama : well-defined static initialization of complex objects (#927)
2023-04-17  Ivan Komarov              Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933)
2023-04-16  Georgi Gerganov           stdout : vertical align outputs for better readability
2023-04-16  nanahi                    Fix msys2 build error and warnings (#1009)
2023-04-14  Pavol Rusnak              Expose type name from ggml (#970)
2023-04-13  Georgi Gerganov           llama : merge llama_internal.h into llama.h
2023-04-12  Stephan Walter            Don't crash on ftype (formerly f16) == 4 (#917)
2023-04-11  Stephan Walter            Add enum llama_ftype, sync ggml_type to model files (#709)
2023-04-11  comex                     Windows fixes (#890)
2023-04-10  comex                     Print model version.
2023-04-10  comex                     Rewrite loading code to try to satisfy everyone:
2023-04-08  unbounded                 Add quantize-stats command for testing quantization (#728)
2023-04-07  Ivan Stepanov             llama : always sort logits before nucleus sampling (#812)
2023-04-05  Georgi Gerganov           ggml, llama : avoid heavy V transpose + improvements (#775)
2023-04-05  Ivan Stepanov             llama : define non-positive top_k; top_k range check (#779)
2023-04-03  Ivan Stepanov             Define non-positive temperature behavior (#720)
2023-04-02  Christian Falch           Added api for getting/setting the kv_cache (#685)
2023-04-02  Marian Cepok              ggml : change ne to int64_t (#626)
2023-04-02  Stephan Walter            llama : do not allocate KV cache for "vocab_only == true" (#682)
2023-03-30  Justine Tunney            Introduce GGML migration tool for new file format
2023-03-30  Justine Tunney            Ensure --mlock works properly with mmap() support
2023-03-30  Justine Tunney            Make loading weights 10-100x faster
2023-03-30  Slaren                    Initial windows support (untested)
2023-03-30  Slaren                    Always initialize mm_addr and mm_length in llama_model
2023-03-30  Slaren                    Unmap the file in llama_free
2023-03-30  Slaren                    Make mmap_file static
2023-03-30  Slaren                    Fix ggml_init_params in quantize
2023-03-30  Slaren                    Add mmap support for model files
2023-03-29  Georgi Gerganov           llama : fix compile warnings when reading the vocab