path: root/llama.cpp
Age  Commit message  Author
2023-04-21  llama : fix comment for "output.weight" tensor  Georgi Gerganov
2023-04-20  ggml : sync ggml (add GPT-NeoX RoPE implementation)  Georgi Gerganov
2023-04-20  llama : multi-threaded quantization (#1075)  Kawrakow
2023-04-20  ggml : add Q4_3 quantization (#1082)  Georgi Gerganov
2023-04-19  Add NVIDIA cuBLAS support (#1044)  slaren
2023-04-18  ggml : add new Q4_2 quantization (ARM only) (#1046)  Georgi Gerganov
2023-04-17  Add LoRA support (#820)  slaren
2023-04-17  llama : well-defined static initialization of complex objects (#927)  Arik Poznanski
2023-04-17  Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933)  Ivan Komarov
2023-04-16  stdout : vertical align outputs for better readibility  Georgi Gerganov
2023-04-16  Fix msys2 build error and warnings (#1009)  nanahi
2023-04-14  Expose type name from ggml (#970)  Pavol Rusnak
2023-04-13  llama : merge llama_internal.h into llama.h  Georgi Gerganov
2023-04-12  Don't crash on ftype (formerly f16) == 4 (#917)  Stephan Walter
2023-04-11  Add enum llama_ftype, sync ggml_type to model files (#709)  Stephan Walter
2023-04-11  Windows fixes (#890)  comex
2023-04-10  Print model version.  comex
2023-04-10  Rewrite loading code to try to satisfy everyone:  comex
2023-04-08  Add quantize-stats command for testing quantization (#728)  unbounded
2023-04-07  llama : always sort logits before nucleus sampling (#812)  Ivan Stepanov
2023-04-05  ggml, llama : avoid heavy V transpose + improvements (#775)  Georgi Gerganov
2023-04-05  llama : define non-positive top_k; top_k range check (#779)  Ivan Stepanov
2023-04-03  Define non-positive temperature behavior (#720)  Ivan Stepanov
2023-04-02  Added api for getting/setting the kv_cache (#685)  Christian Falch
2023-04-02  ggml : change ne to int64_t (#626)  Marian Cepok
2023-04-02  llama : do not allocate KV cache for "vocab_only == true" (#682)  Stephan Walter
2023-03-30  Introduce GGML migration tool for new file format  Justine Tunney
2023-03-30  Ensure --mlock works properly with mmap() support  Justine Tunney
2023-03-30  Make loading weights 10-100x faster  Justine Tunney
2023-03-30  Initial windows support (untested)  Slaren
2023-03-30  Always initialize mm_addr and mm_length in llama_model  Slaren
2023-03-30  Unmap the file in llama_free  Slaren
2023-03-30  Make mmap_file static  Slaren
2023-03-30  Fix ggml_init_params in quantize  Slaren
2023-03-30  Add mmap support for model files  Slaren
2023-03-29  llama : fix compile warnings when reading the vocab  Georgi Gerganov
2023-03-29  llama : use the same threshold for OpenBLAS and ggml thread limiting (#577)  Maël Kerbiriou
2023-03-28  py : add temporary script to convert old ggml files to newer version (#539)  thement
2023-03-28  all : be more strict about converting float to double (#458)  Stephan Walter
2023-03-28  ggml : introduce structs for the q4 data blocks (#356)  Stephan Walter
2023-03-25  Cleanup STL headers + fix embedding examples + minor stuff  Georgi Gerganov
2023-03-25  Don't interefe with BLAS for large prompts by running only 1 thread  Georgi Gerganov
2023-03-25  Add timings for the prompt evaluation (#478)  slaren
2023-03-25  Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS  Georgi Gerganov
2023-03-25  Add support for file load progress reporting callbacks (#434)  Jed Fox
2023-03-25  Fix crash for 65B model with pre-allocated memory (#485)  Chris Kuehl
2023-03-24  Reduce memory usage and allocate enough memory for largest context (#473)  Georgi Gerganov
2023-03-24  Temporary bump the memory buffer size - hopefully fix issues from 483bab2e  Georgi Gerganov
2023-03-24  Properly free llama_context on failure  Georgi Gerganov
2023-03-24  Support calling mlock() on loaded model data on Linux and macOS (#453)  comex