path: root/llama.cpp
Age        | Commit message                                                              | Author
2023-04-10 | Print model version.                                                        | comex
2023-04-10 | Rewrite loading code to try to satisfy everyone:                            | comex
2023-04-08 | Add quantize-stats command for testing quantization (#728)                  | unbounded
2023-04-07 | llama : always sort logits before nucleus sampling (#812)                   | Ivan Stepanov
2023-04-05 | ggml, llama : avoid heavy V transpose + improvements (#775)                 | Georgi Gerganov
2023-04-05 | llama : define non-positive top_k; top_k range check (#779)                 | Ivan Stepanov
2023-04-03 | Define non-positive temperature behavior (#720)                             | Ivan Stepanov
2023-04-02 | Added api for getting/setting the kv_cache (#685)                           | Christian Falch
2023-04-02 | ggml : change ne to int64_t (#626)                                          | Marian Cepok
2023-04-02 | llama : do not allocate KV cache for "vocab_only == true" (#682)            | Stephan Walter
2023-03-30 | Introduce GGML migration tool for new file format                           | Justine Tunney
2023-03-30 | Ensure --mlock works properly with mmap() support                           | Justine Tunney
2023-03-30 | Make loading weights 10-100x faster                                         | Justine Tunney
2023-03-30 | Initial windows support (untested)                                          | Slaren
2023-03-30 | Always initialize mm_addr and mm_length in llama_model                      | Slaren
2023-03-30 | Unmap the file in llama_free                                                | Slaren
2023-03-30 | Make mmap_file static                                                       | Slaren
2023-03-30 | Fix ggml_init_params in quantize                                            | Slaren
2023-03-30 | Add mmap support for model files                                            | Slaren
2023-03-29 | llama : fix compile warnings when reading the vocab                         | Georgi Gerganov
2023-03-29 | llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) | Maël Kerbiriou
2023-03-28 | py : add temporary script to convert old ggml files to newer version (#539) | thement
2023-03-28 | all : be more strict about converting float to double (#458)                | Stephan Walter
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356)                      | Stephan Walter
2023-03-25 | Cleanup STL headers + fix embedding examples + minor stuff                  | Georgi Gerganov
2023-03-25 | Don't interefe with BLAS for large prompts by running only 1 thread         | Georgi Gerganov
2023-03-25 | Add timings for the prompt evaluation (#478)                                | slaren
2023-03-25 | Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS       | Georgi Gerganov
2023-03-25 | Add support for file load progress reporting callbacks (#434)               | Jed Fox
2023-03-25 | Fix crash for 65B model with pre-allocated memory (#485)                    | Chris Kuehl
2023-03-24 | Reduce memory usage and allocate enough memory for largest context (#473)   | Georgi Gerganov
2023-03-24 | Temporary bump the memory buffer size - hopefully fix issues from 483bab2e  | Georgi Gerganov
2023-03-24 | Properly free llama_context on failure                                      | Georgi Gerganov
2023-03-24 | Support calling mlock() on loaded model data on Linux and macOS (#453)      | comex
2023-03-24 | Add embedding mode with arg flag. Currently working (#282)                  | Luciano
2023-03-24 | Revert "Fix memory allocation issues and seg faults"                        | Georgi Gerganov
2023-03-24 | Fix memory allocation issues and seg faults                                 | Georgi Gerganov
2023-03-23 | Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439) | Georgi Gerganov
2023-03-22 | Add missing header for memcpy (#386)                                        | Yusuf Kağan Hanoğlu
2023-03-22 | Init llama_context_params properly from CLI (#370)                          | Georgi Gerganov
2023-03-22 | Introduce C-style API (#370)                                                | Georgi Gerganov