Age | Commit message | Author |
2023-04-24 | llama : increase scratch buffer size for 65B (ref #1152) | Georgi Gerganov |
2023-04-24 | examples/main README improvements and some light refactoring (#1131) | mgroeber9110 |
2023-04-24 | Fix build for gcc 8 and test in CI (#1154) | Stephan Walter |
2023-04-24 | Fix cuda compilation (#1128) | slaren |
2023-04-24 | llama : refactor get / set state + remove redundant kv cache API (#1143) | Georgi Gerganov |
2023-04-23 | Fix LoRA acronym (#1145) | slaren |
2023-04-23 | scripts : add helper scripts to synch ggml repo | Georgi Gerganov |
2023-04-23 | Added README.md for main with examples and explanations (#1139) | DannyDaemonic |
2023-04-23 | ggml : do not print perf ops that have not been used at all | Georgi Gerganov |
2023-04-23 | ggml : better PERF prints + support "LLAMA_PERF=1 make" | Georgi Gerganov |
2023-04-23 | Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) | Stephan Walter |
2023-04-23 | readme : update gpt4all instructions (#980) | Pavol Rusnak |
2023-04-23 | A better `packNibbles` and `mul_sum_i8_pairs_float` implementation using AVX5... | Yishuo Wang |
2023-04-22 | ggml : fix Q4_3 cuBLAS | Georgi Gerganov |
2023-04-22 | ci : trigger CI for drafts, but not most PR actions (#1125) | Stephan Walter |
2023-04-22 | Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) | Stephan Walter |
2023-04-22 | ggml : unit test for quantization functions (#953) | unbounded |
2023-04-22 | llama : print timings on ctrl+c exit (#1021) | wbpxre150 |
2023-04-22 | llama : have n_batch default to 512 (#1091) | eiery |
2023-04-22 | cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) | Howard Su |
2023-04-22 | ggml : fix AVX build + update to new Q8_0 format | Georgi Gerganov |
2023-04-22 | ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) | Georgi Gerganov |
2023-04-22 | ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) | Stephan Walter |
2023-04-22 | examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Exp... | Clint Herron |
2023-04-22 | llama : add api for getting/setting the complete state: rng, logits, embeddin... | xaedes |
2023-04-21 | Improve cuBLAS performance by using a memory pool (#1094) | slaren |
2023-04-21 | llama : fixed rlimit error message (#888) | apaz |
2023-04-21 | cmake : link threads publicly to ggml (#1042) | 源文雨 |
2023-04-21 | main : evaluate tokens in batches after swapping context (#1014) | Alex Klinkhamer |
2023-04-21 | llama : remember and restore kv cache data pointers (#1104) | xaedes |
2023-04-21 | ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) | Kawrakow |
2023-04-21 | Show perplexity ETA in hours and minutes (#1096) | slaren |
2023-04-21 | llama : fix comment for "output.weight" tensor | Georgi Gerganov |
2023-04-20 | Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) | Stephan Walter |
2023-04-20 | ggml : sync ggml (add GPT-NeoX RoPE implementation) | Georgi Gerganov |
2023-04-20 | ggml : fix bug in ggml_compute_forward_dup_f32() | Georgi Gerganov |
2023-04-20 | Add Q4_3 support to cuBLAS (#1086) | slaren |
2023-04-20 | ggml : do not break cuBLAS build (Q4_3 is not yet implemented) | Georgi Gerganov |
2023-04-20 | ggml : fix Q4_3 quantization | Georgi Gerganov |
2023-04-20 | llama : multi-threaded quantization (#1075) | Kawrakow |
2023-04-20 | ggml : add Q4_3 quantization (#1082) | Georgi Gerganov |
2023-04-20 | ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the C... | Ivan Komarov |
2023-04-20 | fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) | 源文雨 |
2023-04-20 | AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) | Stephan Walter |
2023-04-20 | Improve cuBLAS performance by dequantizing on the GPU (#1065) | slaren |
2023-04-19 | Minor: Readme fixed grammar, spelling, and misc updates (#1071) | CRD716 |
2023-04-19 | Q4_2 quantization with rmse-optimized scale and quants (#1062) | Kawrakow |
2023-04-19 | ggml : use 8-bit precision for Q4_1 intermediate results (#1047) | Georgi Gerganov |
2023-04-19 | readme : add warning about Q4_2 and Q4_3 | Georgi Gerganov |
2023-04-19 | ggml : Q4 cleanup - remove 4-bit dot product code (#1061) | Stephan Walter |