Age | Commit message | Author |
2023-06-19 | ggml : sync latest ggml repo (#1924) | Georgi Gerganov |
2023-06-18 | metal : handle buffers larger than device's maxBufferLength (#1826) | Georgi Gerganov |
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827) | Johannes Gäßler |
2023-06-13 | train : improved training-from-scratch example (#1652) | xaedes |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-05-29 | ggml : sync cgraph import / export API | Georgi Gerganov |
2023-05-27 | ggml : add ggml_tensor_overhead() | Georgi Gerganov |
2023-05-27 | ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) | Georgi Gerganov |
2023-05-23 | OpenCL Token Generation Acceleration (#1459) | 0cc4m |
2023-05-20 | ggml : add ggml_clamp() (#1539) | Georgi Gerganov |
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov |
2023-05-14 | ggml : various fixes (#1450) | Georgi Gerganov |
2023-05-14 | ggml : add GGML_QNT_VERSION to track quantization format changes | Georgi Gerganov |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-02 | ggml: add names to tensors (#1268) | slaren |
2023-05-01 | cuBLAS: refactor and optimize f16 mat mul performance (#1259) | slaren |
2023-04-30 | ggml : add Q5 WASM SIMD + GGML_FTYPE | Georgi Gerganov |
2023-04-29 | ggml : fix visibility and unused warnings | Georgi Gerganov |
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter |
2023-04-28 | ggml : sync ggml (ggml_alibi) | Georgi Gerganov |
2023-04-28 | ggml : add CLBlast support (#1164) | 0cc4m |
2023-04-26 | ggml : add Q5_0 and Q5_1 quantization (#1187) | Georgi Gerganov |
2023-04-25 | ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (... | Georgi Gerganov |
2023-04-24 | ggml : export symbols (#1155) | Georgi Gerganov |
2023-04-20 | ggml : sync ggml (add GPT-NeoX RoPE implementation) | Georgi Gerganov |
2023-04-20 | llama : multi-threaded quantization (#1075) | Kawrakow |
2023-04-20 | ggml : add Q4_3 quantization (#1082) | Georgi Gerganov |
2023-04-19 | Add NVIDIA cuBLAS support (#1044) | slaren |
2023-04-18 | ggml : add new Q4_2 quantization (ARM only) (#1046) | Georgi Gerganov |
2023-04-17 | Add LoRA support (#820) | slaren |
2023-04-17 | Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) | Ivan Komarov |
2023-04-15 | ggml : add Q8_0 quantization for intermediate results (#951) | Georgi Gerganov |
2023-04-14 | Expose type name from ggml (#970) | Pavol Rusnak |
2023-04-14 | ggml : add unary and binary map operations (#874) | Kerfuffle |
2023-04-13 | ggml : add GGML_DEFAULT_N_THREADS | Georgi Gerganov |
2023-04-11 | Add enum llama_ftype, sync ggml_type to model files (#709) | Stephan Walter |
2023-04-10 | ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst | Georgi Gerganov |
2023-04-10 | Rewrite loading code to try to satisfy everyone: | comex |
2023-04-08 | Add quantize-stats command for testing quantization (#728) | unbounded |
2023-04-05 | ggml, llama : avoid heavy V transpose + improvements (#775) | Georgi Gerganov |
2023-04-02 | ggml : change ne to int64_t (#626) | Marian Cepok |
2023-03-30 | Ensure --mlock works properly with mmap() support | Justine Tunney |
2023-03-30 | Add mmap support for model files | Slaren |
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356) | Stephan Walter |
2023-03-24 | Support calling mlock() on loaded model data on Linux and macOS (#453) | comex |
2023-03-22 | Deduplicate q4 quantization functions (#383) | Stephan Walter |