Age | Commit message | Author |
2023-07-11 | ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) | Georgi Gerganov |
2023-07-11 | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | Spencer Sutton |
2023-07-07 | ggml : change ggml_graph_compute() API to not require context (#1999) | Qingyou Meng |
2023-07-06 | ggml : fix restrict usage | Georgi Gerganov |
2023-07-05 | ggml : generalize `quantize_fns` for simpler FP16 handling (#1237) | Stephan Walter |
2023-07-04 | ggml : sync latest (new ops, macros, refactoring) (#2106) | Georgi Gerganov |
2023-07-01 | ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) | Qingyou Meng |
2023-06-27 | ggml : add support for ChatGLM RoPE | Georgi Gerganov |
2023-06-26 | ggml : increase max tensor name + clean up compiler warnings in train-text (#... | David Yang |
2023-06-26 | ggml : add NUMA support (#1556) | zrm |
2023-06-25 | ggml : sync latest ggml (custom operators) | Georgi Gerganov |
2023-06-24 | ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) | slaren |
2023-06-19 | ggml : sync latest ggml repo (#1924) | Georgi Gerganov |
2023-06-18 | metal : handle buffers larger than device's maxBufferLength (#1826) | Georgi Gerganov |
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827) | Johannes Gäßler |
2023-06-13 | train : improved training-from-scratch example (#1652) | xaedes |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-05-29 | ggml : sync cgraph import / export API | Georgi Gerganov |
2023-05-27 | ggml : add ggml_tensor_overhead() | Georgi Gerganov |
2023-05-27 | ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) | Georgi Gerganov |
2023-05-23 | OpenCL Token Generation Acceleration (#1459) | 0cc4m |
2023-05-20 | ggml : add ggml_clamp() (#1539) | Georgi Gerganov |
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov |
2023-05-14 | ggml : various fixes (#1450) | Georgi Gerganov |
2023-05-14 | ggml : add GGML_QNT_VERSION to track quantization format changes | Georgi Gerganov |
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler |
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes |
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov |
2023-05-02 | ggml: add names to tensors (#1268) | slaren |
2023-05-01 | cuBLAS: refactor and optimize f16 mat mul performance (#1259) | slaren |
2023-04-30 | ggml : add Q5 WASM SIMD + GGML_FTYPE | Georgi Gerganov |
2023-04-29 | ggml : fix visibility and unused warnings | Georgi Gerganov |
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter |
2023-04-28 | ggml : sync ggml (ggml_alibi) | Georgi Gerganov |
2023-04-28 | ggml : add CLBlast support (#1164) | 0cc4m |
2023-04-26 | ggml : add Q5_0 and Q5_1 quantization (#1187) | Georgi Gerganov |
2023-04-25 | ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (... | Georgi Gerganov |
2023-04-24 | ggml : export symbols (#1155) | Georgi Gerganov |
2023-04-20 | ggml : sync ggml (add GPT-NeoX RoPE implementation) | Georgi Gerganov |
2023-04-20 | llama : multi-threaded quantization (#1075) | Kawrakow |
2023-04-20 | ggml : add Q4_3 quantization (#1082) | Georgi Gerganov |
2023-04-19 | Add NVIDIA cuBLAS support (#1044) | slaren |
2023-04-18 | ggml : add new Q4_2 quantization (ARM only) (#1046) | Georgi Gerganov |
2023-04-17 | Add LoRA support (#820) | slaren |
2023-04-17 | Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) | Ivan Komarov |
2023-04-15 | ggml : add Q8_0 quantization for intermediate results (#951) | Georgi Gerganov |
2023-04-14 | Expose type name from ggml (#970) | Pavol Rusnak |
2023-04-14 | ggml : add unary and binary map operations (#874) | Kerfuffle |