path: root/ggml.h
Age | Commit message | Author
2023-06-25 | ggml : sync latest ggml (custom operators) | Georgi Gerganov
2023-06-24 | ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) | slaren
2023-06-19 | ggml : sync latest ggml repo (#1924) | Georgi Gerganov
2023-06-18 | metal : handle buffers larger than device's maxBufferLength (#1826) | Georgi Gerganov
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827) | Johannes Gäßler
2023-06-13 | train : improved training-from-scratch example (#1652) | xaedes
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov
2023-05-29 | ggml : sync cgraph import / export API | Georgi Gerganov
2023-05-27 | ggml : add ggml_tensor_overhead() | Georgi Gerganov
2023-05-27 | ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) | Georgi Gerganov
2023-05-23 | OpenCL Token Generation Acceleration (#1459) | 0cc4m
2023-05-20 | ggml : add ggml_clamp() (#1539) | Georgi Gerganov
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov
2023-05-14 | ggml : various fixes (#1450) | Georgi Gerganov
2023-05-14 | ggml : add GGML_QNT_VERSION to track quantization format changes | Georgi Gerganov
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov
2023-05-02 | ggml: add names to tensors (#1268) | slaren
2023-05-01 | cuBLAS: refactor and optimize f16 mat mul performance (#1259) | slaren
2023-04-30 | ggml : add Q5 WASM SIMD + GGML_FTYPE | Georgi Gerganov
2023-04-29 | ggml : fix visibility and unused warnings | Georgi Gerganov
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter
2023-04-28 | ggml : sync ggml (ggml_alibi) | Georgi Gerganov
2023-04-28 | ggml : add CLBlast support (#1164) | 0cc4m
2023-04-26 | ggml : add Q5_0 and Q5_1 quantization (#1187) | Georgi Gerganov
2023-04-25 | ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (... | Georgi Gerganov
2023-04-24 | ggml : export symbols (#1155) | Georgi Gerganov
2023-04-20 | ggml : sync ggml (add GPT-NeoX RoPE implementation) | Georgi Gerganov
2023-04-20 | llama : multi-threaded quantization (#1075) | Kawrakow
2023-04-20 | ggml : add Q4_3 quantization (#1082) | Georgi Gerganov
2023-04-19 | Add NVIDIA cuBLAS support (#1044) | slaren
2023-04-18 | ggml : add new Q4_2 quantization (ARM only) (#1046) | Georgi Gerganov
2023-04-17 | Add LoRA support (#820) | slaren
2023-04-17 | Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) | Ivan Komarov
2023-04-15 | ggml : add Q8_0 quantization for intermediate results (#951) | Georgi Gerganov
2023-04-14 | Expose type name from ggml (#970) | Pavol Rusnak
2023-04-14 | ggml : add unary and binary map operations (#874) | Kerfuffle
2023-04-13 | ggml : add GGML_DEFAULT_N_THREADS | Georgi Gerganov
2023-04-11 | Add enum llama_ftype, sync ggml_type to model files (#709) | Stephan Walter
2023-04-10 | ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst | Georgi Gerganov
2023-04-10 | Rewrite loading code to try to satisfy everyone: | comex
2023-04-08 | Add quantize-stats command for testing quantization (#728) | unbounded
2023-04-05 | ggml, llama : avoid heavy V transpose + improvements (#775) | Georgi Gerganov
2023-04-02 | ggml : change ne to int64_t (#626) | Marian Cepok
2023-03-30 | Ensure --mlock works properly with mmap() support | Justine Tunney
2023-03-30 | Add mmap support for model files | Slaren
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356) | Stephan Walter