path: root/ggml-metal.m
Age  Commit message  Author
2023-07-14  Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)  Kawrakow
2023-07-12  metal : new q4_0 matrix-vector kernel (#2188)  Shouzheng Liu
2023-07-11  ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)  Spencer Sutton
2023-07-10  mpi : add support for distributed inference via MPI (#2099)  Evan Miller
2023-07-07  ggml : change ggml_graph_compute() API to not require context (#1999)  Qingyou Meng
2023-07-01  metal : release buffers when freeing metal context (#2062)  Aaron Miller
2023-06-26  k-quants : support for super-block size of 64 (#2001)  Kawrakow
2023-06-18  metal : handle buffers larger than device's maxBufferLength (#1826)  Georgi Gerganov
2023-06-17  minor : warning fixes  Georgi Gerganov
2023-06-17  metal : add norm, cpy f16->f16, alibi kernels (#1823)  Aaron Miller
2023-06-15  metal : parallel command buffer encoding (#1860)  Georgi Gerganov
2023-06-12  Metal implementation for all k_quants (#1807)  Kawrakow
2023-06-12  metal : fix failure to load model (#1817)  Kawrakow
2023-06-10  metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782)  Andrei
2023-06-10  metal : add Q4_1 implementation (#1785)  Kawrakow
2023-06-09  metal : add GELU implementation (#1770)  AT
2023-06-09  metal : faster q4_0 (#1775)  Kawrakow
2023-06-08  metal : add Q2_K implementation (#1762)  Kawrakow
2023-06-08  metal : Q6_K implementation (#1752)  Kawrakow
2023-06-08  metal : add Q4_K implementation (#1733)  Kawrakow
2023-06-06  metal : add f16 support  Georgi Gerganov
2023-06-06  metal : add checks for buffer size (#1706)  Spencer Sutton
2023-06-05  metal : use shared buffers between CPU and GPU (#1696)  kiltyj
2023-06-04  llama : Metal inference (#1642)  Georgi Gerganov