path: root/ggml-metal.m
Age | Commit message | Author
2023-08-01 | metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) | Matteo Boschini
2023-07-25 | metal : concurrently dispatch commands (#2358) | Shouzheng Liu
2023-07-24 | make rms_norm_eps a parameter (#2374) | slaren
2023-07-24 | ggml : sync (unary ops refactor, static-correctness) (#2370) | Georgi Gerganov
2023-07-23 | ggml: move op parameters from tensors to ggml_tensor::op_params (#2333) | slaren
2023-07-23 | metal : support bcast add & dup & cont op (#2323) | Jiahao Li
2023-07-21 | Faster Q3_K implementation on Metal (#2307) | Kawrakow
2023-07-21 | Faster Q2_K on Metal (#2297) | Kawrakow
2023-07-20 | Faster Q5_K and Q6_K on Metal (#2294) | Kawrakow
2023-07-20 | Faster Q4_K on Metal (#2290) | Kawrakow
2023-07-20 | metal: minor q4 optimization and reduce code size (#2248) | Shouzheng Liu
2023-07-15 | llama : add custom RoPE (#2054) | Xiao-Yong Jin
2023-07-14 | Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) | Kawrakow
2023-07-12 | metal : new q4_0 matrix-vector kernel (#2188) | Shouzheng Liu
2023-07-11 | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | Spencer Sutton
2023-07-10 | mpi : add support for distributed inference via MPI (#2099) | Evan Miller
2023-07-07 | ggml : change ggml_graph_compute() API to not require context (#1999) | Qingyou Meng
2023-07-01 | metal : release buffers when freeing metal context (#2062) | Aaron Miller
2023-06-26 | k-quants : support for super-block size of 64 (#2001) | Kawrakow
2023-06-18 | metal : handle buffers larger than device's maxBufferLength (#1826) | Georgi Gerganov
2023-06-17 | minor : warning fixes | Georgi Gerganov
2023-06-17 | metal : add norm, cpy f16->f16, alibi kernels (#1823) | Aaron Miller
2023-06-15 | metal : parallel command buffer encoding (#1860) | Georgi Gerganov
2023-06-12 | Metal implementation for all k_quants (#1807) | Kawrakow
2023-06-12 | metal : fix failure to load model (#1817) | Kawrakow
2023-06-10 | metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) | Andrei
2023-06-10 | metal : add Q4_1 implementation (#1785) | Kawrakow
2023-06-09 | metal : add GELU implementation (#1770) | AT
2023-06-09 | metal : faster q4_0 (#1775) | Kawrakow
2023-06-08 | metal : add Q2_K implementation (#1762) | Kawrakow
2023-06-08 | metal : Q6_K implementation (#1752) | Kawrakow
2023-06-08 | metal : add Q4_K implementation (#1733) | Kawrakow
2023-06-06 | metal : add f16 support | Georgi Gerganov
2023-06-06 | metal : add checks for buffer size (#1706) | Spencer Sutton
2023-06-05 | metal : use shared buffers between CPU and GPU (#1696) | kiltyj
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov