llama.cpp.git, branch master: commit log for ggml-metal.m
Date        Commit message  [Author]
2023-07-21  Faster Q2_K on Metal (#2297)  [Kawrakow]
2023-07-20  Faster Q5_K and Q6_K on Metal (#2294)  [Kawrakow]
2023-07-20  Faster Q4_K on Metal (#2290)  [Kawrakow]
2023-07-20  metal: minor q4 optimization and reduce code size (#2248)  [Shouzheng Liu]
2023-07-15  llama : add custom RoPE (#2054)  [Xiao-Yong Jin]
2023-07-14  Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212)  [Kawrakow]
2023-07-12  metal : new q4_0 matrix-vector kernel (#2188)  [Shouzheng Liu]
2023-07-11  ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)  [Spencer Sutton]
2023-07-10  mpi : add support for distributed inference via MPI (#2099)  [Evan Miller]
2023-07-07  ggml : change ggml_graph_compute() API to not require context (#1999)  [Qingyou Meng]
2023-07-01  metal : release buffers when freeing metal context (#2062)  [Aaron Miller]
2023-06-26  k-quants : support for super-block size of 64 (#2001)  [Kawrakow]
2023-06-18  metal : handle buffers larger than device's maxBufferLength (#1826)  [Georgi Gerganov]
2023-06-17  minor : warning fixes  [Georgi Gerganov]
2023-06-17  metal : add norm, cpy f16->f16, alibi kernels (#1823)  [Aaron Miller]
2023-06-15  metal : parallel command buffer encoding (#1860)  [Georgi Gerganov]
2023-06-12  Metal implementation for all k_quants (#1807)  [Kawrakow]
2023-06-12  metal : fix failure to load model (#1817)  [Kawrakow]
2023-06-10  metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782)  [Andrei]
2023-06-10  metal : add Q4_1 implementation (#1785)  [Kawrakow]
2023-06-09  metal : add GELU implementation (#1770)  [AT]
2023-06-09  metal : faster q4_0 (#1775)  [Kawrakow]
2023-06-08  metal : add Q2_K implementation (#1762)  [Kawrakow]
2023-06-08  metal : Q6_K implementation (#1752)  [Kawrakow]
2023-06-08  metal : add Q4_K implementation (#1733)  [Kawrakow]
2023-06-06  metal : add f16 support  [Georgi Gerganov]
2023-06-06  metal : add checks for buffer size (#1706)  [Spencer Sutton]
2023-06-05  metal : use shared buffers between CPU and GPU (#1696)  [kiltyj]
2023-06-04  llama : Metal inference (#1642)  [Georgi Gerganov]