Age  Commit message  Author
2023-06-19  Convert vector to f16 for dequantize mul mat vec (#1913)  Johannes Gäßler
2023-06-18  Added tokens per second to info prints (#1928)  Johannes Gäßler
2023-06-18  Fixed incorrectly applying RMS norm twice (#1925)  Johannes Gäßler
2023-06-18  ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918)  l3utterfly
2023-06-18  readme : update Android build instructions (#1922)  Mike
2023-06-18  llama : prevent usage of k-quants when tensor size is not a multiple of 256 (...  Kawrakow
2023-06-18  examples : fix examples/metal (#1920)  Kawrakow
2023-06-18  metal : handle buffers larger than device's maxBufferLength (#1826)  Georgi Gerganov
2023-06-18  cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917)  Howard Su
2023-06-17  make : do not print help for simple example  Georgi Gerganov
2023-06-17  minor : warning fixes  Georgi Gerganov
2023-06-17  Only one CUDA stream per device for async compute (#1898)  Johannes Gäßler
2023-06-17  llama : fix kv_cache `n` init (close #1903)  Georgi Gerganov
2023-06-17  make : update for latest Arch (#1701)  DaniAndTheWeb
2023-06-17  ggml : fix warnings under MSVC (#1908)  Howard Su
2023-06-17  metal : add norm, cpy f16->f16, alibi kernels (#1823)  Aaron Miller
2023-06-17  exposed modules so that they can be invoked by nix run github:ggerganov/llama...  Faez Shakil
2023-06-17  Server Example Refactor and Improvements (#1570)  Randall Fitzgerald
2023-06-17  hooks : setting up flake8 and pre-commit hooks (#1681)  Jiří Podivín
2023-06-17  readme : alternative way to build for Android with CLBlast. (#1828)  Gustavo Rocha Dias
2023-06-17  Allow cmake to build ggml as a library (#1896)  Kerfuffle
2023-06-17  train : get raw text instead of page with html (#1905)  David Yang
2023-06-16  opencl : support k-quants (#1836)  0cc4m
2023-06-16  examples : add "simple" (#1840)  SuperUserNameMan
2023-06-16  cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886)  Zenix
2023-06-16  llama : fix embd when offloading non-repeating layers (#1891)  Johannes Gäßler
2023-06-16  Fixed possible macro redefinition (#1892)  FrankHB
2023-06-16  build : fix and ignore MSVC warnings (#1889)  Borislav Stanimirov
2023-06-16  CUDA : faster k-quant dot kernels (#1862)  Kawrakow
2023-06-16  gitignore : add several entries specific to Visual Studio (#1888)  Borislav Stanimirov
2023-06-15  Fixed CUDA runtime version check (#1879)  Johannes Gäßler
2023-06-15  cmake : remove whitespaces  Georgi Gerganov
2023-06-15  examples : add chat-vicuna.sh (#1854)  yangli2
2023-06-15  cmake : set include path for OpenBlas (#1830)  Igor Okulist
2023-06-15  swift : Package compile breaks due to ggml-metal.metal (#1831)  Frederik Vogel
2023-06-15  make : add train-text-from-scratch (#1850)  daboe01
2023-06-15  readme : server compile flag (#1874)  Srinivas Billa
2023-06-15  make : clean *.so files (#1857)  sandyiscool
2023-06-15  Fix the validation of main device (#1872)  Howard Su
2023-06-15  metal : parallel command buffer encoding (#1860)  Georgi Gerganov
2023-06-15  Better error when using both LoRA + GPU layers (#1861)  Johannes Gäßler
2023-06-14  CUDA full GPU acceleration, KV cache in VRAM (#1827)  Johannes Gäßler
2023-06-13  baby-llama : fix operator!= (#1821)  0xspringtime
2023-06-13  train : improved training-from-scratch example (#1652)  xaedes
2023-06-13  llama : do a warm-up eval at start for better timings (#1824)  Georgi Gerganov
2023-06-13  Allow "quantizing" to f16 and f32 (#1787)  Kerfuffle
2023-06-12  Metal implementation for all k_quants (#1807)  Kawrakow
2023-06-12  ci : run when changing only the CUDA sources (#1800)  slaren
2023-06-12  Leverage mmap for offloading tensors to GPU (#1597)  Howard Su
2023-06-12  metal : fix failure to load model (#1817)  Kawrakow