Age         Commit message                                                                      Author
2023-06-24  convert : fix invalid params in write_vocab_only (#1975)                            (AN Long)
2023-06-24  ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978)                    (slaren)
2023-06-24  readme : fix whitespaces                                                            (Georgi Gerganov)
2023-06-24  readme : fixed termux instructions (#1973)                                          (Alberto)
2023-06-24  llama : fix top-p sampling to match the canonical definition (#1953)                (Alex Renda)
2023-06-24  llama : make model stateless and context stateful (llama_state) (#1797)            (Didzis Gosko)
2023-06-23  Add OpenLLaMA instructions to the README (#1954)                                    (eiery)
2023-06-22  rework convert.py to read hyper-parameters from config.json (#1958)                 (Erik Scholz)
2023-06-21  cmake: revert CUDA arch default to 52, 61 if f16 (#1959)                            (Johannes Gäßler)
2023-06-21  Fix typo in README.md (#1961)                                                       (Rahul Vivek Nair)
2023-06-20  readme : add link to p1                                                             (Georgi Gerganov)
2023-06-20  Fix typo (#1949)                                                                    (Xiake Sun)
2023-06-20  llama : fix params struct slignment (#1936)                                         (Ettore Di Giacinto)
2023-06-20  [Fix] Reenable server embedding endpoint (#1937)                                    (Henri Vasserman)
2023-06-19  ggml : fix bug in LBFGS optimizer (found by ggml tests)                             (Georgi Gerganov)
2023-06-19  llama : use aligned memory during ggml_init call from loading saved sessions ...   (l3utterfly)
2023-06-19  cmake : fix trailing whitespaces                                                    (Georgi Gerganov)
2023-06-19  llama : only use Q6_K for output weights if tensor size is multiple of 256 (#...   (Kawrakow)
2023-06-19  cuda : faster k-quants on older GPUs (#1930)                                        (Kawrakow)
2023-06-19  ggml : sync latest ggml repo (#1924)                                                (Georgi Gerganov)
2023-06-19  cmake : fix build shared ggml when CUDA is enabled (#1929)                          (Howard Su)
2023-06-19  Convert vector to f16 for dequantize mul mat vec (#1913)                            (Johannes Gäßler)
2023-06-18  Added tokens per second to info prints (#1928)                                      (Johannes Gäßler)
2023-06-18  Fixed incorrectly applying RMS norm twice (#1925)                                   (Johannes Gäßler)
2023-06-18  ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918)                            (l3utterfly)
2023-06-18  readme : update Android build instructions (#1922)                                  (Mike)
2023-06-18  llama : prevent usage of k-quants when tensor size is not a multiple of 256 (...   (Kawrakow)
2023-06-18  examples : fix examples/metal (#1920)                                               (Kawrakow)
2023-06-18  metal : handle buffers larger than device's maxBufferLength (#1826)                 (Georgi Gerganov)
2023-06-18  cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917)                    (Howard Su)
2023-06-17  make : do not print help for simple example                                         (Georgi Gerganov)
2023-06-17  minor : warning fixes                                                               (Georgi Gerganov)
2023-06-17  Only one CUDA stream per device for async compute (#1898)                           (Johannes Gäßler)
2023-06-17  llama : fix kv_cache `n` init (close #1903)                                         (Georgi Gerganov)
2023-06-17  make : update for latest Arch (#1701)                                               (DaniAndTheWeb)
2023-06-17  ggml : fix warnings under MSVC (#1908)                                              (Howard Su)
2023-06-17  metal : add norm, cpy f16->f16, alibi kernels (#1823)                               (Aaron Miller)
2023-06-17  exposed modules so that they can be invoked by nix run github:ggerganov/llama...    (Faez Shakil)
2023-06-17  Server Example Refactor and Improvements (#1570)                                    (Randall Fitzgerald)
2023-06-17  hooks : setting up flake8 and pre-commit hooks (#1681)                              (Jiří Podivín)
2023-06-17  readme : alternative way to build for Android with CLBlast. (#1828)                 (Gustavo Rocha Dias)
2023-06-17  Allow cmake to build ggml as a library (#1896)                                      (Kerfuffle)
2023-06-17  train : get raw text instead of page with html (#1905)                              (David Yang)
2023-06-16  opencl : support k-quants (#1836)                                                   (0cc4m)
2023-06-16  examples : add "simple" (#1840)                                                     (SuperUserNameMan)
2023-06-16  cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886)                             (Zenix)
2023-06-16  llama : fix embd when offloading non-repeating layers (#1891)                       (Johannes Gäßler)
2023-06-16  Fixed possible macro redefinition (#1892)                                           (FrankHB)
2023-06-16  build : fix and ignore MSVC warnings (#1889)                                        (Borislav Stanimirov)
2023-06-16  CUDA : faster k-quant dot kernels (#1862)                                           (Kawrakow)