Age | Commit message | Author |
2023-06-07 | k-quants : allow to optionally disable at compile time (#1734) | Georgi Gerganov |
2023-06-07 | flake : update to support metal on m1/m2 (#1724) | jacobi petrucciani |
2023-06-07 | readme : add June roadmap | Georgi Gerganov |
2023-06-06 | main: add the possibility to open the prompt cache read-only (#1640) | Willy Tarreau |
2023-06-06 | llama : fix vram_scratch var | Georgi Gerganov |
2023-06-06 | llama : fix compile warnings | Georgi Gerganov |
2023-06-06 | Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) | Johannes Gäßler |
2023-06-06 | metal : add f16 support | Georgi Gerganov |
2023-06-06 | Clblast fixes + enhancements to save VRAM and offload more layers (#1675) | LostRuins |
2023-06-06 | ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710) | Georgi Gerganov |
2023-06-06 | gitignore : add .clang-tidy | Georgi Gerganov |
2023-06-06 | llama : temporary disable Q6_K output quantization (#1711) | Georgi Gerganov |
2023-06-06 | metal : add checks for buffer size (#1706) | Spencer Sutton |
2023-06-05 | docs : add performance troubleshoot + example benchmark documentation (#1674) | Yuval Peled |
2023-06-05 | readme : fix typo (#1700) | Foul-Tarnished |
2023-06-05 | llama : consistently catch and throw only exceptions deriving from std::excep... | mgroeber9110 |
2023-06-05 | metal : use shared buffers between CPU and GPU (#1696) | kiltyj |
2023-06-05 | ggml : fix internal overflow in ggml_time_us on Windows (#1702) | grahameth |
2023-06-05 | ci : disable auto tidy (#1705) | Georgi Gerganov |
2023-06-05 | ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) | Kawrakow |
2023-06-05 | Increase 3B scratch buffers. (#1698) | Henri Vasserman |
2023-06-05 | llama : fix Metal KV cache sync (close #1695) | Georgi Gerganov |
2023-06-04 | readme : update hot topics | Georgi Gerganov |
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov |
2023-06-04 | OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) | 0cc4m |
2023-06-03 | Add info about CUDA_VISIBLE_DEVICES (#1682) | Henri Vasserman |
2023-06-03 | Docker: change to calling convert.py (#1641) | Jiří Podivín |
2023-06-03 | Fix prompt cache saving and chat-persistent rollover (#1678) | Evan Jones |
2023-05-30 | OpenLLaMA 3B support (#1588) | Henri Vasserman |
2023-05-29 | ggml : sync cgraph import / export API | Georgi Gerganov |
2023-05-29 | ggml : fix bug in ggml_alibi | Georgi Gerganov |
2023-05-29 | Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) | DannyDaemonic |
2023-05-28 | Adding git in container package dependencies (#1621) | Jiří Podivín |
2023-05-28 | LLAMA_DEBUG adds debug symbols (#1617) | Johannes Gäßler |
2023-05-28 | Only show -ngl option when relevant + other doc/arg handling updates (#1625) | Kerfuffle |
2023-05-28 | examples : add --alias option to gpt_params to set use friendly model name (#... | Vladimir Zorin |
2023-05-28 | opencl : no need to allocate cl_mem on heap (#1612) | Howard Su |
2023-05-28 | opencl : use strstr to check if fp16 supported (#1611) | Howard Su |
2023-05-27 | ggml : add support for the RISCV architecture (#1616) | apcameron |
2023-05-27 | Include server in releases + other build system cleanups (#1610) | Kerfuffle |
2023-05-27 | Add documentation about CLBlast (#1604) | Henri Vasserman |
2023-05-27 | [CI] Fix openblas (#1613) | Henri Vasserman |
2023-05-27 | ggml : add ggml_tensor_overhead() | Georgi Gerganov |
2023-05-27 | [CI] CLBlast: Fix directory name (#1606) | Henri Vasserman |
2023-05-27 | ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) | Georgi Gerganov |
2023-05-26 | cuda : performance optimizations (#1530) | Johannes Gäßler |
2023-05-25 | Some improvements to loading the session with --prompt-cache (#1550) | Kerfuffle |
2023-05-24 | Update CLBlast to 1.6.0 (#1580) | Henri Vasserman |
2023-05-24 | readme : add docs for chat-persistent.sh (#1568) | Evan Jones |
2023-05-24 | chat-persistent.sh : use bracket expressions in grep (#1564) | Senemu |