index : llama.cpp.git
Age
Commit message
Author
2023-08-10
fix runtime crash (HEAD, master)
aditya
2023-08-10
resolve merge conflict
aditya
2023-08-09
ggml-alloc: Don't try to re-use buffers of external tensors (#2562)
Sam Spilsbury
2023-08-09
add log_callback to llama_context_params for custom logging. (#2234)
grahameth
2023-08-09
CUDA: tuned mul_mat_q kernels (#2546)
Johannes Gäßler
2023-08-08
Allow passing grammar to completion endpoint (#2532)
Martin Krasser
2023-08-08
CUDA: tighter VRAM scratch size for 65b/70b (#2551)
Johannes Gäßler
2023-08-08
llm.vim : multiline autocompletion, get rid of "^@" (#2543)
chaihahaha
2023-08-08
vim : bring back simple llm.vim example
Georgi Gerganov
2023-08-08
vim : streaming and more (#2495)
AustinMroz
2023-08-07
Add --rope-scale parameter (#2544)
klosax
2023-08-07
ggml : mul mat tweaks (#2372)
Georgi Gerganov
2023-08-07
ggml : pad result of ggml_nbytes()
Georgi Gerganov
2023-08-07
ggml : change params pointer (style change) (#2539)
Georgi Gerganov
2023-08-07
ggml : sync (custom ops) (#2537)
Georgi Gerganov
2023-08-07
Fixed mmap prefetch for GPU offloading (#2529)
Johannes Gäßler
2023-08-07
metal : fix out-of-bounds access + inc concurrency nodes (#2416)
Georgi Gerganov
2023-08-07
[Makefile] Move ARM CFLAGS before compilation (#2536)
GiviMAD
2023-08-07
[Zig] Rewrite build for Zig 0.11 (#2514)
Henri Vasserman
2023-08-06
console : fix issue related to Windows 11 PowerShell console mode persistence...
DannyDaemonic
2023-08-06
convert.py : add missing abstract methods for quantized data (#2491)
Keiichi Tabata
2023-08-05
CUDA: faster k-quant mul_mat_q kernels (#2525)
Johannes Gäßler
2023-08-04
fix firefox autoscroll (#2519)
Jonas Wunderlich
2023-08-04
server: regenerate completion.js.hpp (#2515)
Cebtenzzre
2023-08-04
CUDA: use min compute capability of GPUs actually used (#2506)
Cebtenzzre
2023-08-04
CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)
Cebtenzzre
2023-08-04
Add --simple-io option for subprocesses and break out console.h and cpp (#1558)
DannyDaemonic
2023-08-04
Fixing race condition in server and partial stream handling in frontend. (#2391)
Stephen Nichols
2023-08-04
Stream save llama context data to file instead of allocating entire buffer up...
l3utterfly
2023-08-04
build : fix several cast and printf warnings (#2499)
Borislav Stanimirov
2023-08-02
examples : generate JSON according to schema (#1887)
Evan Jones
2023-08-02
CUDA: faster non k-quant mul_mat_q kernels (#2483)
Johannes Gäßler
2023-08-02
CUDA: Fix models with output size != 32000 (#2480)
Johannes Gäßler
2023-08-02
readme : add Aquila-7B model series to supported models (#2487)
ldwang
2023-08-02
tests : Fix compilation warnings (Linux/GCC) (#2451)
Eve
2023-08-02
readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)
Yiming Cui
2023-08-01
fix a typo in examples/server/README.md (#2478)
Bono Lv
2023-08-01
server : Support dark mode (#2414)
ebraminio
2023-08-01
metal : add gqa8 kernel to allow llama-2-70B on metal (#2459)
Matteo Boschini
2023-07-31
CUDA: fixed LLAMA_FAST compilation option (#2473)
Johannes Gäßler
2023-07-31
CUDA: fixed cmake F16 option (#2471)
Johannes Gäßler
2023-07-31
CUDA: mmq CLI option, fixed mmq build issues (#2453)
Johannes Gäßler
2023-07-31
CUDA: Implemented row flattening for non-glm RoPE (#2468)
Johannes Gäßler
2023-07-31
CUDA: fewer memory bank conflicts for mul_mat_q (#2458)
Johannes Gäßler
2023-07-31
Fix Metal backend broken from the allocator changes (#2455)
slaren
2023-07-30
ggml : add graph tensor allocator (#2411)
slaren
2023-07-29
CUDA: Quantized matrix matrix multiplication (#2160)
Johannes Gäßler
2023-07-29
CUDA: faster multi GPU synchronization (#2448)
Johannes Gäßler
2023-07-28
perplexity : add Hellaswag calculation (#2389)
klosax
2023-07-28
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)
Lee