llama.cpp.git, branch master: commit log for ggml-cuda.cu
Age         Commit message                                                            Author
2023-08-02  CUDA: faster non k-quant mul_mat_q kernels (#2483)                        Johannes Gäßler
2023-08-02  CUDA: Fix models with output size != 32000 (#2480)                        Johannes Gäßler
2023-07-31  CUDA: mmq CLI option, fixed mmq build issues (#2453)                      Johannes Gäßler
2023-07-31  CUDA: Implemented row flattening for non-glm RoPE (#2468)                 Johannes Gäßler
2023-07-31  CUDA: fewer memory bank conflicts for mul_mat_q (#2458)                   Johannes Gäßler
2023-07-29  CUDA: Quantized matrix matrix multiplication (#2160)                      Johannes Gäßler
2023-07-29  CUDA: faster multi GPU synchronization (#2448)                            Johannes Gäßler
2023-07-25  Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)                           Kawrakow
2023-07-24  make rms_norm_eps a parameter (#2374)                                     slaren
2023-07-24  ggml : sync (unary ops refactor, static-correctness) (#2370)              Georgi Gerganov
2023-07-24  Some more Q4_K and Q5_K speedup on CUDA (#2346)                           Kawrakow
2023-07-23  ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)  slaren
2023-07-23  llama : grouped-query attention + LLaMAv2 70B support (#2276)             Georgi Gerganov
2023-07-23  Speed up Q4_K (#2322)                                                     Kawrakow
2023-07-22  CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)                          Johannes Gäßler
2023-07-21  Custom RoPE + better memory management for CUDA (#2295)                   Kawrakow
2023-07-21  llama : make tensor_split ptr instead of array (#2272)                    Georgi Gerganov
2023-07-17  Support dup & cont ops on CUDA (#2242)                                    Jiahao Li
2023-07-14  cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer ...  Bach Le
2023-07-14  cuda : support broadcast add & mul (#2192)                                Jiahao Li
2023-07-14  CUDA: mul_mat_vec_q kernels for k-quants (#2203)                          Johannes Gäßler
2023-07-14  ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope)                Georgi Gerganov
2023-07-13  Fix compile error on Windows CUDA (#2207)                                 Howard Su
2023-07-12  cuda : add gelu support                                                   Georgi Gerganov
2023-07-12  Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189)                       Johannes Gäßler
2023-07-12  ggml : revert CUDA broadcast changes from #2183 (#2191)                   Georgi Gerganov
2023-07-11  ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183)      Georgi Gerganov
2023-07-11  ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)  Spencer Sutton
2023-07-08  Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144)                             Johannes Gäßler
2023-07-08  CUDA: add __restrict__ to mul mat vec kernels (#2140)                     Johannes Gäßler
2023-07-05  Quantized dot products for CUDA mul mat vec (#2067)                       Johannes Gäßler
2023-07-03  Fix crash of test-tokenizer-0 under Debug build (#2064)                   Howard Su
2023-07-01  Better CUDA synchronization logic (#2057)                                 Johannes Gäßler
2023-06-28  cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)    Salvador E. Tropea
2023-06-28  cuda : fix missing const qualifier in casts (#2027)                       Salvador E. Tropea
2023-06-28  CUDA GPU acceleration for LoRAs + f16 models (#1970)                      Johannes Gäßler
2023-06-26  k-quants : support for super-block size of 64 (#2001)                     Kawrakow
2023-06-26  Fix assert when free invalid cuda pointer (#2005)                         Howard Su
2023-06-24  #1869 Fix null reference errors when training from scratch with CUDA (#1907)  Robyn
2023-06-19  cuda : faster k-quants on older GPUs (#1930)                              Kawrakow
2023-06-19  Convert vector to f16 for dequantize mul mat vec (#1913)                  Johannes Gäßler
2023-06-17  Only one CUDA stream per device for async compute (#1898)                 Johannes Gäßler
2023-06-17  ggml : fix warnings under MSVC (#1908)                                    Howard Su
2023-06-16  CUDA : faster k-quant dot kernels (#1862)                                 Kawrakow
2023-06-15  Fixed CUDA runtime version check (#1879)                                  Johannes Gäßler
2023-06-15  Fix the validation of main device (#1872)                                 Howard Su
2023-06-14  CUDA full GPU acceleration, KV cache in VRAM (#1827)                      Johannes Gäßler
2023-06-12  Leverage mmap for offloading tensors to GPU (#1597)                       Howard Su
2023-06-11  Fixed WSL cuda's OOM error (#1594)                                        Kyle Liang
2023-06-09  Windows nvcc workaround (#1753)                                           Johannes Gäßler