path: root/ggml-cuda.cu
Age        | Commit message                                                           | Author
2023-08-04 | CUDA: use min compute capability of GPUs actually used (#2506)           | Cebtenzzre
2023-08-04 | CUDA: check if event is NULL before cudaStreamWaitEvent (#2505)          | Cebtenzzre
2023-08-02 | CUDA: faster non k-quant mul_mat_q kernels (#2483)                       | Johannes Gäßler
2023-08-02 | CUDA: Fix models with output size != 32000 (#2480)                       | Johannes Gäßler
2023-07-31 | CUDA: mmq CLI option, fixed mmq build issues (#2453)                     | Johannes Gäßler
2023-07-31 | CUDA: Implemented row flattening for non-glm RoPE (#2468)                | Johannes Gäßler
2023-07-31 | CUDA: fewer memory bank conflicts for mul_mat_q (#2458)                  | Johannes Gäßler
2023-07-29 | CUDA: Quantized matrix matrix multiplication (#2160)                     | Johannes Gäßler
2023-07-29 | CUDA: faster multi GPU synchronization (#2448)                           | Johannes Gäßler
2023-07-25 | Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359)                          | Kawrakow
2023-07-24 | make rms_norm_eps a parameter (#2374)                                    | slaren
2023-07-24 | ggml : sync (unary ops refactor, static-correctness) (#2370)             | Georgi Gerganov
2023-07-24 | Some more Q4_K and Q5_K speedup on CUDA (#2346)                          | Kawrakow
2023-07-23 | ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)  | slaren
2023-07-23 | llama : grouped-query attention + LLaMAv2 70B support (#2276)            | Georgi Gerganov
2023-07-23 | Speed up Q4_K (#2322)                                                    | Kawrakow
2023-07-22 | CUDA: Fixed 7b q3_K_S with mul_mat_vec_q (#2313)                         | Johannes Gäßler
2023-07-21 | Custom RoPE + better memory management for CUDA (#2295)                  | Kawrakow
2023-07-21 | llama : make tensor_split ptr instead of array (#2272)                   | Georgi Gerganov
2023-07-17 | Support dup & cont ops on CUDA (#2242)                                   | Jiahao Li
2023-07-14 | cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer ... | Bach Le
2023-07-14 | cuda : support broadcast add & mul (#2192)                               | Jiahao Li
2023-07-14 | CUDA: mul_mat_vec_q kernels for k-quants (#2203)                         | Johannes Gäßler
2023-07-14 | ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope)               | Georgi Gerganov
2023-07-13 | Fix compile error on Windows CUDA (#2207)                                | Howard Su
2023-07-12 | cuda : add gelu support                                                  | Georgi Gerganov
2023-07-12 | Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189)                      | Johannes Gäßler
2023-07-12 | ggml : revert CUDA broadcast changes from #2183 (#2191)                  | Georgi Gerganov
2023-07-11 | ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183)     | Georgi Gerganov
2023-07-11 | ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) | Spencer Sutton
2023-07-08 | Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144)                            | Johannes Gäßler
2023-07-08 | CUDA: add __restrict__ to mul mat vec kernels (#2140)                    | Johannes Gäßler
2023-07-05 | Quantized dot products for CUDA mul mat vec (#2067)                      | Johannes Gäßler
2023-07-03 | Fix crash of test-tokenizer-0 under Debug build (#2064)                  | Howard Su
2023-07-01 | Better CUDA synchronization logic (#2057)                                | Johannes Gäßler
2023-06-28 | cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028)   | Salvador E. Tropea
2023-06-28 | cuda : fix missing const qualifier in casts (#2027)                      | Salvador E. Tropea
2023-06-28 | CUDA GPU acceleration for LoRAs + f16 models (#1970)                     | Johannes Gäßler
2023-06-26 | k-quants : support for super-block size of 64 (#2001)                    | Kawrakow
2023-06-26 | Fix assert when free invalid cuda pointer (#2005)                        | Howard Su
2023-06-24 | #1869 Fix null reference errors when training from scratch with CUDA (#1907) | Robyn
2023-06-19 | cuda : faster k-quants on older GPUs (#1930)                             | Kawrakow
2023-06-19 | Convert vector to f16 for dequantize mul mat vec (#1913)                 | Johannes Gäßler
2023-06-17 | Only one CUDA stream per device for async compute (#1898)                | Johannes Gäßler
2023-06-17 | ggml : fix warnings under MSVC (#1908)                                   | Howard Su
2023-06-16 | CUDA : faster k-quant dot kernels (#1862)                                | Kawrakow
2023-06-15 | Fixed CUDA runtime version check (#1879)                                 | Johannes Gäßler
2023-06-15 | Fix the validation of main device (#1872)                                | Howard Su
2023-06-14 | CUDA full GPU acceleration, KV cache in VRAM (#1827)                     | Johannes Gäßler
2023-06-12 | Leverage mmap for offloading tensors to GPU (#1597)                      | Howard Su