Age | Commit message | Author
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov
2023-05-19 | tests : add missing header | Georgi Gerganov
2023-05-19 | examples : add persistent chat (#1495) | Evan Jones
2023-05-19 | main : make reverse prompt option act as a stop token in non-interactive mode... | Jason McCartney
2023-05-19 | readme : adds WizardLM to the list of supported models (#1485) | David Kennedy
2023-05-19 | minor : fix compile warnings | Georgi Gerganov
2023-05-18 | make kv_f16 the default for api users (#1517) | Erik Scholz
2023-05-18 | Fixes #1511 lambda issue for w64devkit (mingw) (#1513) | DannyDaemonic
2023-05-17 | Remove unused n_parts parameter (#1509) | Stephan Walter
2023-05-17 | benchmark-matmul: Print the average of the test results (#1490) | rankaiyx
2023-05-17 | convert.py: Support models which are stored in a single pytorch_model.bin (#1... | Tom Jobbins
2023-05-16 | ~7% faster Q5_1 AVX2 code (#1477) | Ilya Kurdyukov
2023-05-16 | define default model path once, sync path with readme (#1366) | András Salamon
2023-05-16 | Add alternate include path for openblas (#1476) | sandyiscool
2023-05-15 | fix get_num_physical_cores() (#1436) | zrm
2023-05-14 | benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458) | slaren
2023-05-14 | cuda : deduplicated dequantization code (#1453) | Johannes Gäßler
2023-05-14 | ggml : alternative fix for race condition bug in non-inplace ggml_compute_for... | xaedes
2023-05-14 | ggml : various fixes (#1450) | Georgi Gerganov
2023-05-14 | ggml : add AVX support based on AVX2 code (#1430) | katsu560
2023-05-14 | ggml : add GGML_QNT_VERSION to track quantization format changes | Georgi Gerganov
2023-05-13 | cuda : fix convert function (#1412) | Georgi Gerganov
2023-05-13 | make : fix PERF build with cuBLAS | Georgi Gerganov
2023-05-13 | llama : fix unused warning | Georgi Gerganov
2023-05-13 | ggml : multi-thread mul and diag_mask ops (#1428) | Georgi Gerganov
2023-05-13 | ggml : GPU-accelerated token generation (#1412) | Johannes Gäßler
2023-05-13 | ggml : implement backward pass for llama + small training-llama-from-scratch ... | xaedes
2023-05-13 | ggml : sync alibi fix from ggml repo | Georgi Gerganov
2023-05-13 | Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413) | 3ooabkhxtn
2023-05-13 | llama : fix various warnings | Georgi Gerganov
2023-05-13 | embedding : remove unused code (#1426) | Rinne
2023-05-13 | readme : update Q4_0 perplexities | Georgi Gerganov
2023-05-13 | llama : free ggml context in set / copy state data (close #1425) | Georgi Gerganov
2023-05-13 | opencl : fix kernels for the new formats (#1422) | Henri Vasserman
2023-05-12 | llama : fix --mtest option (close #1414) | Georgi Gerganov
2023-05-12 | CLI args use - instead of _, backwards compatible (#1416) | Johannes Gäßler
2023-05-12 | Add clang-tidy reviews to CI (#1407) | slaren
2023-05-12 | readme : add C#/.NET bindings repo (#1409) | Rinne
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov
2023-05-11 | prompts : model agnostic DAN (#1304) | CRD716
2023-05-10 | main : add option to save full output to session (#1338) | Evan Jones
2023-05-09 | Locale fix for Windows (#1379) | DannyDaemonic
2023-05-09 | use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314) | Sami Farin
2023-05-08 | Interface improvements and `--multiline-input` (previously `--author-mode`) (... | DannyDaemonic
2023-05-08 | readme : add notice about upcoming breaking change | Georgi Gerganov
2023-05-08 | readme : add TOC and Pygmalion instructions (#1359) | AlpinDale
2023-05-08 | llama : fix hparams shadow (#1367) | Pavol Rusnak
2023-05-08 | llama : require first token to be BOS (#1303) | Georgi Gerganov
2023-05-08 | convert: add ability to convert safetensors files (#1276) | ubik2
2023-05-08 | Documented CUDA reproducibility, added warning (#1346) | Johannes Gäßler