path: root/README.md
Age | Commit message | Author
2023-06-25 | readme : add new roadmap + manifesto | Georgi Gerganov
2023-06-25 | readme : add Azure CI discussion link | Georgi Gerganov
2023-06-24 | readme : fix whitespaces | Georgi Gerganov
2023-06-24 | readme : fixed termux instructions (#1973) | Alberto
2023-06-23 | Add OpenLLaMA instructions to the README (#1954) | eiery
* add openllama to readme
2023-06-21 | Fix typo in README.md (#1961) | Rahul Vivek Nair
2023-06-20 | readme : add link to p1 | Georgi Gerganov
2023-06-20 | Fix typo (#1949) | Xiake Sun
2023-06-19 | Convert vector to f16 for dequantize mul mat vec (#1913) | Johannes Gäßler
* Convert vector to f16 for dmmv
* compile option
* Added compilation option description to README
* Changed cmake CUDA_ARCHITECTURES from "OFF" to "native"
2023-06-18 | readme : update Android build instructions (#1922) | Mike
Add steps for using Termux on Android devices to prevent common errors.
2023-06-17 | Only one CUDA stream per device for async compute (#1898) | Johannes Gäßler
2023-06-17 | readme : alternative way to build for Android with CLBlast. (#1828) | Gustavo Rocha Dias
2023-06-10 | doc : fix wrong address of BLIS.md (#1772) | Aisuko
Signed-off-by: Aisuko <urakiny@gmail.com>
2023-06-07 | readme : add June roadmap | Georgi Gerganov
2023-06-05 | docs : add performance troubleshoot + example benchmark documentation (#1674) | Yuval Peled
* test anchor link
* test table
* add benchmarks
* Add performance troubleshoot & benchmark
* add benchmarks
* remove unneeded line
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-05 | readme : fix typo (#1700) | Foul-Tarnished
Fix a typo in a command in README.md
2023-06-04 | readme : update hot topics | Georgi Gerganov
2023-06-04 | llama : Metal inference (#1642) | Georgi Gerganov
* mtl : export the LLaMA computation graph
* ci : disable temporary
* mtl : adapt the MNIST example as starter
* mtl : no need for mtl-export tool, add cli arg for main instead
* mtl : export just a small part of the graph for now to make it easier
* mtl : move MSL code into separate file for easy editing
* mtl : initial get_rows_q4_0 kernel
* mtl : confirmed get_rows_q4_0 is working correctly
* mtl : add rms_norm kernel + confirm working
* mtl : add mul kernel + confirm working
* mtl : initial mul_mat Q4 kernel (wrong results)
* mtl : mul_mat fixes (still wrong)
* mtl : another mul_mat Q4 (still does not work)
* mtl : working mul_mat q4
* ggml : fix handling of "view" ops in ggml_graph_import()
* mtl : add rope kernel
* mtl : add reshape and transpose handling
* ggml : store offset as opt arg for ggml_view_xd() operators
* mtl : add cpy kernel + handle view ops
* mtl : confirm f16 x f32 attention mul mat
* mtl : add scale kernel
* mtl : add diag_mask_inf kernel
* mtl : fix soft_max kernel
* ggml : update ggml_nbytes() to handle non-contiguous tensors
* mtl : verify V tensor contents
* mtl : add f32 -> f32 cpy kernel
* mtl : add silu kernel
* mtl : add non-broadcast mul kernel
* mtl : full GPU inference of the computation graph
* mtl : optimize rms_norm and soft_max kernels
* mtl : add f16 mat x f32 vec multiplication kernel
* mtl : fix bug in f16 x f32 mul mat + speed-up computation
* mtl : faster mul_mat_q4_0_f32 kernel
* mtl : fix kernel signature + roll inner loop
* mtl : more threads for rms_norm + better timing
* mtl : remove printfs from inner loop
* mtl : simplify implementation
* mtl : add save/load vocab to ggml file
* mtl : plug Metal inference into llama.cpp (very quick-n-dirty)
* mtl : make it work with main example
  Lots of hacks, but at least now it generates text
* mtl : preparing for merge
* mtl : clean-up ggml mtl interface + support scratch / inplace
* mtl : remove temp / debug code
* metal : final refactoring and simplification
* Revert "ci : disable temporary"
  This reverts commit 98c267fc77fe811082f672538fc91bcfc9072d63.
* metal : add comments
* metal : clean-up stuff, fix typos
* readme : add Metal instructions
* readme : add example for main
2023-06-03 | Add info about CUDA_VISIBLE_DEVICES (#1682) | Henri Vasserman
2023-05-27 | Add documentation about CLBlast (#1604) | Henri Vasserman
Installing, compiling and using.
2023-05-24 | readme : add docs for chat-persistent.sh (#1568) | Evan Jones
* readme : add docs for chat-persistent.sh
* Update README.md
2023-05-20 | feature : support blis and other blas implementation (#1536) | Zenix
* feature: add blis support
* feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927
* fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
* Fix typo in INTEGER
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix: blas changes on ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20 | Revert "feature : add blis and other BLAS implementation support (#1502)" | Georgi Gerganov
This reverts commit 07e9ace0f9da424d82e75df969642522880feb92.
2023-05-20 | feature : add blis and other BLAS implementation support (#1502) | Zenix
* feature: add blis support
* feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927
* fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
* Fix typo in INTEGER
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19 | ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) | Georgi Gerganov
* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0
* llama : bump LLAMA_FILE_VERSION to 3
* cuda : update Q4 and Q8 dequantize kernels
* ggml : fix AVX dot products
* readme : update performance table + hot topics
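The #1508 change stores each block's scale as F16 instead of F32. The arithmetic below is a minimal sketch of what that saves, assuming a Q4_0 block holds one scale plus 32 weights packed two 4-bit values per byte; the block size and layout are assumptions, not stated in this log.

```python
# Hypothetical sketch: storage cost of a Q4_0-style block.
# Assumption: one scale `d` followed by BLOCK_SIZE weights packed
# two 4-bit values per byte.
BLOCK_SIZE = 32  # assumed number of weights per block


def bytes_per_block(scale_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Size of one block: the scale plus block_size 4-bit values."""
    return scale_bytes + block_size // 2


def bits_per_weight(scale_bytes: int, block_size: int = BLOCK_SIZE) -> float:
    """Average storage per weight, including the amortized scale."""
    return bytes_per_block(scale_bytes, block_size) * 8 / block_size


if __name__ == "__main__":
    print("F32 scale:", bits_per_weight(scale_bytes=4), "bits/weight")  # 20 bytes * 8 / 32
    print("F16 scale:", bits_per_weight(scale_bytes=2), "bits/weight")  # 18 bytes * 8 / 32
```

Under these assumptions a block shrinks from 20 to 18 bytes (5.0 to 4.5 bits per weight), so each quantized tensor gets about 10% smaller.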
2023-05-19 | readme : adds WizardLM to the list of supported models (#1485) | David Kennedy
2023-05-13 | readme : update Q4_0 perplexities | Georgi Gerganov
I think these were affected by the removal of the `round` during quantization
2023-05-12 | readme : add C#/.NET bindings repo (#1409) | Rinne
2023-05-12 | ggml : remove bit shuffling (#1405) | Georgi Gerganov
* ggml : remove Q4_0 bit shuffling (ARM NEON)
* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
* ggml : remove Q5_0 bit shuffling (ARM NEON)
* ggml : 2x faster scalar implementations
* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
* ggml : simplify scalar dot
* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
* ggml : fix Q4_1 quantization
* ggml : update cuBLAS + normalize variable names
* ggml : remove Q4_2 mode
* ggml : minor formatting
* ggml : fix Q5_0 quantization
* scripts : add script for measuring the time per token
* AVX implementations (#1370)
* ggml : uniform 5th bit extraction
* llama : produce error upon loading old model files
* llama : fix model magic/version write
* ggml : speed-up Q5_0 + Q5_1 at 4 threads
* ggml : preserve old Q4 and Q5 formats
* ggml : simplify Q8_1 - no need for low / high sums anymore
* ggml : fix Q8_0 and Q8_1 rounding
* Revert "AVX implementations (#1370)"
  This reverts commit 948d124837f9d287d8490f41338e0e4cceb0814f.
* ggml : fix AVX2 implementation
* sha : update hashes for 7B and 13B
* readme : update timings + remove warning banner
* llama : update v2 PR number to 1405
* ggml : fix WASM comments
* ggml : back to original bit order
* readme : add note that Q4 and Q5 have been changed
* llama : fix return for unknown version
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-08 | readme : add notice about upcoming breaking change | Georgi Gerganov
2023-05-08 | readme : add TOC and Pygmalion instructions (#1359) | AlpinDale
2023-05-08 | llama : require first token to be BOS (#1303) | Georgi Gerganov
* llama : require first token to be BOS
* scripts : add ppl-run-all.sh
* perplexity : add BOS for each chunk
* readme : update perplexity values after BOS fix
* perplexity : add clarifying comments
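The "add BOS for each chunk" step can be sketched as follows, assuming the evaluation text is split into fixed `n_ctx`-sized chunks and the first token of each chunk is overwritten with BOS (rather than BOS being prepended). `chunks_with_bos` is a hypothetical name, and `BOS_ID = 1` matches LLaMA's `<s>` token but is treated as an assumption here; this is not the perplexity tool's actual code.

```python
# Hypothetical sketch of per-chunk BOS handling for a perplexity run.
BOS_ID = 1  # assumed BOS token id (LLaMA's <s>)


def chunks_with_bos(tokens: list[int], n_ctx: int, bos_id: int = BOS_ID) -> list[list[int]]:
    """Split tokens into full n_ctx-sized chunks, forcing each to start with BOS."""
    n_chunks = len(tokens) // n_ctx  # a trailing partial chunk is dropped
    out = []
    for i in range(n_chunks):
        chunk = tokens[i * n_ctx:(i + 1) * n_ctx]  # slicing copies; input is untouched
        chunk[0] = bos_id  # overwrite so the chunk length stays n_ctx
        out.append(chunk)
    return out
```

Overwriting instead of prepending keeps every chunk exactly `n_ctx` tokens long, at the cost of not scoring the replaced token.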
2023-05-08 | Documented CUDA reproducibility, added warning (#1346) | Johannes Gäßler
2023-05-05 | makefile: automatic Arch Linux detection (#1332) | DaniAndTheWeb
This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux
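The detection described above decides whether to add `-lcblas` when linking. The following is a hypothetical Python sketch of one such check, assuming it is based on the `ID` and `ID_LIKE` fields of `/etc/os-release`; the Makefile logic ported from koboldcpp may work differently, and `is_arch_like` and `extra_ldflags` are illustrative names only.

```python
# Hypothetical sketch: detect Arch Linux (or a derivative) from os-release text.
# Assumption: detection keys off the ID / ID_LIKE fields.


def is_arch_like(os_release_text: str) -> bool:
    """Return True if os-release content names Arch Linux or an Arch derivative."""
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')  # values may be quoted
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    return "arch" in ids


def extra_ldflags(os_release_text: str) -> list[str]:
    """Hypothetical helper: extra linker flags when linking against BLAS."""
    return ["-lcblas"] if is_arch_like(os_release_text) else []
```

On a real system the input would be the contents of `/etc/os-release`, e.g. `pathlib.Path("/etc/os-release").read_text()`.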
2023-05-05 | readme: add missing info (#1324) | Pavol Rusnak
2023-05-04 | readme : add OpenBuddy link (#1321) | 44670
2023-05-03 | minor : fix whitespaces (#1302) | Georgi Gerganov
2023-05-03 | scripts : platform independent script to verify sha256 checksums (#1203) | KASR
* python script to verify the checksum of the llama models
  Added a Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.
* Update README.md
  Updated the README for improved readability and to explain the usage of the Python checksum verification script.
* update the verification script
  I've extended the script based on suggestions by @prusnak. The script now checks the available RAM: if there is enough to check the file at once, it will do so; if not, the file is read in chunks.
* minor improvement
  Small change so that the available RAM is checked and not the total RAM.
* remove the part of the code that reads the file at once if enough RAM is available
  Based on suggestions from @prusnak, I removed the part of the code that checks whether the user had enough RAM to read the entire model at once. The file is now always read in chunks.
* Update verify-checksum-models.py
  Quick fix to pass the git check.
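The behaviour the script settled on, always hashing the model file in chunks instead of loading it whole, can be sketched with Python's `hashlib`. This is a minimal illustration, not the contents of `verify-checksum-models.py`; `sha256_of_file`, `verify` and the 1 MiB chunk size are assumptions.

```python
# Minimal sketch of chunked SHA256 verification (hypothetical helper names).
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def sha256_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 against an expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Because only one chunk is held in memory at a time, memory use is the same for a 7B or a 65B model file, which makes a RAM check unnecessary.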
2023-04-28 | Remove Q4_3 which is no better than Q5 (#1218) | Stephan Walter
2023-04-28 | readme : update hot topics | Georgi Gerganov
2023-04-28 | Correcting link to w64devkit (#1214) | Folko-Ven
Correcting link to w64devkit (change seeto to skeeto).
2023-04-26 | readme : add quantization info | Georgi Gerganov
2023-04-26 | Updating build instructions to include BLAS support (#1183) | DaniAndTheWeb
* Updated build information
  First update to the build instructions to include BLAS.
* Update README.md
* Update information about BLAS
* Better BLAS explanation
  Adding a clearer BLAS explanation and adding a link to download the CUDA toolkit.
* Better BLAS explanation
* BLAS for Mac
  Specifying that BLAS is already supported on Macs using the Accelerate Framework.
* Clarify the effect of BLAS
* Windows Make instructions
  Added the instructions to build with Make on Windows
* Fixing typo
* Fix trailing whitespace
2023-04-26 | quantize : use `map` to assign quantization type from `string` (#1191) | Pavol Rusnak
instead of `int` (the `int` option is still supported).
This allows the following usage:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0`
instead of:
`./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`
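The lookup described above (accept a type name, keep accepting the old integer) can be sketched as a dictionary lookup with an integer fallback. This Python sketch is hypothetical; the `quantize` tool itself is C++. Only `q4_0` mapping to `2` comes from the commit message; the other ids, and the names `QUANT_TYPES` and `parse_quant_type`, are assumptions.

```python
# Hypothetical sketch: resolve a quantization type from a name or a legacy integer.
# Only "q4_0" -> 2 is confirmed by the commit message; the other ids are assumed.
QUANT_TYPES = {
    "q4_0": 2,
    "q4_1": 3,
    "q8_0": 7,
    "q5_0": 8,
    "q5_1": 9,
}


def parse_quant_type(arg: str) -> int:
    """Map a type name (e.g. "q4_0") or a legacy integer string (e.g. "2") to its id."""
    key = arg.strip().lower()
    if key in QUANT_TYPES:
        return QUANT_TYPES[key]
    try:
        value = int(key)  # legacy numeric form
    except ValueError:
        raise ValueError(f"unknown quantization type: {arg!r}") from None
    if value not in QUANT_TYPES.values():
        raise ValueError(f"unknown quantization type id: {value}")
    return value
```

Validating the integer against the known ids keeps the old numeric form working without letting arbitrary numbers through.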
2023-04-24 | examples/main README improvements and some light refactoring (#1131) | mgroeber9110
2023-04-23 | readme : update gpt4all instructions (#980) | Pavol Rusnak
2023-04-19 | Minor: Readme fixed grammar, spelling, and misc updates (#1071) | CRD716
2023-04-19 | readme : add warning about Q4_2 and Q4_3 | Georgi Gerganov
2023-04-18 | readme : update hot topics about new LoRA functionality | Georgi Gerganov
2023-04-17 | readme : add Ruby bindings (#1029) | Atsushi Tatsuma