path: root/README.md
2023-08-09  CUDA: tuned mul_mat_q kernels (#2546)  (Johannes Gäßler)
2023-08-02  readme : add Aquila-7B model series to supported models (#2487)  (ldwang)
    * support bpe tokenizer in convert
    * support bpe tokenizer in convert, fix
    * Add Aquila-7B models in README.md
    * Update Aquila-7B models in README.md
    Signed-off-by: ldwang <ftgreat@gmail.com>
    Co-authored-by: ldwang <ftgreat@gmail.com>
2023-08-02  readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475)  (Yiming Cui)
    * add support for chinese llama-2 / alpaca-2
    * remove white spaces
2023-07-31  CUDA: mmq CLI option, fixed mmq build issues (#2453)  (Johannes Gäßler)
2023-07-29  CUDA: Quantized matrix matrix multiplication (#2160)  (Johannes Gäßler)
    * mmq implementation for non k-quants
    * q6_K
    * q2_K
    * q3_k
    * q4_K
    * vdr
    * q5_K
    * faster q8_1 loading
    * loop unrolling
    * add __restrict__
    * q2_K sc_high
    * GGML_CUDA_MMQ_Y
    * Updated Makefile
    * Update Makefile
    * DMMV_F16 -> F16
    * Updated README, CMakeLists
    * Fix CMakeLists.txt
    * Fix CMakeLists.txt
    * Fix multi GPU out-of-bounds
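Background note: these mmq kernels reduce to integer dot products over quantized blocks, for which CUDA's __dp4a instruction (a 4-way int8 dot product with accumulate, compute capability 6.1+) is the workhorse. A minimal sketch of the idea — the function name and the exact nibble/activation layout here are illustrative, not the actual ggml-cuda kernel:

```cuda
#include <cstdint>

// Sketch: dot product of 32 4-bit quantized weights against 32 int8
// activations. __dp4a(a, b, c) treats a and b as four packed 8-bit values
// and returns a0*b0 + a1*b1 + a2*b2 + a3*b3 + c in one instruction
// (requires __CUDA_ARCH__ >= 610).
__device__ int vec_dot_q4_q8(const uint8_t * __restrict__ q4,   // 16 bytes = 32 nibbles
                             const int8_t  * __restrict__ q8) { // 32 int8 activations
    int sum = 0;
#pragma unroll
    for (int i = 0; i < 4; ++i) {
        const int v  = ((const int *) q4)[i];   // 8 packed nibbles
        const int lo =  v       & 0x0F0F0F0F;   // low nibble of each byte
        const int hi = (v >> 4) & 0x0F0F0F0F;   // high nibble of each byte
        sum = __dp4a(lo, ((const int *) q8)[2*i + 0], sum);
        sum = __dp4a(hi, ((const int *) q8)[2*i + 1], sum);
    }
    // The caller applies the per-block scales (and, for Q4_0, the -8 offset).
    return sum;
}
```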
2023-07-28  Obtaining LLaMA 2 instructions (#2308)  (niansa/tuxifan)
    * Obtaining LLaMA 2 instructions
    * Removed sharing warning for LLaMA 2
    * Linked TheBloke's GGML repos
    * Add LLaMA 2 to list of supported models
    * Added LLaMA 2 usage instructions
    * Added links to LLaMA 2 70B models
2023-07-23  Fix __dp4a documentation (#2348)  (Johannes Gäßler)
2023-07-23  make : fix CLBLAST compile support in FreeBSD (#2331)  (Jose Maldonado)
    * Fix Makefile for CLBLAST compile support and instructions for compiling llama.cpp on FreeBSD
    * More general use-case for CLBLAST support (Linux and FreeBSD)
2023-07-21  flake : remove intel mkl from flake.nix due to missing files (#2277)  (wzy)
    NixOS's mkl package misses some libraries, like mkl-sdl.pc (see #2261), and
    NixOS currently has no Intel C compiler (icx, icpx); see
    https://discourse.nixos.org/t/packaging-intel-math-kernel-libraries-mkl/975.
    So remove it from flake.nix.

    Some minor changes:
    - Change pkgs.python310 to pkgs.python3 to keep up with the latest
    - Add pkgconfig to devShells.default
    - Remove installPhase because we have `cmake --install` from #2256
2023-07-19  flake : update flake.nix (#2270)  (wzy)
    When `isx86_32 || isx86_64`, use mkl; otherwise use openblas.
    According to https://discourse.nixos.org/t/rpath-of-binary-contains-a-forbidden-reference-to-build/12200/3,
    add -DCMAKE_SKIP_BUILD_RPATH=ON.
    Fixes #2261: Nix doesn't provide mkl-sdl.pc, so when building with
    -DBUILD_SHARED_LIBS=ON and -DLLAMA_BLAS_VENDOR=Intel10_lp64, replace
    mkl-sdl.pc with mkl-dynamic-lp64-iomp.pc.
2023-07-16  py : turn verify-checksum-models.py into executable (#2245)  (Jiří Podivín)
    README.md was adjusted to reflect the change.
    Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-07-11  readme : fix zig build instructions (#2171)  (Chad Brewbaker)
2023-07-10  mpi : add support for distributed inference via MPI (#2099)  (Evan Miller)
    * MPI support, first cut
    * fix warnings, update README
    * fixes
    * wrap includes
    * PR comments
    * Update CMakeLists.txt
    * Add GH workflow, fix test
    * Add info to README
    * mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099)
    * mpi : add names for layer inputs + prep ggml_mpi_graph_compute()
    * mpi : move all MPI logic into ggml-mpi
      Not tested yet
    * mpi : various fixes - communication now works but results are wrong
    * mpi : fix output tensor after MPI compute (still not working)
    * mpi : fix inference
    * mpi : minor
    * Add OpenMPI to GH action
    * [mpi] continue-on-error: true
    * mpi : fix after master merge
    * [mpi] Link MPI C++ libraries to fix OpenMPI
    * tests : fix new llama_backend API
    * [mpi] use MPI_INT32_T
    * mpi : factor out recv / send in functions and reuse
    * mpi : extend API to allow usage with outer backends (e.g. Metal)
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
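Background note: the scheme splits the transformer's layers across MPI ranks and pipelines activations between them. A minimal sketch of that pattern; pipeline_step and eval_local_layers are illustrative names, not the ggml-mpi API:

```cuda
#include <mpi.h>
#include <vector>

// Assumed to exist elsewhere: applies this rank's slice of the layers in place.
void eval_local_layers(std::vector<float> & act);

// Sketch: each rank receives activations from the previous rank, runs its
// layers, and forwards the result to the next rank in the pipeline.
void pipeline_step(std::vector<float> & act) {
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank > 0) { // receive activations computed by the previous rank
        MPI_Recv(act.data(), (int) act.size(), MPI_FLOAT,
                 rank - 1, /*tag=*/0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    eval_local_layers(act);

    if (rank < size - 1) { // forward to the next rank
        MPI_Send(act.data(), (int) act.size(), MPI_FLOAT,
                 rank + 1, /*tag=*/0, MPI_COMM_WORLD);
    }
}
```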
2023-07-09  readme : update Termux instructions (#2147)  (JackJollimore)
    The file path matters when running models inside Termux on Android devices:
    llama.cpp performs better when loading a .bin from the $HOME directory.
2023-07-09  readme : add more docs indexes (#2127)  (rankaiyx)
    * Update README.md to add more docs indexes
2023-07-07  docker : add support for CUDA in docker (#1461)  (dylan)
    Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-06  convert : update for baichuan (#2081)  (Judd)
    1. guess n_layers;
    2. relax warnings on context size;
    3. add a note that its derivatives are also supported.
    Co-authored-by: Judd <foldl@boxvest.com>
2023-07-05  Quantized dot products for CUDA mul mat vec (#2067)  (Johannes Gäßler)
2023-07-04  readme : add link to web chat PR  (Georgi Gerganov)
2023-07-01  convert : add support of baichuan-7b (#2055)  (Judd)
    Co-authored-by: Judd <foldl@boxvest.com>
2023-06-26  readme : add Scala 3 bindings repo (#2010)  (Roman Parykin)
2023-06-26  readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007)  (Gustavo Rocha Dias)
    * docs - Alternative way to build at Android, with CLBlast.
    * doc - LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux.
    * doc - fix typo
2023-06-26  readme : add link to new k-quants for visibility  (Georgi Gerganov)
2023-06-25  readme : add new roadmap + manifesto  (Georgi Gerganov)
2023-06-25  readme : add Azure CI discussion link  (Georgi Gerganov)
2023-06-24  readme : fix whitespaces  (Georgi Gerganov)
2023-06-24  readme : fixed termux instructions (#1973)  (Alberto)
2023-06-23  Add OpenLLaMA instructions to the README (#1954)  (eiery)
    * add openllama to readme
2023-06-21  Fix typo in README.md (#1961)  (Rahul Vivek Nair)
2023-06-20  readme : add link to p1  (Georgi Gerganov)
2023-06-20  Fix typo (#1949)  (Xiake Sun)
2023-06-19  Convert vector to f16 for dequantize mul mat vec (#1913)  (Johannes Gäßler)
    * Convert vector to f16 for dmmv
    * compile option
    * Added compilation option description to README
    * Changed cmake CUDA_ARCHITECTURES from "OFF" to "native"
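Background note: the optimization converts the activation vector to half precision once per matrix-vector product, so the dequantize-mul-mat-vec hot loop multiplies f16 against f16 instead of converting on every access. A rough sketch of the idea; the kernel shapes and names are illustrative:

```cuda
#include <cuda_fp16.h>

// One-time conversion of the activation vector from f32 to f16.
__global__ void convert_f32_to_f16(const float * x, __half * y, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __float2half(x[i]);
    }
}

// Hot loop: f16 multiply (requires __CUDA_ARCH__ >= 530), f32 accumulate.
// w_row would hold weights already dequantized to half.
__device__ float dot_row_f16(const __half * w_row, const __half * y, const int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += __half2float(__hmul(w_row[i], y[i]));
    }
    return sum;
}
```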
2023-06-18  readme : update Android build instructions (#1922)  (Mike)
    Add steps for using Termux on Android devices to prevent common errors.
2023-06-17  Only one CUDA stream per device for async compute (#1898)  (Johannes Gäßler)
2023-06-17  readme : alternative way to build for Android with CLBlast. (#1828)  (Gustavo Rocha Dias)
2023-06-10  doc : fix wrong address of BLIS.md (#1772)  (Aisuko)
    Signed-off-by: Aisuko <urakiny@gmail.com>
2023-06-07  readme : add June roadmap  (Georgi Gerganov)
2023-06-05  docs : add performance troubleshoot + example benchmark documentation (#1674)  (Yuval Peled)
    * test anchor link
    * test table
    * add benchmarks
    * Add performance troubleshoot & benchmark
    * add benchmarks
    * remove unneeded line
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-05  readme : fix typo (#1700)  (Foul-Tarnished)
    Fix a typo in a command in README.md
2023-06-04  readme : update hot topics  (Georgi Gerganov)
2023-06-04  llama : Metal inference (#1642)  (Georgi Gerganov)
    * mtl : export the LLaMA computation graph
    * ci : disable temporary
    * mtl : adapt the MNIST example as starter
    * mtl : no need for mtl-export tool, add cli arg for main instead
    * mtl : export just a small part of the graph for now to make it easier
    * mtl : move MSL code into separate file for easy editing
    * mtl : initial get_rows_q4_0 kernel
    * mtl : confirmed get_rows_q4_0 is working correctly
    * mtl : add rms_norm kernel + confirm working
    * mtl : add mul kernel + confirm working
    * mtl : initial mul_mat Q4 kernel (wrong results)
    * mtl : mul_mat fixes (still wrong)
    * mtl : another mul_mat Q4 (still does not work)
    * mtl : working mul_mat q4
    * ggml : fix handling of "view" ops in ggml_graph_import()
    * mtl : add rope kernel
    * mtl : add reshape and transpose handling
    * ggml : store offset as opt arg for ggml_view_xd() operators
    * mtl : add cpy kernel + handle view ops
    * mtl : confirm f16 x f32 attention mul mat
    * mtl : add scale kernel
    * mtl : add diag_mask_inf kernel
    * mtl : fix soft_max kernel
    * ggml : update ggml_nbytes() to handle non-contiguous tensors
    * mtl : verify V tensor contents
    * mtl : add f32 -> f32 cpy kernel
    * mtl : add silu kernel
    * mtl : add non-broadcast mul kernel
    * mtl : full GPU inference of the computation graph
    * mtl : optimize rms_norm and soft_max kernels
    * mtl : add f16 mat x f32 vec multiplication kernel
    * mtl : fix bug in f16 x f32 mul mat + speed-up computation
    * mtl : faster mul_mat_q4_0_f32 kernel
    * mtl : fix kernel signature + roll inner loop
    * mtl : more threads for rms_norm + better timing
    * mtl : remove printfs from inner loop
    * mtl : simplify implementation
    * mtl : add save/load vocab to ggml file
    * mtl : plug Metal inference into llama.cpp (very quick-n-dirty)
    * mtl : make it work with main example
      Lots of hacks but at least now it generates text
    * mtl : preparing for merge
    * mtl : clean-up ggml mtl interface + support scratch / inplace
    * mtl : remove temp / debug code
    * metal : final refactoring and simplification
    * Revert "ci : disable temporary"
      This reverts commit 98c267fc77fe811082f672538fc91bcfc9072d63.
    * metal : add comments
    * metal : clean-up stuff, fix typos
    * readme : add Metal instructions
    * readme : add example for main
2023-06-03  Add info about CUDA_VISIBLE_DEVICES (#1682)  (Henri Vasserman)
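Background note: CUDA_VISIBLE_DEVICES restricts which GPUs the CUDA runtime exposes to a process (the visible ones are renumbered from 0), which is how llama.cpp can be pinned to specific devices. A small stand-alone probe to observe the effect (illustrative, not part of the repo):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints the devices the CUDA runtime can see. Run with and without
// CUDA_VISIBLE_DEVICES set to observe the masking and renumbering.
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA devices\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s\n", i, prop.name);
    }
    return 0;
}
```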
2023-05-27  Add documentation about CLBlast (#1604)  (Henri Vasserman)
    Installing, compiling and using.
2023-05-24  readme : add docs for chat-persistent.sh (#1568)  (Evan Jones)
    * readme : add docs for chat-persistent.sh
    * Update README.md
2023-05-20  feature : support blis and other blas implementation (#1536)  (Zenix)
    * feature: add blis support
    * feature: allow all BLA_VENDOR to be assigned in cmake arguments; align with whisper.cpp PR 927
    * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
    * Fix typo in INTEGER
    * Fix: blas changes on ci
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-20  Revert "feature : add blis and other BLAS implementation support (#1502)"  (Georgi Gerganov)
    This reverts commit 07e9ace0f9da424d82e75df969642522880feb92.
2023-05-20  feature : add blis and other BLAS implementation support (#1502)  (Zenix)
    * feature: add blis support
    * feature: allow all BLA_VENDOR to be assigned in cmake arguments; align with whisper.cpp PR 927
    * fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
    * Fix typo in INTEGER
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-05-19  ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)  (Georgi Gerganov)
    * ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0
    * llama : bump LLAMA_FILE_VERSION to 3
    * cuda : update Q4 and Q8 dequantize kernels
    * ggml : fix AVX dot products
    * readme : update performance table + hot topics
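Background note: the change stores each quantization block's scale as an IEEE half instead of a 32-bit float, shrinking every block header by two bytes. A sketch of the resulting Q4_0 and Q8_0 layouts (field names follow ggml's convention, but treat the exact definitions as illustrative):

```cuda
#include <cstdint>

typedef uint16_t ggml_fp16_t; // IEEE-754 half, stored as raw bits

#define QK4_0 32
#define QK8_0 32

// Q4_0: 32 weights per block, 4-bit quants packed two per byte.
typedef struct {
    ggml_fp16_t d;             // scale, f16 after this change (was f32)
    uint8_t     qs[QK4_0 / 2]; // weight = (nibble - 8) * d
} block_q4_0;                  // 18 bytes per block instead of 20

// Q8_0: 32 weights per block, 8-bit quants.
typedef struct {
    ggml_fp16_t d;         // scale, f16 after this change
    int8_t      qs[QK8_0]; // weight = qs[i] * d
} block_q8_0;              // 34 bytes per block instead of 36
```

Q4_1 likewise carries both its scale and its minimum as f16 after this change.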
2023-05-19  readme : adds WizardLM to the list of supported models (#1485)  (David Kennedy)
2023-05-13  readme : update Q4_0 perplexities  (Georgi Gerganov)
    I think these were affected by the removal of the `round` during quantization.