Age | Commit message | Author |
|
* Add AVX2 version of ggml_vec_dot_q4_1
* Small optimisations to q4_1 dot product (@Const-me)
* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)
* Fix ggml_vec_mad_q4_1 too
* Fix non-vectorised q4_1 vec mul
|
* add ggml_rms_norm
* update op num
|
Without the "static" prefix, it fails to compile with clang.
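The message does not say which declaration was affected. Assuming it was an `inline` function, one common cause is C99 inline semantics, which clang follows by default: a plain `inline` definition does not emit an external symbol, so any call the compiler chooses not to inline has nothing to resolve against. A minimal sketch of the fixed form, using a hypothetical helper `square_f32` that is not taken from ggml.c:

```c
// Hypothetical helper for illustration only (not from ggml.c).
// `static` gives the function internal linkage, so this translation unit
// always carries its own definition, whether or not the call is inlined.
static inline float square_f32(float x) {
    return x * x;
}
```

Dropping `static` here leaves only an inline definition, which may build under older GCC defaults (gnu89 semantics) but can fail under clang when the call is not inlined.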
|
* Don't use vdotq_s32 if it's not available
`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.
Reintroduces the code removed in 84d9015 as a fallback for builds where `__ARM_FEATURE_DOTPROD` isn't defined.
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
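The feature check described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code in ggml.c: `dot_i8` and `hsum_s32x4` are hypothetical helpers, the input length is assumed to be a multiple of 16, and the plain-NEON and scalar branches are one possible fallback rather than the code that 84d9015 removed.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>

// Hypothetical helper: horizontal sum of four int32 lanes.
static int32_t hsum_s32x4(int32x4_t v) {
    return vgetq_lane_s32(v, 0) + vgetq_lane_s32(v, 1)
         + vgetq_lane_s32(v, 2) + vgetq_lane_s32(v, 3);
}
#endif

// Hypothetical helper: dot product of two int8 arrays.
// Assumes n is a multiple of 16.
static int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_DOTPROD)
    // dotprod extension available: one instruction per 16 byte pairs.
    int32x4_t acc = vdupq_n_s32(0);
    for (size_t i = 0; i < n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    return hsum_s32x4(acc);
#elif defined(__ARM_NEON)
    // NEON without dotprod (e.g. Raspberry Pi 4): widen to int16 products,
    // then pairwise-accumulate into int32 lanes.
    int32x4_t acc = vdupq_n_s32(0);
    for (size_t i = 0; i < n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        int16x8_t lo = vmull_s8(vget_low_s8(va), vget_low_s8(vb));
        int16x8_t hi = vmull_s8(vget_high_s8(va), vget_high_s8(vb));
        acc = vpadalq_s16(acc, lo);
        acc = vpadalq_s16(acc, hi);
    }
    return hsum_s32x4(acc);
#else
    // Portable scalar fallback for non-ARM targets.
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
#endif
}
```

Because the selection happens at compile time, a binary built with dotprod enabled will still raise an illegal instruction on a CPU that lacks it; the macro only reflects the target flags the compiler was given.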
|
* 10% performance boost on ARM
* Back to original change
|
This reverts commit 113a9e83ebc0f788f861394437087bf3ca0e019b.
There are some reports of illegal instruction errors.
Moved this change to the vdotq_s32 branch until the issue is resolved.
|
* Apply suggested fixes to build on Windows
Issue: https://github.com/ggerganov/llama.cpp/issues/22
* Remove unsupported VLAs
* MSVC: Remove features that are only available on MSVC C++20.
* Fix zero initialization of the other fields.
* Change the use of vector for stack allocations.
|