Age | Commit message (Collapse) | Author |
|
I think these were affected by the removal of the `round` during quantization
|
|
|
|
* Fix OpenCL kernels for the new formats
* Fix Q5_0 alignment issues.
|
|
|
|
|
|
|
|
|
|
* ggml : remove Q4_0 bit shufling (ARM NEON)
* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)
* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)
* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)
* ggml : remove Q5_0 bit shuffling (ARM NEON)
* ggml : 2x faster scalar implementations
* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)
* ggml : simplify scalar dot
* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit
* ggml : fix Q4_1 quantization
* ggml : update cuBLAS + normalize variable names
* ggml : remove Q4_2 mode
* ggml : minor formatting
* ggml : fix Q5_0 quantization
* scripts : add script for measuring the time per token
* AVX implementations (#1370)
* ggml : uniform 5th bit extraction
* llama : produce error upon loading old model files
* llama : fix model magic/version write
* ggml : speed-up Q5_0 + Q5_1 at 4 threads
* ggml : preserve old Q4 and Q5 formats
* ggml : simplify Q8_1 - no need for low / high sums anymore
* ggml : fix Q8_0 and Q8_1 rounding
* Revert "AVX implementations (#1370)"
This reverts commit 948d124837f9d287d8490f41338e0e4cceb0814f.
* ggml : fix AVX2 implementation
* sha : update hashes for 7B and 13B
* readme : update timings + remove warning banner
* llama : update v2 PR number to 1405
* ggml : fix WASM comments
* ggml : back to original bit order
* readme : add note that Q4 and Q5 have been changed
* llama : fix return for unknown version
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
|
|
* add model-agnostic dan prompt
* quick readme update
* save a token
* Revert "quick readme update"
This reverts commit 8dc342c069cbdca8ce63ad974becec6fc844e1e4.
|
|
* main : add option to save full output to session
* split behavior into --session and --prompt-cache
* restore original implementation with new names
* PR comments
* move the check for incompatible parameters to gpt_params_parse
* Fix whitespace
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
---------
Co-authored-by: DannyDaemonic <DannyDaemonic@gmail.com>
|
|
|
|
* use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler
Tested with a 13B model.
* use _mm_pause() in busyloop
* use _mm_pause() in busyloop on x86_64 to reduce power consumption
|
|
(#1040)
* Interface improvements
* Multiline input
* Track character width
* Works with all characters and control codes + Windows console fixes
|
|
|
|
|
|
fixes #1363
|
|
* llama : require first token to be BOS
* scripts : add ppl-run-all.sh
* perplexity : add BOS for each chunk
* readme : update perplexity values after BOS fix
* perplexity : add clarifying comments
|
|
* when loading a safetensors file, ignore the metadata header
* check for safetensors files first, and only use PyTorch versions when safetensors aren't available
|
|
|
|
* Add OpenCL and CLBlast support
* Add OpenBLAS support
* Remove testing from matrix
* change build name to 'clblast'
|
|
Minor edit in ggml.c which originally would prevent OpenCL from loading completely if GGML_USE_ACCELERATE was defined.
Minor speedup in prompt eval time.
|
|
|
|
This commit is a port of a detection method used in koboldcpp's Makefile in order to automatically set the -lcblas option on Arch Linux
|
|
|
|
|
|
|
|
* Line 698 has one #staticmethod and should not
otherwise throw error at unpickle.load() as not callable
* Update convert.py
---------
Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com>
|
|
(#1301)
|
|
|
|
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
|
|
|
|
* adding --in-suffix option
* print input suffix before generation
|
|
* change immintrin.h to intrin.h for compatibility
Building on windows11 arm throws an error on this line. Seems like using intrin.h covers x86 and and arm
* conditional def of intrin.h
* fix typo in ggml.c
|
|
|
|
|
|
* fix reverse prompt and multi line
* Code Formatting
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
https://github.com/ggerganov/ggml/pull/127#issuecomment-1533648531
|
|
|
|
|
|
|
|
* python script to verify the checksum of the llama models
Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.
* Update README.md
update to the readme for improved readability and to explain the usage of the python checksum verification script
* update the verification script
I've extended the script based on suggestions by @prusnak
The script now checks the available RAM, is there is enough to check the file at once it will do so. If not the file is read in chunks.
* minor improvment
small change so that the available ram is checked and not the total ram
* remove the part of the code that reads the file at once if enough ram is available
based on suggestions from @prusnak i removed the part of the code that checks whether the user had enough ram to read the entire model at once. the file is now always read in chunks.
* Update verify-checksum-models.py
quick fix to pass the git check
|
|
* fix dan.txt
* miku prompt improvements
* use common characters
|
|
* llama : only copy used KV cache in get / set state
* switch to ggml for copying k, v
* avoid designated initializers
|
|
|
|
|
|
|
|
* make git build info work with submodules
---------
Co-authored-by: Green Sky <green@g-s.xyz>
|
|
|
|
Signed-off-by: deadprogram <ron@hybridgroup.com>
|
|
|