Age | Commit message (Collapse) | Author |
|
ggml-ci
|
|
ggml-ci
|
|
|
|
* metal : fix out-of-bounds access + style changes
* metal : increase concurrency nodes to 2*GGML_MAX_NODES
|
|
|
|
* zig build fixes
* Disable LTO on Windows.
|
|
persistence (#2521)
|
|
|
|
|
|
|
|
|
|
|
|
Fixes #2503
|
|
|
|
* Fixing race condition in server.cpp and partial stream handling in completion.js
* Reverting assert edits.
* Adding newline to eof
|
|
upfront (#2488)
* added stream saving context data to file to avoid allocating unnecessary amounts of memory
* generalised copying state data to file or buffer
* added comments explaining how copy_state_data works
* fixed trailing whitespaces
* fixed save load state example
* updated save load state to use public function in llama.cpp
* - restored breakage of the llama_copy_state_data API
- moved new logic for copying llama state data to internal function
* fixed function declaration order
* restored save load state example
* fixed whitepace
* removed unused llama-util.h include
* Apply suggestions from code review
Co-authored-by: slaren <slarengh@gmail.com>
* Apply code review suggestions
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
|
|
|
|
* examples : add JSON schema grammars
* complete JSON grammar
* ensure primitive types can be used as root of schema
* support integer type and adjust usage text
|
|
|
|
|
|
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
* Add Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
* Up Aquila-7B models in README.md
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
|
|
* fix hellaswag print format, cast away warning in test-double-float
* c++11 cannot use designated initializers
* add static to test-grad0.c internal functions
* use memcpy in test-double-float.c
* port c tests to c++
* use initializer list for ggml_init_params
|
|
* add support for chinese llama-2 / alpaca-2
* remove white spaces
|
|
|
|
* server : Support dark mode
So it respects user system light / dark settings.
* Update index.html.hpp by running ./deps.sh
|
|
* Added gqa8 kernel to allow llama-2-70B on metal
* Update ggml-metal.m
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast
* Added ne03==ne13 assertion
---------
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
* fix Metal backend broken from the allocator changes
|
|
* ggml : add graph tensor allocator
* ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset
* ggml : refactor ggml_view_Nd into ggml_view_tensor_offset
|
|
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
|
|
|
|
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
|
|
|
|
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
|
|
|
|
|
|
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
|
|
* convert.py : fix llama 2 70b conversion from Huggingface
|
|
|
|
|
|
|
|
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
|
|
* ggml : fix ggml_flash_attn to use op_params
|