Age | Commit message (Collapse) | Author |
|
* noavx build and test
* we don't need to remove f16c in windows
|
|
* MPI support, first cut
* fix warnings, update README
* fixes
* wrap includes
* PR comments
* Update CMakeLists.txt
* Add GH workflow, fix test
* Add info to README
* mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099)
* mpi : add names for layer inputs + prep ggml_mpi_graph_compute()
* mpi : move all MPI logic into ggml-mpi
Not tested yet
* mpi : various fixes - communication now works but results are wrong
* mpi : fix output tensor after MPI compute (still not working)
* mpi : fix inference
* mpi : minor
* Add OpenMPI to GH action
* [mpi] continue-on-error: true
* mpi : fix after master merge
* [mpi] Link MPI C++ libraries to fix OpenMPI
* tests : fix new llama_backend API
* [mpi] use MPI_INT32_T
* mpi : factor out recv / send in functions and reuse
* mpi : extend API to allow usage with outer backends (e.g. Metal)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* ggml_graph_compute: deprecate using ggml_context, try resolve issue #287
* rewrite: no longer consider backward compitability; plan and make_plan
* minor: rename ctx as plan; const
* remove ggml_graph_compute from tests/test-grad0.c, but current change breaks backward
* add static ggml_graph_compute_sugar()
* minor: update comments
* reusable buffers
* ggml : more consistent naming + metal fixes
* ggml : fix docs
* tests : disable grad / opt + minor naming changes
* ggml : add ggml_graph_compute_with_ctx()
- backwards compatible API
- deduplicates a lot of copy-paste
* ci : enable test-grad0
* examples : factor out plan allocation into a helper function
* llama : factor out plan stuff into a helper function
* ci : fix env
* llama : fix duplicate symbols + refactor example benchmark
* ggml : remove obsolete assert + refactor n_tasks section
* ggml : fix indentation in switch
* llama : avoid unnecessary bool
* ggml : remove comments from source file and match order in header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
* Generalize quantize_fns for simpler FP16 handling
* Remove call to ggml_cuda_mul_mat_get_wsize
* ci : disable FMA for mac os actions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
until they decide to fix the brew installation in the macos runners.
see the open issues. eg https://github.com/actions/runner-images/pull/7710
|
|
|
|
|
|
Set `LLAMA_BUILD_SERVER` in workflow so the `server` example gets build. This currently only applies to Windows builds because it seems like only Windows binary artifacts are included in releases.
Add `server` example target to `Makefile` (still uses `LLAMA_BUILD_SERVER` define and does not build by default)
Fix issue where `vdot` binary wasn't removed when running `make clean`.
Fix compile warnings in `server` example.
Add `.hpp` files to trigger workflow (the server example has one).
|
|
* Fix OpenBLAS build
* Fix `LLAMA_BLAS_VENDOR` CMake variable that should be a string and not a boolean.
|
|
|
|
* Update CLBlast to 1.6.0
|
|
* feature: add blis support
* feature: allow all BLA_VENDOR to be assigned in cmake arguments. align with whisper.cpp pr 927
* fix: version detection for BLA_SIZEOF_INTEGER, recover min version of cmake
* Fix typo in INTEGER
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix: blas changes on ci
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
* Add OpenCL and CLBlast support
* Add OpenBLAS support
* Remove testing from matrix
* change build name to 'clblast'
|
|
|
|
|
|
|
|
* Fix build under Windows when enable BUILD_SHARED_LIBS
* Make AVX512 test on Windows to build the shared libs
|
|
CI (#1074)
[Accelerate](https://developer.apple.com/documentation/accelerate) is an Apple framework which can only be used on macOS, and the CMake build [ignores](https://github.com/ggerganov/llama.cpp/blob/master/CMakeLists.txt#L102) the `LLAMA_ACCELERATE` variable when run on non-Apple platforms. This implies setting `LLAMA_ACCELERATE` is a no-op on Ubuntu and can be removed.
This will reduce visual noise in CI check results (in addition to reducing the number of checks we have to run for every PR). Right now every sanitized build is duplicated twice for no good reason (e.g., we have `CI / ubuntu-latest-cmake-sanitizer (ADDRESS, Debug, ON)` and `CI / ubuntu-latest-cmake-sanitizer (ADDRESS, Debug, OFF)`).
|
|
|
|
|
|
* CI: Re-enable AVX512 testing (Windows-MSVC)
Now with 100% less base64 encoding
* plain __cpuid is enough here
|
|
- Changes in subdirectories will now be detecter properly
- (Windows-MSVC) AVX512 tests temporarily disabled
|
|
|
|
|
|
* Add support for linux/arm64 platform
* Add platform to versioned builds
|
|
applies only to sanitizer builds so they wont be cancelled
|
|
* CMake: Add AVX512 option
* CI: Add AVX/AVX512 builds (Windows)
(AVX512 tests can only be run when the worker happens to support it, building works anyway)
* CMake: Fix sanitizer linkage ( merged #468 )
* CI: Add sanitizer builds (Ubuntu)
* CI: Fix release tagging
(change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR Added commitish as input zendesk/action-create-release#32 is merged)
|
|
* CI: Separate Build and Test steps (CMake)
* CI: Make sure build passes before running tests (CMake)
* CI: Standardise step id names
|
|
* Deduplicate q4 quantization functions
* Use const; add basic test
* Re-enable quantization test
* Disable AVX2 flags in CI
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
|
|
* Add test-tokenizer-0 to do a few tokenizations - feel free to expand
* Added option to convert-pth-to-ggml.py script to dump just the vocabulary
* Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
* Added utility to load vocabulary file from previous point (temporary implementation)
* Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
* Rename gpt_vocab -> llama_vocab
* All CMake binaries go into ./bin/ now
|
|
* fix publish permission
* try to fix docker pipeline using as password github_token & username repository_owner
|
|
* CI Improvements
Manual build feature, autoreleases for Windows
* better CI naming convention
use branch name in releases and tags
|
|
* add ptread link to fix cmake build under linux
* add cmake to linux and macos platform
* separate make and cmake workflow
---------
Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>
|
|
* feat: dockerize llamacpp
* feat: split build & runtime stages
* split dockerfile into main & tools
* add quantize into tool docker image
* Update .devops/tools.sh
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add docker action pipeline
* change CI to publish at github docker registry
* fix name runs-on macOS-latest is macos-latest (lowercase)
* include docker versioned images
* fix github action docker
* fix docker.yml
* feat: include all-in-one command tool & update readme.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
|
|
|
|