Age | Commit message | Author | |
---|---|---|---|
2023-03-28 | py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) | Tai Duc Nguyen | |
2023-03-28 | ggml : refactor quantized processing functions (#509) | Stephan Walter | |
* Refactor quantized processing functions * ggml : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> | |||
2023-03-28 | py : removed unused `model` variable and verified that the code functions correctly with `vocab_only` setting. | DooWoong Lee (David) | |
Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) | |||
2023-03-28 | ci : make ctest verbose, hopefully we see what is wrong with the sanitizer | Georgi Gerganov | |
2023-03-28 | tests : free llama context at the end of the test | Georgi Gerganov | |
2023-03-28 | all : be more strict about converting float to double (#458) | Stephan Walter | |
* Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> | |||
2023-03-28 | deploy : add a Package.swift for SwiftPM support (#393) | Jed Fox | |
* Add a Package.swift for SwiftPM support * Swap from exclusions to allowlist | |||
2023-03-28 | ggml : introduce structs for the q4 data blocks (#356) | Stephan Walter | |
* Introduce structs for the q4 data blocks * ggml : rename quant struct variables + fix ARM_NEON --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> | |||
2023-03-28 | gitignore : add "embedding" | Georgi Gerganov | |
2023-03-28 | Check the existence of f16_model_path_base in quantize.py (#574) | dotpy314 | |
Co-authored-by: Jincheng Miao <jincheng.miao@gmail.com> | |||
2023-03-28 | Fix usage of F16C intrinsics in AVX code (#563) | slaren | |
* Fix usage of F16C intrinsics in AVX code when F16C is not defined | |||
2023-03-28 | main.cpp fixes, refactoring (#571) | anzz1 | |
- main: entering empty line passes back control without new input in interactive/instruct modes - instruct mode: keep prompt fix - instruct mode: duplicate instruct prompt fix - refactor: move common console code from main->common | |||
2023-03-28 | Add embedding example to Makefile (#540) | RJ Adriaansen | |
2023-03-27 | Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) | Marco Matthies | |
2023-03-26 | ci: add debug build to sanitizer build matrix (#527) | Erik Scholz | |
2023-03-26 | Fix undefined variables in debug build, remove unused variables (#531) | Stephan Walter | |
2023-03-26 | Add support for linux/arm64 platform during Docker Builds (#514) | Juan Calderon-Perez | |
* Add support for linux/arm64 platform * Add platform to versioned builds | |||
2023-03-26 | Update README and comments for standalone perplexity tool (#525) | Stephan Walter | |
2023-03-26 | [main] fix infinite generation (-n == -1) (#523) | anzz1 | |
2023-03-26 | Add logo to README.md | Georgi Gerganov | |
2023-03-26 | Exit from interactive mode if input stream is bad (#491) | Harald Fernengel | |
Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z on Windows. | |||
2023-03-26 | CI: Run other sanitizer builds even if one fails (#511) | anzz1 | |
Applies only to sanitizer builds, so they won't be cancelled | |||
2023-03-25 | Clarify console output in convert-pth-to-ggml.py (#512) | jp-x-g | |
"Processing part 1 of 3" instead of "Processing part 0" | |||
2023-03-25 | CMake / CI additions (#497) | anzz1 | |
* CMake: Add AVX512 option * CI: Add AVX/AVX512 builds (Windows) (AVX512 tests can only be run when the worker happens to support it, building works anyway) * CMake: Fix sanitizer linkage (merged #468) * CI: Add sanitizer builds (Ubuntu) * CI: Fix release tagging (change @zendesk/action-create-release to @anzz1/action-create-release until upstream PR zendesk/action-create-release#32, "Added commitish as input", is merged) | |||
2023-03-25 | (Windows) Set console to UTF-8 on init (#420) | anzz1 | |
Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters. | |||
2023-03-25 | Fix colors enabling on WIN32 | Georgi Gerganov | |
2023-03-25 | If n_predict == -1, generate forever | Georgi Gerganov | |
2023-03-25 | Infinite generation via context swapping (#71) | Georgi Gerganov | |
2023-03-25 | Cleanup STL headers + fix embedding examples + minor stuff | Georgi Gerganov | |
2023-03-25 | Move chat scripts into "./examples" | Georgi Gerganov | |
2023-03-25 | Add AVX2 implementation of dequantize_row_q4_1 (#505) | slaren | |
2023-03-25 | Overhaul the examples structure | Georgi Gerganov | |
- main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something ! | |||
2023-03-25 | Retire the ggml_mul_mat() branch for transposed src0 (#500) | Georgi Gerganov | |
* Retire the ggml_mul_mat() for transposed src0 - It can always be made contiguous with ggml_cpy() - The code is now simplified - The results are deterministic in respect to num threads * SIMD-ify dequantize_row_q4_0() for ARM_NEON (#502) * Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON * Fix dequantization - forgot to interleave the quants | |||
2023-03-25 | Disable prompt verbosity by default and add option to enable (#480) | Georgi Gerganov | |
2023-03-25 | Add AVX2 implementation of dequantize_row_q4_0 (#467) | slaren | |
2023-03-25 | Don't interfere with BLAS for large prompts by running only 1 thread | Georgi Gerganov | |
2023-03-25 | Add longer DAN prompt for testing big batch numbers | Georgi Gerganov | |
2023-03-25 | Add timings for the prompt evaluation (#478) | slaren | |
2023-03-25 | Remove obsolete information from README | Georgi Gerganov | |
2023-03-25 | Remove obsolete assert and fix compiler warning | Georgi Gerganov | |
2023-03-25 | Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS | Georgi Gerganov | |
2023-03-25 | bounds checking for input prefix (#492) | anzz1 | |
2023-03-25 | feat: '--in-prefix STRING' option (#426) | anzz1 | |
Prefix user inputs with a string | |||
2023-03-25 | Add support for file load progress reporting callbacks (#434) | Jed Fox | |
* File load progress reporting * Move llama_progress_handler into llama_context_params * Renames * Use seekg to find file size instead * More correct load progress * Call progress callback more frequently * Fix typo | |||
2023-03-25 | Add missing struct annotation (#483) | Doomsdayrs | |
`llama_sample_top_p_top_k` was missing the struct annotation on line 126. This causes a compiler issue when being parsed by the Kotlin C interop generator. This commit fixes the above issue by adding the struct annotation. | |||
2023-03-25 | Fix crash for 65B model with pre-allocated memory (#485) | Chris Kuehl | |
2023-03-24 | Disable BLAS altogether - the bug is not just for quantized mat mul | Georgi Gerganov | |
2023-03-24 | Disable BLAS branch in mul_mat - seems there is a bug | Georgi Gerganov | |
2023-03-24 | Immediately start processing the prompt before user input has been provided (#476) | Georgi Gerganov | |
2023-03-24 | Reduce memory usage and allocate enough memory for largest context (#473) | Georgi Gerganov | |
* Reduce memory usage and allocate enough memory for large contexts * Simpler scratch buffer usage * Reenable BLAS for quantized mul_mat * Fix number of layers in 30B and 65B * Fix KV cache size for F32 |
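
Several entries above touch the q4 block format (#356) and `dequantize_row_q4_0` (#467, #502), including the note about interleaving the quants. The log does not spell out the block layout, so the following is only a minimal sketch under stated assumptions: a block of 32 values stored as one little-endian 32-bit float scale followed by 16 bytes, each byte holding two adjacent 4-bit values (low nibble first) with an offset of 8. The names `QK`, `quantize_block_q4_0` and `dequantize_block_q4_0` are hypothetical and are not taken from the repository.

```python
import struct

QK = 32  # assumed number of values per block


def quantize_block_q4_0(values):
    """Pack QK floats into an assumed q4_0-style block: float32 scale + QK/2 bytes."""
    assert len(values) == QK
    amax = max(abs(v) for v in values)
    d = amax / 7.0                    # scale so that values map into [-7, 7]
    inv = 1.0 / d if d else 0.0       # an all-zero block keeps a zero scale
    qs = bytearray(QK // 2)
    for i in range(0, QK, 2):
        q0 = int(round(values[i] * inv)) + 8      # shift into the unsigned 4-bit range
        q1 = int(round(values[i + 1] * inv)) + 8
        qs[i // 2] = q0 | (q1 << 4)               # adjacent values share one byte
    return struct.pack("<f", d) + bytes(qs)


def dequantize_block_q4_0(block):
    """Inverse of the sketch above: unpack the scale, then both nibbles of each byte."""
    (d,) = struct.unpack_from("<f", block, 0)
    out = []
    for byte in block[4:]:
        out.append(((byte & 0x0F) - 8) * d)       # low nibble holds the even index
        out.append(((byte >> 4) - 8) * d)         # high nibble holds the odd index
    return out
```

A round trip such as `dequantize_block_q4_0(quantize_block_q4_0(xs))` should return 32 values, each within about half a scale step of the input. The interleaved output order is the detail that is easy to get wrong when vectorizing.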