From eb34620aeceaf9d9df7fcb19acc17ad41b9f60f8 Mon Sep 17 00:00:00 2001 From: Georgi Gerganov Date: Tue, 21 Mar 2023 17:29:41 +0200 Subject: Add tokenizer test + revert to C++11 (#355) * Add test-tokenizer-0 to do a few tokenizations - feel free to expand * Added option to convert-pth-to-ggml.py script to dump just the vocabulary * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests) * Added utility to load vocabulary file from previous point (temporary implementation) * Avoid using std::string_view and drop back to C++11 (hope I didn't break something) * Rename gpt_vocab -> llama_vocab * All CMake binaries go into ./bin/ now --- .github/workflows/build.yml | 3 +++ 1 file changed, 3 insertions(+) (limited to '.github/workflows') diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 9c1de58..5b1b5dd 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -54,6 +54,7 @@ jobs: cd build cmake .. cmake --build . --config Release + ctest --output-on-failure macOS-latest-make: runs-on: macos-latest @@ -90,6 +91,7 @@ jobs: cd build cmake .. cmake --build . --config Release + ctest --output-on-failure windows-latest-cmake: runs-on: windows-latest @@ -106,6 +108,7 @@ jobs: cd build cmake .. cmake --build . --config Release + ctest --output-on-failure - name: Get commit hash id: commit -- cgit v1.2.3