Introduce C-style API (#370)

* Major refactoring - introduce C-style API
* Clean up
* Add <cassert>
* Add <iterator>
* Add <algorithm> ....
* Fix timing reporting and accumulation
* Measure eval time only for single-token calls
* Change llama_tokenize return meaning
author: Georgi Gerganov <ggerganov@gmail.com> 2023-03-22 07:32:36 +0200
committer: GitHub <noreply@github.com> 2023-03-22 07:32:36 +0200
commit: f5a77a629bd0f37ae1696747633ab42a5530ec15 (patch)
tree: b3d147dd228ce67661ed497a6dc61b444a38e0f9 /convert-pth-to-ggml.py
parent: da0e9fe90ccf6e73597eb19dd0cfc0a28363fb3b (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/convert-pth-to-ggml.py b/convert-pth-to-ggml.py
index db5b00f..f0f6b0e 100644
--- a/convert-pth-to-ggml.py
+++ b/convert-pth-to-ggml.py
@@ -148,7 +148,7 @@ def main():
         model = torch.load(fname_model, map_location="cpu")
 
         with open(fname_out, "wb") as fout:
-            fout.write(struct.pack("i", hparams["vocab_size"]))
+            write_header(fout, hparams, ftype)
             write_tokens(fout, tokenizer)
 
         del model
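
The hunk above replaces a single `struct.pack` of `vocab_size` with a call to `write_header(fout, hparams, ftype)`, whose body is not part of this diff. As a rough illustration of what such a helper might do, here is a minimal sketch that packs several hyperparameters and `ftype` as native 32-bit integers. The names `write_header_sketch`, `read_header_sketch`, and `HEADER_KEYS`, and the field order, are assumptions made for this example and are not taken from the repository.

```python
import io
import struct

# Hypothetical field order. The real header layout is defined by write_header
# in convert-pth-to-ggml.py, which this hunk does not show.
HEADER_KEYS = ["vocab_size", "dim", "n_heads", "n_layers"]


def write_header_sketch(fout, hparams, ftype):
    """Write the selected hyperparameters followed by ftype as native ints."""
    values = [hparams[key] for key in HEADER_KEYS] + [ftype]
    fout.write(struct.pack("i" * len(values), *values))


def read_header_sketch(fin):
    """Read back what write_header_sketch wrote; returns (hparams, ftype)."""
    fmt = "i" * (len(HEADER_KEYS) + 1)
    values = struct.unpack(fmt, fin.read(struct.calcsize(fmt)))
    return dict(zip(HEADER_KEYS, values[:-1])), values[-1]


if __name__ == "__main__":
    # Round trip through an in-memory buffer instead of a real output file.
    buf = io.BytesIO()
    example = {"vocab_size": 32000, "dim": 4096, "n_heads": 32, "n_layers": 32}
    write_header_sketch(buf, example, ftype=1)
    buf.seek(0)
    print(read_header_sketch(buf))
```

Grouping the header fields in one helper keeps the writer and any reader in agreement about field order, which a lone `struct.pack("i", ...)` call at each write site does not guarantee.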