metal : concurrently dispatch commands (#2358)

* metal: concurrently dispatch commands

Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.

* metal: don't call find_concurrency automatically.

* metal : code style changes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
author: Shouzheng Liu <lshzh.hi@gmail.com> 2023-07-25 08:00:19 -0400
committer: GitHub <noreply@github.com> 2023-07-25 15:00:19 +0300
commit: 1aa18ef994a6a2b531434eb13251ef48e56d345b (patch)
tree: 7ce76e5926ae0a6a48db56590f69873aca8dd917 /llama.cpp
parent: 9a08eaf3c4010962d0126e9e5bfbe9af64b2ac90 (diff)
1 file changed, 3 insertions, 0 deletions
diff --git a/llama.cpp b/llama.cpp
index b42b410..2d737bb 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -1720,6 +1720,9 @@ static bool llama_eval_internal(
 
 #ifdef GGML_USE_METAL
     if (lctx.ctx_metal && N == 1) {
+        if (!ggml_metal_if_optimized(lctx.ctx_metal)) {
+            ggml_metal_graph_find_concurrency(lctx.ctx_metal,&gf);
+        }
         ggml_metal_set_n_cb     (lctx.ctx_metal, n_threads);
         ggml_metal_graph_compute(lctx.ctx_metal, &gf);
         ggml_metal_get_tensor   (lctx.ctx_metal, cur);
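
The commit message describes an analysis pass that records which graph commands can be issued concurrently, and the hunk above runs that pass only when the context has not been optimized yet. The following is a minimal sketch of that idea on a toy graph, not the ggml Metal implementation. `Node`, `Context`, `find_concurrency`, `eval`, and the use of `-1` as a barrier marker are all hypothetical names and conventions chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node: indices of the nodes whose outputs it reads.
struct Node {
    std::vector<int> srcs;
};

// Groups node indices into levels. Nodes within one level do not depend on
// each other, so they could be dispatched concurrently. Each level is
// followed by -1, used here as an assumed "barrier" marker.
std::vector<int> find_concurrency(const std::vector<Node> & nodes) {
    const int n = (int) nodes.size();
    std::vector<bool> done(n, false);
    std::vector<int>  order;
    int n_left = n;

    while (n_left > 0) {
        std::vector<int> level;
        for (int i = 0; i < n; ++i) {
            if (done[i]) {
                continue;
            }
            bool ready = true;
            for (int s : nodes[i].srcs) {
                if (!done[s]) { ready = false; break; }
            }
            if (ready) {
                level.push_back(i);
            }
        }
        // An empty level means no progress is possible, i.e. a cycle.
        assert(!level.empty() && "graph must be acyclic");

        // Mark nodes as done only after the whole level is collected, so
        // that a node never lands in the same level as one of its sources.
        for (int i : level) {
            done[i] = true;
            order.push_back(i);
        }
        order.push_back(-1); // barrier between levels
        n_left -= (int) level.size();
    }
    return order;
}

// Hypothetical context that caches the analysis result.
struct Context {
    std::vector<int> concur_list; // empty until the analysis has run
    bool is_optimized() const { return !concur_list.empty(); }
};

// Mirrors the guard in the hunk above: analyze once, then reuse the list.
void eval(Context & ctx, const std::vector<Node> & graph) {
    if (!ctx.is_optimized()) {
        ctx.concur_list = find_concurrency(graph);
    }
    // ... dispatch the nodes in ctx.concur_list, one level at a time ...
}
```

For a five-node graph in which node 2 reads nodes 0 and 1, node 3 reads node 2, and node 4 reads node 0, this sketch produces the list `0, 1, -1, 2, 4, -1, 3, -1`: three levels, each closed by a barrier. Caching the list in the context reflects the guard in the diff, which skips the analysis once the context reports that it is already optimized.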