metal : concurrently dispatch commands (#2358)

* metal: concurrently dispatch commands

Function `ggml_metal_graph_find_concurrency` will run and write commands that can be issued concurrently to metal context `concur_list` array, when `ggml_metal_graph_compute` is called for the first time.

* metal: don't call find_concurrency automatically.

* metal : code style changes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
author: Shouzheng Liu <lshzh.hi@gmail.com> 2023-07-25 08:00:19 -0400
committer: GitHub <noreply@github.com> 2023-07-25 15:00:19 +0300
commit: 1aa18ef994a6a2b531434eb13251ef48e56d345b (patch)
tree: 7ce76e5926ae0a6a48db56590f69873aca8dd917 /ggml-metal.h
parent: 9a08eaf3c4010962d0126e9e5bfbe9af64b2ac90 (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/ggml-metal.h b/ggml-metal.h
index 928f170..16f1a0c 100644
--- a/ggml-metal.h
+++ b/ggml-metal.h
@@ -61,6 +61,13 @@ void ggml_metal_set_tensor(struct ggml_metal_context * ctx, struct ggml_tensor *
 // get data from the device into host memory
 void ggml_metal_get_tensor(struct ggml_metal_context * ctx, struct ggml_tensor * t);
 
+// try to find operations that can be run concurrently in the graph
+// you should run it again if the topology of your graph changes
+void ggml_metal_graph_find_concurrency(struct ggml_metal_context * ctx, struct ggml_cgraph * gf);
+
+// if the graph has been optimized for concurrently dispatch
+bool ggml_metal_if_optimized(struct ggml_metal_context * ctx);
+
 // same as ggml_graph_compute but uses Metal
 // creates gf->n_threads command buffers in parallel
 void ggml_metal_graph_compute(struct ggml_metal_context * ctx, struct ggml_cgraph * gf);
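
Since the final revision of this patch no longer calls `ggml_metal_graph_find_concurrency` automatically, the caller presumably has to invoke it before computing. The following is a minimal sketch of that call pattern, not code from the commit. The struct bodies and the three function bodies are hypothetical stubs, included only so the sketch compiles without ggml or Metal; just the three signatures come from the header above. `eval_graph` and its `topology_changed` flag are likewise illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-ins: the real structs live in ggml and ggml-metal.
struct ggml_cgraph        { int n_nodes; };
struct ggml_metal_context { bool has_concur_list; int planned_n_nodes; int n_find_calls; };

// Stub: the real function analyses the graph; this one only records that it ran.
void ggml_metal_graph_find_concurrency(struct ggml_metal_context * ctx, struct ggml_cgraph * gf) {
    ctx->has_concur_list = true;
    ctx->planned_n_nodes = gf->n_nodes;
    ctx->n_find_calls++;
}

// Stub: reports whether a concurrency plan has been recorded.
bool ggml_metal_if_optimized(struct ggml_metal_context * ctx) {
    return ctx->has_concur_list;
}

// Stub: the real function encodes and submits Metal command buffers.
void ggml_metal_graph_compute(struct ggml_metal_context * ctx, struct ggml_cgraph * gf) {
    (void) ctx;
    (void) gf;
}

// Illustrative caller-side helper (hypothetical name): run the analysis once,
// and again whenever the caller knows the graph topology has changed.
void eval_graph(struct ggml_metal_context * ctx, struct ggml_cgraph * gf, bool topology_changed) {
    assert(ctx != NULL && gf != NULL);
    if (!ggml_metal_if_optimized(ctx) || topology_changed) {
        ggml_metal_graph_find_concurrency(ctx, gf);
    }
    ggml_metal_graph_compute(ctx, gf);
}
```

In this sketch, repeated evaluations of an unchanged graph reuse the existing plan, which matches the header comment's advice to rerun the analysis only when the topology changes.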