author | 0cc4m <picard12@live.de> | 2023-06-04 08:12:05 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-06-04 08:12:05 +0200 |
commit | dcb2ed48268e421baf25adc00d602dad0f415564 | |
tree | 261ef84fe660d06fce90c58fc01a16ae0e69be52 /ggml.h | |
parent | d8bd0013e8768aaa3dc9cfc1ff01499419d5348e |
OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
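
The commit message does not show the kernel itself. As a rough illustration of what a single element-wise mul_f32 call over a contiguous tensor computes, the sketch below is a plain C reference that runs on the CPU. It assumes the second operand is broadcast by repeating every `ky` elements. The name `mul_f32_ref` and its parameters are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for an element-wise f32 multiply.
 * x has n elements; y has ky elements and is repeated (broadcast)
 * across x. Each loop iteration stands in for one GPU work-item. */
void mul_f32_ref(const float *x, const float *y, float *dst,
                 size_t n, size_t ky)
{
    assert(ky > 0); /* guard against modulo by zero */
    for (size_t i = 0; i < n; i++) {
        dst[i] = x[i] * y[i % ky];
    }
}
```

Treating a contiguous tensor as one flat range of `n` elements lets a single call cover the whole tensor, instead of queueing one call per row.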
Diffstat (limited to 'ggml.h')
0 files changed, 0 insertions, 0 deletions