path: root/ggml.h
author LostRuins <39025047+LostRuins@users.noreply.github.com> 2023-06-07 01:00:01 +0800
committer GitHub <noreply@github.com> 2023-06-06 19:00:01 +0200
commit d5b111f53d14972669eb52055f9df2567663ad8b
tree 1cbe5fb656fc519daf9b18904772318c21b81254 /ggml.h
parent 2d43387dafe9c60f15f57aa23ee0b37864b98b32
Clblast fixes + enhancements to save VRAM and offload more layers (#1675)
* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
* Clblast fixes + enhancements to save VRAM:
  1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
  2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
  3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
* change max value size_t to use limits
* removed flags from the CL pool malloc, apply code tidying suggestions.
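
The pool malloc policy in points 2 and 3 above can be illustrated with a small, device-free sketch. This is not the code from the commit: `PoolBuffer`, `PoolResult`, and `pool_malloc` are hypothetical names, and buffers are modeled as plain sizes rather than `cl_mem` handles. It assumes that "resizing" means taking the largest free buffer out of the pool and replacing it with one of the requested size.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a device buffer; a real pool would also hold a cl_mem handle.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
};

struct PoolResult {
    std::size_t size;  // capacity of the buffer handed out
    bool recycled;     // true if an existing free buffer was reused unchanged
    bool resized;      // true if the largest free buffer was replaced to fit
};

// Take a buffer of at least `want` bytes out of `free_list`.
PoolResult pool_malloc(std::vector<PoolBuffer>& free_list, std::size_t want) {
    assert(want > 0);
    // Sentinel for "no index found", using limits rather than a hand-written constant.
    const std::size_t none = std::numeric_limits<std::size_t>::max();
    std::size_t best = none, best_size = none;     // smallest buffer that fits
    std::size_t largest = none, largest_size = 0;  // largest buffer overall

    for (std::size_t i = 0; i < free_list.size(); ++i) {
        const std::size_t s = free_list[i].size;
        if (s >= want && s < best_size) { best = i; best_size = s; }
        if (s > largest_size) { largest = i; largest_size = s; }
    }

    if (best != none) {
        // Point 2: best fit, not first fit.
        free_list.erase(free_list.begin() + static_cast<std::ptrdiff_t>(best));
        return {best_size, true, false};
    }
    if (largest != none) {
        // Point 3: nothing fits, so give up the largest free buffer and
        // hand back one of the requested size in its place.
        free_list.erase(free_list.begin() + static_cast<std::ptrdiff_t>(largest));
        return {want, false, true};
    }
    // Empty pool: a fresh allocation would be made here.
    return {want, false, false};
}
```

Choosing the smallest buffer that fits keeps large buffers available for large requests, and replacing the largest free buffer when nothing fits avoids accumulating many undersized buffers in VRAM.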
Diffstat (limited to 'ggml.h')
0 files changed, 0 insertions, 0 deletions