llama.cpp.git - llama.cpp

author	LostRuins <39025047+LostRuins@users.noreply.github.com>	2023-06-07 01:00:01 +0800
committer	GitHub <noreply@github.com>	2023-06-06 19:00:01 +0200
commit	d5b111f53d14972669eb52055f9df2567663ad8b (patch)
tree	1cbe5fb656fc519daf9b18904772318c21b81254 /ggml.h
parent	2d43387dafe9c60f15f57aa23ee0b37864b98b32 (diff)

Clblast fixes + enhancements to save VRAM and offload more layers (#1675)

* Use events instead of clFinish, where possible
* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using single mul kernel call
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between cuda and opencl branches
* Improve implementation
* Clblast fixes + enhancements to save VRAM:
  1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
  2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
  3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
* change max value size_t to use limits
* removed flags from the CL pool malloc, apply code tidying suggestions.
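
Items 2 and 3 of the pool malloc changes amount to a best-fit policy with a resize fallback. The following is a minimal sketch of that policy, not the code from this commit. `PoolBuffer`, `BufferPool`, `acquire`, `release` and `take` are hypothetical names, and each buffer is modeled as a size plus an integer id standing in for a `cl_mem` handle, so no OpenCL calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a device buffer. A real pool would store a
// cl_mem handle here; an integer id keeps the sketch free of OpenCL calls.
struct PoolBuffer {
    int id;
    std::size_t size;
};

// Hypothetical pool that illustrates the recycling policy only.
struct BufferPool {
    std::vector<PoolBuffer> free_list; // buffers available for reuse
    int next_id = 0;                   // id given to the next fresh buffer

    // Remove and return the free buffer at index i.
    PoolBuffer take(std::size_t i) {
        PoolBuffer b = free_list[i];
        free_list.erase(free_list.begin() + static_cast<std::ptrdiff_t>(i));
        return b;
    }

    // Return a buffer of at least `size` bytes.
    PoolBuffer acquire(std::size_t size) {
        const std::size_t none = std::numeric_limits<std::size_t>::max();
        std::size_t best_i = none, best_size = none; // smallest buffer that fits
        std::size_t worst_i = none, worst_size = 0;  // largest free buffer
        for (std::size_t i = 0; i < free_list.size(); ++i) {
            const std::size_t s = free_list[i].size;
            if (s >= size && s < best_size) { best_i = i; best_size = s; }
            if (s > worst_size) { worst_i = i; worst_size = s; }
        }
        if (best_i != none) {
            return take(best_i); // item 2: smallest fit, not first fit
        }
        if (worst_i != none) {
            // Item 3: every free buffer is too small, so recycle the largest
            // one by resizing it. Assumption: with real device memory this
            // step would release the old allocation and create a new one.
            PoolBuffer b = take(worst_i);
            b.size = size;
            return b;
        }
        return PoolBuffer{next_id++, size}; // pool is empty: fresh buffer
    }

    // Hand a buffer back to the pool for later reuse.
    void release(PoolBuffer b) {
        assert(b.size > 0);
        free_list.push_back(b);
    }
};
```

With free buffers of 100, 50 and 200 bytes, a request for 40 bytes takes the 50-byte buffer under this policy, where a first-fit scan would have handed out the 100-byte one. A later request larger than anything left in the pool reuses the largest free buffer instead of leaving it idle next to a new allocation, which is the VRAM saving the commit message describes.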

Diffstat (limited to 'ggml.h')

0 files changed, 0 insertions, 0 deletions

