llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-06-06	Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)	Johannes Gäßler
	* CUDA multi GPU + scratch
	  ggml_cuda_compute_forward
	  Tensor parallelism
	  ggml_cuda_add
	  ggml_cuda_rms_norm
	  ggml_cuda_silu
	  CUDA scratch buffer
	  --main-gpu CLI option
2023-06-06	Clblast fixes + enhancements to save VRAM and offload more layers (#1675)	LostRuins
	* Use events instead of clFinish, where possible
	* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
	* Reduce queueing overhead for contiguous tensors by using single mul kernel call
	* Adapt to #1612 cl_mem malloc changes
	* Reduce code duplication between cuda and opencl branches
	* Improve implementation
	* Clblast fixes + enhancements to save VRAM:
	  1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them.
	  2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer
	  3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.
	* change max value size_t to use limits
	* removed flags from the CL pool malloc, apply code tidying suggestions.
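	The pool malloc policy in points 2 and 3 above can be sketched as follows. This is a minimal illustration, not the code from the commit: `PoolChoice` and `pick_free_buffer` are hypothetical names, and plain sizes stand in for the free OpenCL buffers so the selection logic can be read on its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct PoolChoice {
    int  index;        // slot to hand out, or -1 when the pool has no free buffer
    bool needs_resize; // true when the chosen buffer is smaller than the request
};

// free_sizes holds the sizes of the currently free pooled buffers.
// Policy: prefer the SMALLEST buffer that fits; if none fits, fall back to
// the LARGEST free buffer, which the caller would then resize.
PoolChoice pick_free_buffer(const std::vector<size_t>& free_sizes, size_t wanted) {
    int best = -1;       // smallest buffer that is large enough
    int largest = -1;    // largest buffer overall (fallback)
    size_t best_size = 0;
    size_t largest_size = 0;

    for (int i = 0; i < static_cast<int>(free_sizes.size()); ++i) {
        const size_t s = free_sizes[i];
        if (s >= wanted && (best < 0 || s < best_size)) {
            best = i;
            best_size = s;
        }
        if (largest < 0 || s > largest_size) {
            largest = i;
            largest_size = s;
        }
    }

    if (best >= 0) {
        return {best, false};
    }
    return {largest, largest >= 0};
}
```

	With free buffers of 64, 256 and 128 bytes, a 100-byte request picks the 128-byte buffer rather than the first one that fits (256), and a 300-byte request falls back to the 256-byte buffer and marks it for resizing.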
2023-06-04	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)	0cc4m
	* Use events instead of clFinish, where possible
	* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
	* Reduce queueing overhead for contiguous tensors by using single mul kernel call
	* Adapt to #1612 cl_mem malloc changes
	* Reduce code duplication between cuda and opencl branches
	* Improve implementation
2023-05-28	opencl : no need to allocate cl_mem on heap (#1612)	Howard Su

2023-05-28	opencl : use strstr to check if fp16 supported (#1611)	Howard Su
	* Use strstr to check if fp16 supported
	* Ensure ext_buffer is null terminated
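	A rough sketch of the two points above, written without the OpenCL headers so it stands alone. `has_fp16_extension` is a hypothetical helper, and `raw`/`len` are assumed to be the bytes and length a device extension query would return; the buffer is terminated explicitly before `strstr` scans it for `cl_khr_fp16`.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: `raw` may or may not end with '\0', so the copy
// reserves one extra byte and terminates it before any string search.
bool has_fp16_extension(const char* raw, size_t len) {
    std::vector<char> ext_buffer(len + 1);
    if (len > 0) {
        std::memcpy(ext_buffer.data(), raw, len);
    }
    ext_buffer[len] = '\0';  // guarantee termination regardless of the input

    // strstr matches the name anywhere in the space-separated extension list.
    return std::strstr(ext_buffer.data(), "cl_khr_fp16") != nullptr;
}
```

	Without the explicit terminator, `strstr` could read past the end of the buffer when the queried string is not null terminated.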
2023-05-23	Fix handling of "invalid property" when creating OpenCL command queue (#1565)	Maarten ter Huurne
	The `clCreateCommandQueue()` function will return the code `CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties, not `CL_INVALID_PROPERTY` as the original code was checking for.
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
	* Move back to C++ for OpenCL
	* Refactor OpenCL code to work more like the CUDA code, add missing functions
	* Deduplicate dequant kernels
	* Add OpenCL compile options
	* Use compile args for preprocessing constants
	* Restore default platform + device selection by id behavior
	---------
	Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
	Co-authored-by: Henri Vasserman <henv@hot.ee>