llama.cpp.git - llama.cpp

Age	Commit message	Author
2023-06-04	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653)	0cc4m
	* Use events instead of clFinish, where possible
	* OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel
	* Reduce queueing overhead for contiguous tensors by using single mul kernel call
	* Adapt to #1612 cl_mem malloc changes
	* Reduce code duplication between cuda and opencl branches
	* Improve implementation
2023-05-28	opencl : no need to allocate cl_mem on heap (#1612)	Howard Su

2023-05-28	opencl : use strstr to check if fp16 supported (#1611)	Howard Su
	* Use strstr to check if fp16 supported
	* Ensure ext_buffer is null terminated
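The two points in this commit message can be illustrated with a minimal sketch. It is not the code from #1611: the helper name `ext_list_has_fp16` and its parameters are hypothetical, and the buffer is assumed to hold the space-separated extension string a device query would return (`cl_khr_fp16` is the standard extension name for half-precision support). The sketch terminates the buffer before searching, because `strstr` reads until it finds a NUL byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from llama.cpp.
 * buf: buffer holding the extension list (assumed filled by a device query)
 * cap: total capacity of buf in bytes
 * len: number of bytes the query reported as written */
static bool ext_list_has_fp16(char *buf, size_t cap, size_t len) {
    assert(buf != NULL);
    if (cap == 0) {
        return false;
    }
    /* Clamp so the terminator always fits inside the buffer. */
    if (len >= cap) {
        len = cap - 1;
    }
    buf[len] = '\0'; /* strstr requires a NUL-terminated string */
    return strstr(buf, "cl_khr_fp16") != NULL;
}
```

A caller would pass the buffer it handed to the extension query together with the size that query reported. Note that a plain substring search also matches any longer extension name that merely contains `cl_khr_fp16`.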
2023-05-23	Fix handling of "invalid property" when creating OpenCL command queue (#1565)	Maarten ter Huurne
	The `clCreateCommandQueue()` function will return the code `CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties, not `CL_INVALID_PROPERTY` as the original code was checking for.
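The error-code distinction above can be sketched as follows. This is an assumption-laden illustration, not the code from #1565: it assumes the caller first requests an out-of-order queue and retries with default properties when the request is rejected. To keep the sketch compilable without an OpenCL SDK, the constants are repeated locally with the values from the Khronos `cl.h` header under a hypothetical `SK_` prefix, and the queue-creation call is abstracted behind a hypothetical function pointer instead of calling `clCreateCommandQueue()` directly.

```c
#include <stddef.h>

/* Local copies of values from the Khronos cl.h header (SK_ prefix is
 * hypothetical, used only so this sketch builds without OpenCL). */
#define SK_CL_SUCCESS                              0
#define SK_CL_INVALID_QUEUE_PROPERTIES           (-35)
#define SK_CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1UL << 0)

/* Hypothetical stand-in for a call that wraps clCreateCommandQueue(). */
typedef void *(*sk_create_queue_fn)(unsigned long props, int *err);

/* Try an out-of-order queue first; fall back to default properties only
 * when the implementation reports that the properties are unsupported. */
static void *sk_create_queue_with_fallback(sk_create_queue_fn create, int *err) {
    void *queue = create(SK_CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, err);
    if (*err == SK_CL_INVALID_QUEUE_PROPERTIES) {
        queue = create(0, err); /* retry with an in-order queue */
    }
    return queue;
}

/* Demonstration stub: behaves like a device that rejects any
 * non-default queue properties. */
static void *sk_stub_in_order_only(unsigned long props, int *err) {
    static int dummy_queue;
    if (props != 0) {
        *err = SK_CL_INVALID_QUEUE_PROPERTIES;
        return NULL;
    }
    *err = SK_CL_SUCCESS;
    return &dummy_queue;
}
```

Calling `sk_create_queue_with_fallback(sk_stub_in_order_only, &err)` exercises the fallback path: the first attempt fails with the queue-properties error and the second succeeds. A check against a different error code would never trigger the retry, which is the kind of bug the commit message describes.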
2023-05-23	OpenCL Token Generation Acceleration (#1459)	0cc4m
	* Move back to C++ for OpenCL
	* Refactor OpenCL code to work more like the CUDA code, add missing functions
	* Deduplicate dequant kernels
	* Add OpenCL compile options
	* Use compile args for preprocessing constants
	* Restore default platform + device selection by id behavior
	---------
	Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
	Co-authored-by: Henri Vasserman <henv@hot.ee>