author    dylan <canardleteer@users.noreply.github.com>  2023-07-07 11:25:25 -0700
committer GitHub <noreply@github.com>                    2023-07-07 21:25:25 +0300
commit    84525e7962bee0abef91108948bbf7f7bfdcf421 (patch)
tree      e2e732cb057398249b7b98f46d57d3208a10a2e5 /README.md
parent    a7e20edf2266169ccd97a4eb949a593d628fbd64 (diff)
docker : add support for CUDA in docker (#1461)
Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Diffstat (limited to 'README.md')
-rw-r--r--  README.md | 32
1 file changed, 32 insertions(+), 0 deletions(-)
diff --git a/README.md b/README.md
index 863aef1..7953fd3 100644
--- a/README.md
+++ b/README.md
@@ -731,6 +731,38 @@ or with a light image:
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
```
+### Docker With CUDA
+
+Assuming you have the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or are using a GPU-enabled cloud, `cuBLAS` should be accessible inside the container.
+
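+Before building, it can help to verify that containers can see the GPU at all. A minimal sanity check (the `nvidia/cuda` image tag here is only an example; any CUDA base image that ships `nvidia-smi` will do):
+
+```bash
+# Should print the same GPU table as running nvidia-smi directly on the host.
+docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu22.04 nvidia-smi
+```
+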
+#### Building Locally
+
+```bash
+docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .
+docker build -t local/llama.cpp:light-cuda -f .devops/main-cuda.Dockerfile .
+```
+
+You may want to pass different build `ARG` values, depending on the CUDA environment supported by your container host and the GPU architecture you are targeting; see the sketch after the list below.
+
+The defaults are:
+
+- `CUDA_VERSION` set to `11.7.1`
+- `CUDA_DOCKER_ARCH` set to `all`
+
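+For example, to build against a different CUDA toolkit and a specific GPU architecture (a sketch, assuming the Dockerfiles expose `CUDA_VERSION` and `CUDA_DOCKER_ARCH` as build arguments; the values shown are illustrative):
+
+```bash
+# Target CUDA 12.1 and compute capability 8.6 (e.g. RTX 30xx) instead of the defaults.
+docker build -t local/llama.cpp:light-cuda \
+  --build-arg CUDA_VERSION=12.1.1 \
+  --build-arg CUDA_DOCKER_ARCH=sm_86 \
+  -f .devops/main-cuda.Dockerfile .
+```
+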
+The resulting images are essentially the same as the non-CUDA images:
+
+1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and quantize them to 4 bits.
+2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
+
+#### Usage
+
+After building locally, usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag to offload model layers to the GPU.
+
+```bash
+docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+```
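+
+The examples above offload only a single layer. If the quantized model fits in VRAM, raising `--n-gpu-layers` speeds inference up considerably (the value below is illustrative; pick one that matches your GPU memory):
+
+```bash
+# Offload all 32 layers of a 7B model to the GPU.
+docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 32
+```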
+
### Contributing
- Contributors can open PRs