aboutsummaryrefslogtreecommitdiff
path: root/examples/server/README.md
AgeCommit message (Collapse)Author
2023-06-06Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703)Johannes Gäßler
* CUDA multi GPU + scratch ggml_cuda_compute_forward Tensor parallelism ggml_cuda_add ggml_cuda_rms_norm ggml_cuda_silu CUDA scratch buffer --main-gpu CLI option
2023-05-28Only show -ngl option when relevant + other doc/arg handling updates (#1625)Kerfuffle
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356
2023-05-21examples : add server example with REST API (#1443)Steward Garcia
* Added httplib support * Added readme for server example * fixed some bugs * Fix the build error on Macbook * changed json11 to nlohmann-json * removed some whitespaces * remove trailing whitespace * added support custom prompts and more functions * some corrections and added as cmake option