Runpod Serverless SGLang
Pre-built Docker image that runs on Runpod Serverless.
Usage
The image is published to https://hub.docker.com/r/degroote22/lmscript-runpod-serverless
The Docker Hub image can be deployed without any configuration changes to a machine whose GPU has 24 GB of VRAM.
Environment Variables for Configuration
| Name | Detail |
|---|---|
| REPO_ID | HuggingFace repository with the language model. Defaults to "TheBloke/Mistral-7B-Instruct-v0.2-AWQ" |
| DISABLE_FLASH_INFER | Set to "yes" to disable FlashInfer. Older GPUs are not supported by FlashInfer. Defaults to "no". |
| CONCURRENCY_PER_WORKER | Number of concurrent requests per Runpod Serverless Worker. Defaults to 50. |
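To illustrate how the defaults in the table resolve, here is a minimal Python sketch. It is not the image's actual startup code: `WorkerConfig` and `load_config` are hypothetical names introduced for illustration, and the exact string comparison for `DISABLE_FLASH_INFER` is an assumption. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerConfig:
    """Hypothetical container for the three documented settings."""
    repo_id: str
    disable_flash_infer: bool
    concurrency_per_worker: int


def load_config(env: Mapping[str, str] = os.environ) -> WorkerConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return WorkerConfig(
        repo_id=env.get("REPO_ID", "TheBloke/Mistral-7B-Instruct-v0.2-AWQ"),
        # Assumption: only the exact string "yes" disables FlashInfer.
        disable_flash_infer=env.get("DISABLE_FLASH_INFER", "no") == "yes",
        concurrency_per_worker=int(env.get("CONCURRENCY_PER_WORKER", "50")),
    )


if __name__ == "__main__":
    # Prints the configuration resolved from the current environment.
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check what a given set of overrides would produce, for example `load_config({"DISABLE_FLASH_INFER": "yes"})` for an older GPU.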
Docker-Compose
The repository includes an example Docker Compose file.
To use it, clone the LmScript repository and run:
cd docker/runpod-serverless-sglang
docker-compose up