Qwen2.5-7B-Instruct (vLLM / Docker / CUDA)
Dockerby mm-team · 5/18/2026
Qwen2.5 7B Instruct served via vLLM in Docker. Requires 16 GB VRAM (RTX 3080 / 4070 or better). OpenAI-compatible API.
↓ 21 downloads ♥ 7 likes ⚡ 15 boots cuda
config.json
{
"server_type": "docker",
"hardware_type": "cuda",
"script": "scripts/docker/cuda/setup.sh",
"image": "vllm/vllm-openai:latest",
"container_name": "mm-qwen25-7b",
"ports": [
"8080:8000"
],
"volumes": [
"./models:/models:ro"
],
"env": {
"MODEL": "/models/Qwen2.5-7B-Instruct",
"MAX_MODEL_LEN": "8192",
"TENSOR_PARALLEL_SIZE": "1"
},
"health_check": "http://127.0.0.1:8080/health",
"startup_timeout_secs": 120,
"runtime": "nvidia"
}