Qwen2.5-7B-Instruct (vLLM / Docker / CUDA)

Docker

by mm-team  ·  5/18/2026

Qwen2.5 7B Instruct served via vLLM in Docker. Requires 16 GB VRAM (RTX 3080 / 4070 or better). OpenAI-compatible API.

↓ 21 downloads ♥ 7 likes ⚡ 15 boots cuda
config.json
{
  "server_type": "docker",
  "hardware_type": "cuda",
  "script": "scripts/docker/cuda/setup.sh",
  "image": "vllm/vllm-openai:latest",
  "container_name": "mm-qwen25-7b",
  "ports": [
    "8080:8000"
  ],
  "volumes": [
    "./models:/models:ro"
  ],
  "env": {
    "MODEL": "/models/Qwen2.5-7B-Instruct",
    "MAX_MODEL_LEN": "8192",
    "TENSOR_PARALLEL_SIZE": "1"
  },
  "health_check": "http://127.0.0.1:8080/health",
  "startup_timeout_secs": 120,
  "runtime": "nvidia"
}