Llama-3.2-3B-Instruct (GGUF / CUDA)

Process

by mm-team · 5/18/2026

Llama 3.2 3B Instruct Q4_K_M via llama-server. Runs on RTX 3060 or better (12 GB VRAM). OpenAI-compatible API on port 8080.

↓ 12 downloads ♥ 4 likes ⚡ 8 boots cuda

#gguf #instruct #llama #llama.cpp

config.json

{
  "server_type": "process",
  "hardware_type": "cuda",
  "script": "scripts/process/cuda/setup.sh",
  "env": {
    "MODEL_PATH": "models/llama-3.2-3b-instruct-q4_k_m.gguf",
    "HOST": "127.0.0.1",
    "PORT": "8080",
    "GPU_LAYERS": "99",
    "CTX_SIZE": "4096"
  },
  "health_check": "http://127.0.0.1:8080/health",
  "startup_timeout_secs": 60
}