Llama-3.2-3B-Instruct (GGUF / CUDA)
Processby mm-team · 5/18/2026
Llama 3.2 3B Instruct Q4_K_M via llama-server. Runs on RTX 3060 or better (12 GB VRAM). OpenAI-compatible API on port 8080.
↓ 12 downloads ♥ 4 likes ⚡ 8 boots cuda
config.json
{
"server_type": "process",
"hardware_type": "cuda",
"script": "scripts/process/cuda/setup.sh",
"env": {
"MODEL_PATH": "models/llama-3.2-3b-instruct-q4_k_m.gguf",
"HOST": "127.0.0.1",
"PORT": "8080",
"GPU_LAYERS": "99",
"CTX_SIZE": "4096"
},
"health_check": "http://127.0.0.1:8080/health",
"startup_timeout_secs": 60
}