Minimal MCP server that lets Cursor talk to a local llama.cpp llama-server (OpenAI-compatible).
Zero llama-cpp-python dependency. Built for reproducible debugging, clean automations, and concise docs.
```bash
# 0) Run llama-server (adjust paths/flags)
scripts/run-llama-server.sh
```
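The script's exact flags aren't reproduced here; a minimal sketch of a llama-server launch, where the model path, context size, and GPU offload are placeholders you would adjust for your machine:

```bash
# Sketch only: the model path, -c and -ngl values below are placeholders,
# not copied from scripts/run-llama-server.sh.
llama-server \
  -m ~/models/your-model-Q4_K_M.gguf \
  --host 127.0.0.1 --port 8080 \
  -c 2048 \
  -ngl 0   # 0 = CPU only; raise to offload layers on GPUs with enough VRAM
```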
```bash
# 1) Put config/mcp.json into ~/.cursor/mcp.json (or project-local .cursor/mcp.json),
#    then open Cursor; it will spawn the MCP server via stdio.
```
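For reference, a Cursor MCP config of this kind roughly follows the shape below. The `command`/`args` are placeholders (use whatever actually starts this repo's server); `LLAMA_BASE_URL` and `LLAMA_TIMEOUT_S` are the env vars referenced in the troubleshooting notes:

```bash
# Hypothetical mcp.json contents; "command"/"args" are placeholders, not
# necessarily how this repo's server is launched.
mkdir -p ~/.cursor
cat > ~/.cursor/mcp.json <<'EOF'
{
  "mcpServers": {
    "llama-mcp": {
      "command": "node",
      "args": ["/path/to/llama-mcp/server.js"],
      "env": {
        "LLAMA_BASE_URL": "http://127.0.0.1:8080",
        "LLAMA_TIMEOUT_S": "120"
      }
    }
  }
}
EOF
```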
```bash
# 2) Smoke test outside Cursor
scripts/smoke-chat.sh
```

See `examples/cursor-commands.md` for copy-paste `@llama-mcp` calls.
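Independent of MCP, a raw request against llama-server's OpenAI-compatible endpoint is a handy baseline when the smoke test fails; this assumes the default base URL of http://127.0.0.1:8080:

```bash
# One-shot chat completion straight against llama-server (no MCP involved).
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
        "max_tokens": 16
      }'
```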
- `/v1/models` empty → wrong `--model` path or the server started without a model (see the quick checks below).
- HTTP 400/422 → schema mismatch; check your llama.cpp version and that OpenAI-compatible mode is on.
- Timeouts → raise `LLAMA_TIMEOUT_S` or reduce `max_tokens`.
- Connection errors → verify `LLAMA_BASE_URL` host/port/firewall.
- Older GPUs / low VRAM (e.g., 1 GB) → prefer Q4_K_M or a smaller `--ctx-size`; keep prompts short.
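Most of the failures above split into "server unreachable" vs "server up but no model loaded"; two quick checks (default base URL assumed) separate them:

```bash
# Is the server process reachable and healthy?
curl -s http://127.0.0.1:8080/health
# Is a model actually loaded? An empty "data" array means it started without one.
curl -s http://127.0.0.1:8080/v1/models
```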
```bash
scripts/support-bundle.sh support.zip
```

This zips env/version info, `/v1/models`, `/health`, and a minimal repro request for fast triage.
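Roughly, such a bundle amounts to something like the sketch below; the actual script may collect more (versions, a repro request, logs):

```bash
# Sketch of a support bundle: environment, server health, loaded models.
BASE="${LLAMA_BASE_URL:-http://127.0.0.1:8080}"
TMP="$(mktemp -d)"
uname -a                  > "$TMP/env.txt"
curl -s "$BASE/health"    > "$TMP/health.json"
curl -s "$BASE/v1/models" > "$TMP/models.json"
zip -j support.zip "$TMP"/*
```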
The Cursor Technical Support Engineer role highlights debugging tricky issues, building automations/tools, and crisp docs. This repo shows:
- health/models tools for instant diagnostics
- retrying HTTP client with structured logs (`LLAMA_LOG_JSON=1`)
- one-command smoke tests + support bundle
- small but clear docs and examples
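For example, assuming `scripts/smoke-chat.sh` forwards its environment to the server process, the structured logging can be switched on per run:

```bash
# Emit the HTTP client's logs as JSON lines for this one invocation.
LLAMA_LOG_JSON=1 scripts/smoke-chat.sh
```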
MIT