Production-Grade MCP Servers: The Checklist Nobody Gives You
Every MCP tutorial ends the same way: the server starts, the tool responds, the demo works. Then you ship it. Three days later an agent is silently failing every tool call, your registry listing points to a dead process, and you have no idea how long it has been broken. This article is the checklist I wish I had before that happened.
01 A Registry Listing Is Not a Deployment
This sounds obvious. It is not obvious in practice, because registry listings and server processes have completely independent lifecycles. You push a new entry to mcp.json, the PR merges, the entry is live. The server URL it points to? That is a separate system. It can restart. It can be misconfigured. It can be down for four days while the registry cheerfully advertises it as available.
The agents consuming your server do not distinguish between "server is down" and "tool does not exist." They get a connection refused, a timeout, or a malformed response, and they fail in ways that look like reasoning failures, not infrastructure failures. Your users blame the model.
The fix: a pre-publish connection-test gate
Before any registry entry is created or updated, your CI pipeline must perform a real MCP handshake against the production URL — not staging, not localhost. The gate should:
- Open a connection to the server URL
- Send a valid initialize request with a real protocol version
- Assert the response includes serverInfo and a capabilities object
- Call one non-destructive tool (e.g., a ping or list_schemas) and assert success
- Fail the pipeline and block the registry update if any step fails
# Example gate script (adapt for your CI)
MCP_URL="https://your-mcp-server.internal/mcp"

# Send a real initialize request. The Streamable HTTP transport expects the
# client to accept both JSON and SSE; -N stops curl from buffering the stream.
RESPONSE=$(curl -sfN -X POST "$MCP_URL" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer $MCP_DEPLOY_TOKEN" \
  --max-time 10 \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"ci-gate","version":"1.0"}}}')

# Block the pipeline if the handshake response carries no serverInfo.
echo "$RESPONSE" | grep -q '"serverInfo"' || { echo "MCP handshake failed"; exit 1; }
echo "Gate passed."
Run this gate on a schedule — every 5 minutes in production — not just at deploy time. A process that passes at deploy time can fail an hour later.
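If grepping for a string feels too loose, the assertion steps of the gate can be written as a small checker. This is a minimal sketch, not a reference client: the function names are mine, it reuses the initialize payload from the shell script, and it assumes each SSE event carries one JSON-RPC message on a single data: line.

```python
import json
from typing import List


def build_initialize_request(request_id: int = 1) -> dict:
    """Build the same JSON-RPC initialize request the shell gate sends."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "ci-gate", "version": "1.0"},
        },
    }


def parse_response_body(body: str) -> List[dict]:
    """Extract JSON-RPC messages from a plain JSON body or an SSE stream."""
    text = body.strip()
    if text.startswith("{"):
        return [json.loads(text)]
    messages = []
    for line in text.splitlines():
        # Assumption: one complete JSON message per data: line.
        if line.startswith("data:"):
            messages.append(json.loads(line[len("data:"):].strip()))
    return messages


def check_handshake(body: str, request_id: int = 1) -> List[str]:
    """Return failure reasons; an empty list means the gate passes."""
    try:
        messages = parse_response_body(body)
    except json.JSONDecodeError:
        return ["response is not valid JSON or SSE"]
    reply = next(
        (m for m in messages if isinstance(m, dict) and m.get("id") == request_id),
        None,
    )
    if reply is None:
        return ["no response matching the request id"]
    if "error" in reply:
        return [f"server returned error: {reply['error']}"]
    failures = []
    result = reply.get("result", {})
    if "serverInfo" not in result:
        failures.append("missing serverInfo")
    if not isinstance(result.get("capabilities"), dict):
        failures.append("missing capabilities object")
    return failures
```

Feed it the raw response body from whatever HTTP client your CI already uses and fail the job when the returned list is non-empty. The tool-call step of the gate would follow the same pattern with a second request.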
02 Streamable HTTP Transport: The Handshake You Have to Get Right
The MCP spec's Streamable HTTP transport is not vanilla HTTP JSON. It is HTTP POST that negotiates a Server-Sent Events stream. Most developers implement the POST part correctly and completely miss the SSE part.
The required behavior: when a client sends POST /mcp with Accept: text/event-stream, your server must respond with:
- Content-Type: text/event-stream
- Cache-Control: no-cache
- Connection: keep-alive
- No Content-Length header (chunked transfer only)
- Data flushed immediately, not buffered until the handler returns
# What the client sends
POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream
Authorization: Bearer <token>

{"jsonrpc":"2.0","id":1,"method":"initialize",...}

# What your server MUST return
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"jsonrpc":"2.0","id":1,"result":{"serverInfo":...}}

# What the client sends next, in a separate POST, to confirm the handshake
{"jsonrpc":"2.0","method":"notifications/initialized"}
For nginx, you need these directives on the MCP location block:
location /mcp {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off; # critical
    proxy_cache off;
    proxy_read_timeout 300s;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Request-ID $request_id;
}
For Caddy, set flush_interval -1 in the reverse_proxy directive. For AWS ALB, set idle_timeout to at least 300 seconds and use HTTP/2 or HTTP/1.1 keep-alive. Always test with curl -N (no buffering) and verify that events arrive in real time before calling the transport working.
If the client does not send Accept: text/event-stream
The spec allows single-response mode for clients that send only Accept: application/json. You should support both. Detect the header, respond accordingly. Clients that cannot handle SSE still deserve a synchronous JSON response rather than a 406.
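Header detection is small enough to sketch in full. The helper names below are illustrative and not tied to any framework; the streaming headers are the ones listed earlier in this section, and a real handler would keep the stream open and flush further events instead of returning one body.

```python
import json
from typing import Optional, Tuple


def wants_event_stream(accept_header: Optional[str]) -> bool:
    """True when the Accept header lists text/event-stream."""
    if not accept_header:
        return False
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    return "text/event-stream" in media_types


def format_sse_event(message: dict) -> bytes:
    """Frame one JSON-RPC message as an SSE event; the blank line ends the event."""
    payload = json.dumps(message, separators=(",", ":"))
    return f"data: {payload}\n\n".encode("utf-8")


def build_response(accept_header: Optional[str], message: dict) -> Tuple[dict, bytes]:
    """Return (headers, body) for streaming or single-response mode."""
    if wants_event_stream(accept_header):
        headers = {
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
        return headers, format_sse_event(message)
    # Fallback: a synchronous JSON response instead of a 406.
    return {"Content-Type": "application/json"}, json.dumps(message).encode("utf-8")
```

Keeping the negotiation in one function means the SSE and JSON paths cannot drift apart as you add tools.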
03 Auth: Zero-Auth Is a Network Topology Decision, Not a Laziness Decision
The MCP spec does not mandate authentication. This makes sense for stdio servers, which run as child processes under the agent runtime and inherit its process isolation. It makes no sense for HTTP servers, where "no auth" means "anyone who can route a packet to your server can invoke your tools."
| Transport | Auth Minimum | Why |
|---|---|---|
| stdio | None required | Process isolation is the boundary |
| localhost HTTP | None (with caveats) | Other local processes can still hit it; consider SSRF risk |
| Internal network HTTP | Bearer token, validated every request | Internal ≠ trusted; SSRF, lateral movement |
| Public HTTP | OAuth 2.0 or mTLS | No exceptions |
Validate the token on every request, not just initialize. The handshake completing does not mean the caller is still authorized for subsequent tool calls. Token revocation must propagate within one request cycle, not one session.
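A rough sketch of per-request validation with immediate revocation follows. The in-memory stores and function names are assumptions for illustration; in a real deployment the token and revocation state would live in a shared store so that every server instance sees a revocation on the next request.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical in-memory stores, keyed by SHA-256 of the token so raw
# tokens are never held in memory longer than one request.
TOKEN_HASHES = {}  # token digest -> caller id
REVOKED = set()    # token digests that are no longer valid


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def register_token(token: str, caller_id: str) -> None:
    TOKEN_HASHES[_digest(token)] = caller_id


def revoke_token(token: str) -> None:
    REVOKED.add(_digest(token))


def authenticate(authorization_header: Optional[str]) -> Optional[str]:
    """Return the caller id for a valid bearer token, else None.

    Meant to run on every request, including tool calls that follow a
    successful initialize.
    """
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return None
    presented = _digest(authorization_header[len("Bearer "):])
    if presented in REVOKED:
        return None
    for known, caller_id in TOKEN_HASHES.items():
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(presented, known):
            return caller_id
    return None
```

Because the revocation check runs inside authenticate, a revoked token fails on its very next call, even in the middle of an established session.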
If you are building for multiple tenants, the token must encode the tenant identity and your tool implementations must scope all data access by that identity. There is no implicit isolation from the MCP layer itself.
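In practice that means the tenant filter lives inside the tool, not in the caller's arguments. Here is a toy sketch; the CallerContext type, the DOCUMENTS list, and the search_documents tool are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallerContext:
    """Identity resolved from the validated token (field names are illustrative)."""
    caller_id: str
    tenant_id: str


# Stand-in for a real datastore.
DOCUMENTS = [
    {"tenant_id": "acme", "title": "Q3 plan"},
    {"tenant_id": "globex", "title": "Q3 roadmap"},
]


def search_documents(ctx: CallerContext, query: str) -> List[str]:
    """Tool implementation: the tenant filter is applied unconditionally."""
    needle = query.lower()
    return [
        row["title"]
        for row in DOCUMENTS
        if row["tenant_id"] == ctx.tenant_id and needle in row["title"].lower()
    ]
```

The tenant id comes only from the context built during authentication. The tool never accepts it as an argument the agent could set.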
04 Rate Limiting: Protect Yourself From Your Own Agents
Agents loop. That is their entire value proposition. It is also how a single misconfigured prompt causes your MCP server to receive 3,000 tool calls per minute from one agent instance while your downstream API rate-limits you at 100 per minute and starts returning 429s to every other caller.
Rate limiting for MCP servers has three scopes you need to handle separately:
- Per-caller: limit requests from a single authenticated identity (client ID or token)
- Per-tool: some tools are cheap (lookup), some are expensive (generate image, run query). Apply different limits per tool name
- Global: protect downstream dependencies with a circuit breaker, not just a counter
// Pseudocode: per-tool rate limit in middleware
const TOOL_LIMITS = {
  "search_database": { rps: 50, burst: 10 },
  "run_code": { rps: 5, burst: 2 },
  "get_config": { rps: 200, burst: 50 },
};

function rateLimitMiddleware(toolName, callerId) {
  const limit = TOOL_LIMITS[toolName] ?? { rps: 20, burst: 5 };
  const key = `ratelimit:${callerId}:${toolName}`;
  if (!tokenBucket.allow(key, limit)) {
    return mcpError(-32000, "Rate limit exceeded", {
      retryAfterMs: tokenBucket.nextAllowedMs(key)
    });
  }
}
Return the retry delay in the error response. Agents that respect it will back off. Agents that do not respect it will keep hitting you, which is why the rate limiter also needs to escalate from per-tool limits to caller-level suspension after repeated violations.
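Here is one way the bucket and the escalation could fit together. Treat it as a single-process sketch under my own assumptions: the class name, the three-strikes threshold, and the 60-second suspension are arbitrary, violation counts do not decay over time, and a multi-instance deployment would need shared state.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-(caller, tool) token buckets with caller-level suspension."""

    def __init__(self, max_violations: int = 3, suspend_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}
        self._violations: Dict[str, int] = {}
        self._suspended_until: Dict[str, float] = {}
        self._max_violations = max_violations
        self._suspend_seconds = suspend_seconds

    def allow(self, caller_id: str, tool: str, rps: float, burst: int) -> Tuple[bool, int]:
        """Return (allowed, retry_after_ms)."""
        now = self._clock()
        until = self._suspended_until.get(caller_id, 0.0)
        if now < until:
            # Suspended callers are rejected for every tool.
            return False, int((until - now) * 1000)

        key = (caller_id, tool)
        tokens, last = self._buckets.get(key, (float(burst), now))
        # Refill at rps tokens per second, capped at the burst size.
        tokens = min(float(burst), tokens + (now - last) * rps)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True, 0

        self._buckets[key] = (tokens, now)
        count = self._violations.get(caller_id, 0) + 1
        self._violations[caller_id] = count
        if count >= self._max_violations:
            # Escalate from a per-tool rejection to caller-level suspension.
            self._suspended_until[caller_id] = now + self._suspend_seconds
            self._violations[caller_id] = 0
            return False, int(self._suspend_seconds * 1000)
        return False, int((1.0 - tokens) / rps * 1000) + 1
```

The injected clock is there so burst behavior can be tested deterministically, which is the "tested with burst traffic" part that usually gets skipped.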
05 Observability: You Cannot Debug What You Cannot See
The default logging state of most MCP servers is: nothing. A tool call comes in, something happens, a response goes out. If the tool fails, you get a JSON-RPC error. You do not know which agent called it, what the input was, how long each step took, or what downstream systems were hit.
Correlation IDs are not optional
The MCP protocol includes a request id field on every request. That ID must flow through every log line, every downstream API call, and every error response your server produces. When an agent reports "tool X failed at 14:32," you should be able to grep one field and reconstruct the entire call chain in under 30 seconds.
// Structured log entry for every tool invocation
{
  "timestamp": "2026-06-16T14:32:01.443Z",
  "level": "info",
  "event": "tool_call",
  "mcp_request_id": "req-8a2f91",
  "tool_name": "search_database",
  "caller_id": "agent-prod-7f3a",
  "input_token_count": 142,
  "duration_ms": 287,
  "downstream_calls": [
    { "service": "postgres", "query_hash": "a3b9", "duration_ms": 241 }
  ],
  "status": "success"
}
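One way to guarantee the id lands on every line without threading it through each function call is a context variable read by the log formatter. A minimal sketch using only the standard library; the JsonFormatter class, the fields convention, and handle_tool_call are names I made up for illustration.

```python
import contextvars
import json
import logging

# Holds the MCP request id for the call currently being handled.
request_id_var = contextvars.ContextVar("mcp_request_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that always carries mcp_request_id."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "mcp_request_id": request_id_var.get(),
        }
        # Extra structured fields are passed via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def handle_tool_call(request: dict, logger: logging.Logger) -> None:
    """Bind the JSON-RPC id for the duration of one tool call."""
    token = request_id_var.set(request.get("id"))
    try:
        logger.info(
            "tool_call",
            extra={"fields": {"tool_name": request["params"]["name"]}},
        )
        # ... run the tool; any log line emitted here carries the same id ...
    finally:
        request_id_var.reset(token)
```

Attach JsonFormatter to your handler once at startup. Downstream HTTP calls can read request_id_var.get() and forward it as a header.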
The /health endpoint must actually check dependencies
A /health endpoint that returns 200 OK {"status":"ok"} because the HTTP server is running is worse than no health endpoint. It gives your load balancer confidence to route traffic to a server that cannot actually complete tool calls because its database connection pool is exhausted or its downstream API is timing out.
GET /health HTTP/1.1

# Response MUST reflect real dependency state
{
  "status": "degraded", // healthy | degraded | unhealthy
  "checks": {
    "database": { "status": "healthy", "latency_ms": 4 },
    "cache": { "status": "healthy", "latency_ms": 1 },
    "upstream_api": { "status": "degraded", "error": "p99 latency 2400ms" }
  },
  "version": "1.4.2",
  "uptime_seconds": 86401
}
Your monitoring system should distinguish between process-alive (is the port open?) and actually-functional (is the /health response status "healthy"?). Only route traffic to servers that pass the functional check.
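The aggregation logic behind such an endpoint can be very small. In this sketch the probe callables are placeholders for your real dependency checks, and mapping anything other than "healthy" to a 503 is my reading of the routing rule above, not a requirement of any spec.

```python
import time
from typing import Callable, Dict

SEVERITY = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def run_health_checks(checks: Dict[str, Callable[[], str]]) -> dict:
    """Run each dependency probe and report the worst status overall."""
    results = {}
    for name, probe in checks.items():
        started = time.monotonic()
        try:
            status = probe()
            if status not in SEVERITY:
                raise ValueError(f"unknown status {status!r}")
            detail = {"status": status}
        except Exception as exc:  # a probe that raises counts as unhealthy
            detail = {"status": "unhealthy", "error": str(exc)}
        detail["latency_ms"] = round((time.monotonic() - started) * 1000)
        results[name] = detail
    overall = max(
        (d["status"] for d in results.values()),
        key=SEVERITY.__getitem__,
        default="healthy",
    )
    return {"status": overall, "checks": results}


def http_status_for(report: dict) -> int:
    """Map the report to a status code a load balancer can act on."""
    return 200 if report["status"] == "healthy" else 503
```

Each probe should do real work against its dependency, for example a trivial query through the actual connection pool, and should carry its own short timeout.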
Metrics that matter
- mcp_tool_calls_total{tool,status}: error rates per tool name surface broken tools without noise
- mcp_tool_duration_seconds{tool,p50,p95,p99}: latency regressions show up before agents start timing out
- mcp_active_sessions: sudden drops indicate server restarts; sudden spikes indicate agent loops
- mcp_downstream_errors_total{service}: distinguish your failures from your dependencies' failures
06 Localhost Binding: The Right Default for the Wrong Reasons
Bind your MCP server process to 127.0.0.1, not 0.0.0.0. Let a reverse proxy (nginx, Caddy, Envoy) handle TLS termination, auth header forwarding, and external routing. This is the right architecture, but most people do it for the wrong reason ("it's more secure by default") and skip the reasoning, which means they undo it the first time something is hard to debug.
The correct mental model: your MCP server process is a trusted internal service. It should receive only pre-validated, pre-authenticated requests from the proxy. The proxy is the security boundary. If you open the process port directly to the network — even temporarily for debugging — you have eliminated the boundary entirely.
Enforce the binding in your server configuration — not just your firewall. If the code binds to 0.0.0.0, no firewall rule will save you from a misconfiguration. Bind to 127.0.0.1 in code, and treat any deviation as a deployment error that blocks the rollout.
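A startup guard along these lines makes the rule hard to undo by accident. The function name is hypothetical. It deliberately rejects hostnames such as localhost, because name resolution is one more thing that can be misconfigured.

```python
import ipaddress


def assert_loopback_bind(host: str) -> str:
    """Return host unchanged if it is a loopback IP literal; raise otherwise."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"bind host must be a literal IP address, got {host!r}") from None
    if not address.is_loopback:
        raise ValueError(f"refusing to bind to non-loopback address {host}")
    return host
```

Call it on the configured host before opening the listening socket, so a 0.0.0.0 in an environment variable fails the rollout instead of exposing the port.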
07 Supply Chain Safety for Self-Updating Workers
Many MCP server deployments involve workers that fetch code or configuration at runtime — plugins, tool definitions, prompt templates stored remotely and pulled on startup. This is a supply-chain attack surface that the MCP ecosystem has not taken seriously yet.
The attack is straightforward: you have a worker that fetches its tool definitions from a remote URL. An attacker compromises that URL — DNS poisoning, CDN account takeover, S3 bucket policy misconfiguration. Your worker fetches the compromised definitions. Every agent using your server now executes attacker-controlled tools.
The hash-manifest pattern
At build time, generate a manifest of every remote resource your worker will load, with its expected SHA-256 hash. Bundle this manifest into your deployment artifact (not fetched remotely). At runtime, before loading any remote resource:
- Fetch the resource
- Compute its SHA-256
- Compare against the pinned manifest
- If the hash does not match: log the discrepancy with full context, refuse to load the resource, alert immediately, and continue operating with the last verified version
# manifest.json (committed to your deployment artifact)
{
  "resources": {
    "https://cdn.example.com/tools/v1.4.2/definitions.json": {
      "sha256": "e3b0c44298fc1c149afb...a495991b7852b855",
      "last_verified": "2026-06-15T10:00:00Z"
    }
  }
}
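# Startup verification (Python sketch; fetch, alert, use_cached_version,
# and load_resource stand in for your own helpers)
import hashlib

for url, entry in manifest["resources"].items():
    content = fetch(url)  # raw bytes of the remote resource
    actual_hash = hashlib.sha256(content).hexdigest()
    if actual_hash != entry["sha256"]:
        alert(f"Hash mismatch for {url}: expected {entry['sha256']}, got {actual_hash}")
        use_cached_version(url)  # never execute unverified code
        continue  # keep operating on the last verified version
    load_resource(url, content)
# If no verified copy exists to fall back on, raise StartupError() instead.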
Never pin to a mutable reference like latest or a branch name. Pin to a content-addressed digest. The update process should be: update the manifest, commit it, deploy — not auto-fetch whatever is newest.
The verifier and the updater must not share the same trust root. If an attacker can modify both the remote resource and your manifest fetch location, hash verification is pointless. The manifest must come from your build system, not from the same CDN you are verifying against.
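The build-time half of the pattern is a few lines. This sketch assumes your build step already holds the resource bytes it intends to pin; build_manifest is a hypothetical helper, and the output mirrors the manifest.json layout shown above.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def build_manifest(resources: Dict[str, bytes]) -> str:
    """Return manifest JSON mapping each resource URL to its SHA-256 digest."""
    verified_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entries = {
        url: {
            "sha256": hashlib.sha256(content).hexdigest(),
            "last_verified": verified_at,
        }
        for url, content in sorted(resources.items())
    }
    return json.dumps({"resources": entries}, indent=2)
```

Run it inside the build pipeline and write the result into the deployment artifact, so the manifest travels with the code and not with the CDN.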
The Pre-Ship Checklist
- Server binds to 127.0.0.1, reverse proxy handles external traffic
- Proxy buffering disabled for the MCP route (proxy_buffering off in nginx, flush_interval -1 in Caddy)
- SSE transport verified with curl -N against production URL
- Auth required on every request, not just initialize
- Per-tool rate limits defined and tested with burst traffic
- /health endpoint checks real dependencies, not just process liveness
- Structured JSON logs with mcp_request_id on every line
- Per-tool error rate and latency metrics wired to alerting
- Hash manifest for all remote resources, verified at startup
- Pre-publish CI gate performs real MCP handshake against production
- Scheduled connection test running every 5 minutes in production
- Registry entry update blocked unless CI gate passes
Frequently Asked Questions
... /mcp that includes Accept: text/event-stream. If your server does not inspect this header and return Content-Type: text/event-stream with chunked encoding and no Content-Length, the client gets a 200 with a body it cannot parse as SSE. Reverse proxy buffering is the most common culprit: nginx and AWS ALB buffer upstream responses by default, turning a streaming SSE session into a single-shot response that never delivers tool results.

... /health endpoint that validates downstream dependencies (not just process liveness), structured JSON logs with a correlation ID that flows from the MCP request ID through every tool call, and metrics for tool-call latency and error rate broken down per tool name. Without per-tool error rates you cannot distinguish "the server is down" from "one tool is broken and poisoning agent reliability."