Production-Grade MCP Servers: The Checklist Nobody Gives You

For engineers shipping real AI agents — not demos • June 2026

Every MCP tutorial ends the same way: the server starts, the tool responds, the demo works. Then you ship it. Three days later an agent is silently failing every tool call, your registry listing points to a dead process, and you have no idea how long it has been broken. This article is the checklist I wish I had before that happened.

01 A Registry Listing Is Not a Deployment

This sounds obvious. It is not obvious in practice, because registry listings and server processes have completely independent lifecycles. You push a new entry to mcp.json, the PR merges, the entry is live. The server URL it points to? That is a separate system. It can restart. It can be misconfigured. It can be down for four days while the registry cheerfully advertises it as available.

The agents consuming your server do not distinguish between "server is down" and "tool does not exist." They get a connection refused, a timeout, or a malformed response, and they fail in ways that look like reasoning failures, not infrastructure failures. Your users blame the model.

The failure pattern: Registry entry created during staging. DNS resolves. Server deployed. Six weeks later, Kubernetes reschedules the pod to a new node. The service's internal routing is misconfigured. The registry still lists the server. Agents fail silently for 72 hours before anyone notices because the failure looks like model hallucination, not a 502.

The fix: a pre-publish connection-test gate

Before any registry entry is created or updated, your CI pipeline must perform a real MCP handshake against the production URL — not staging, not localhost. The gate should:

Open a connection to the server URL
Send a valid initialize request with a real protocol version
Assert the response includes serverInfo and a capabilities object
Call one non-destructive tool (e.g., a ping or list_schemas) and assert success
Fail the pipeline and block the registry update if any step fails

# Example gate script (adapt for your CI)
MCP_URL="https://your-mcp-server.internal/mcp"
RESPONSE=$(curl -sf -X POST "$MCP_URL" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Authorization: Bearer $MCP_DEPLOY_TOKEN" \
  --max-time 10 \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"ci-gate","version":"1.0"}}}')

echo "$RESPONSE" | grep -q '"serverInfo"' || { echo "MCP handshake failed"; exit 1; }
echo "Gate passed."

Run this gate on a schedule — every 5 minutes in production — not just at deploy time. A process that passes at deploy time can fail an hour later.
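
The grep in the script above only proves that the string "serverInfo" appears somewhere in the output. A stricter check, sketched below in Python, parses the reply whether the server answered with a single JSON body or with an SSE stream, and then asserts the fields the gate cares about. The function names extract_jsonrpc_messages and check_initialize are hypothetical helpers for this illustration, not part of any MCP SDK.

```python
import json


def extract_jsonrpc_messages(body: str, content_type: str) -> list:
    """Return the JSON-RPC messages in a plain JSON body or an SSE stream body."""
    if content_type.split(";")[0].strip().lower() == "text/event-stream":
        messages, data_lines = [], []
        # The trailing "" flushes a final event that lacks a closing blank line.
        for line in body.splitlines() + [""]:
            if line.startswith("data:"):
                data_lines.append(line[5:].removeprefix(" "))
            elif line == "" and data_lines:
                messages.append(json.loads("\n".join(data_lines)))
                data_lines = []
        return messages
    parsed = json.loads(body)
    return parsed if isinstance(parsed, list) else [parsed]


def check_initialize(messages: list, request_id) -> list:
    """Return a list of problems; an empty list means the handshake looks valid."""
    reply = next((m for m in messages if m.get("id") == request_id), None)
    if reply is None:
        return [f"no response with id {request_id}"]
    if "error" in reply:
        return [f"server returned error: {reply['error']}"]
    result = reply.get("result", {})
    problems = []
    if "serverInfo" not in result:
        problems.append("result is missing serverInfo")
    if not isinstance(result.get("capabilities"), dict):
        problems.append("result is missing a capabilities object")
    return problems
```

A CI job would feed the response body and its Content-Type header into these two functions and fail the pipeline when the returned list is not empty.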

02 Streamable HTTP Transport: The Handshake You Have to Get Right

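The MCP spec's Streamable HTTP transport is not vanilla HTTP JSON. The client POSTs JSON-RPC to a single MCP endpoint (here /mcp) with an Accept header listing both application/json and text/event-stream, and the server chooses per request whether to answer with one JSON body or a Server-Sent Events stream. Most developers implement the POST part correctly and completely miss the SSE part.

The required behavior: whenever your server answers a POST /mcp with a stream, the response must have: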

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
No Content-Length header (chunked transfer only)
Data flushed immediately — not buffered until the handler returns

# What the client sends
POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream
Authorization: Bearer <token>

{"jsonrpc":"2.0","id":1,"method":"initialize",...}

# What your server MUST return when it chooses to stream
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"jsonrpc":"2.0","id":1,"result":{"serverInfo":...}}

# The client then sends notifications/initialized in its own POST;
# the server acknowledges it with 202 Accepted and no body.
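
On the server side, the framing shown above is small enough to write by hand. The sketch below is a minimal illustration rather than any SDK's API: SSE_HEADERS and sse_event are hypothetical names, and the flush is left to whatever web framework you use.

```python
import json

# Headers for a streamed response; deliberately no Content-Length.
SSE_HEADERS = {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
}


def sse_event(message: dict) -> bytes:
    """Frame one JSON-RPC message as a single SSE event.

    json.dumps without indent never emits a raw newline, so the payload
    always fits on one data: line. The blank line terminates the event.
    """
    payload = json.dumps(message, separators=(",", ":"))
    return f"event: message\ndata: {payload}\n\n".encode("utf-8")
```

Write each frame to the socket and flush straight away. A frame that sits in a buffer until the handler returns defeats the point of streaming.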

The reverse proxy buffering trap: nginx buffers upstream responses by default, and other proxies and load balancers in the path (Caddy, AWS ALB, CDNs) can buffer or cut idle streams depending on version and configuration. If your MCP server is behind any of these without SSE-specific config, the client can get a connection that appears open but never delivers events. The session hangs. Tool calls time out. This is one of the most common production failures for Streamable HTTP servers.

For nginx, you need these directives on the MCP location block:

location /mcp {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;          # critical
    proxy_cache off;
    proxy_read_timeout 300s;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Request-ID $request_id;
}

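For Caddy, set flush_interval -1 in the reverse_proxy directive; recent Caddy versions already flush text/event-stream responses immediately, so treat this as a belt-and-braces setting. For AWS ALB, raise the idle timeout (the idle_timeout.timeout_seconds attribute, 60 seconds by default) to at least 300 seconds and use HTTP/2 or HTTP/1.1 keep-alive. Always test with curl -N (--no-buffer) and verify you receive events in real time before calling the transport working.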

If the client does not send Accept: text/event-stream

The spec requires clients to list both application/json and text/event-stream in Accept, and it lets the server pick either form for a given response. You should support both. Detect the header and respond accordingly: a client that sends only Accept: application/json is out of spec, but it still deserves a synchronous JSON response rather than a 406.
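
The header detection can be sketched as one small function. This is a simplified illustration: choose_response_mode is a hypothetical name, and the parsing ignores q-values and wildcards such as */*, which a stricter implementation would honor.

```python
from typing import Optional


def choose_response_mode(accept_header: Optional[str], can_stream: bool = True) -> str:
    """Return "sse" when the client accepts an event stream, else "json".

    A missing or unrecognized Accept header falls back to a single JSON
    response instead of being rejected.
    """
    accepted = set()
    for part in (accept_header or "").split(","):
        media_type = part.split(";")[0].strip().lower()  # drop parameters like q=0.9
        if media_type:
            accepted.add(media_type)
    if can_stream and "text/event-stream" in accepted:
        return "sse"
    return "json"
```

The can_stream flag lets a handler that has only one short result to return opt out of streaming even when the client would accept it.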

03 Auth: Zero-Auth Is a Network Topology Decision, Not a Laziness Decision

The MCP spec does not mandate authentication. This makes sense for stdio servers, which run as child processes under the agent runtime and inherit its process isolation. It makes no sense for HTTP servers, where "no auth" means "anyone who can route a packet to your server can invoke your tools."

Transport	Auth Minimum	Why
stdio	None required	Process isolation is the boundary
localhost HTTP	None (with caveats)	Other local processes can still hit it; consider SSRF risk
Internal network HTTP	Bearer token, validated every request	Internal ≠ trusted; SSRF, lateral movement
Public HTTP	OAuth 2.0 or mTLS	No exceptions

The SSRF amplification problem: A zero-auth MCP server on an internal network becomes an SSRF amplifier. Any web application with an outbound HTTP vulnerability can now invoke your tools on behalf of an attacker, including tools that write data, call downstream APIs, or exfiltrate secrets. The blast radius is proportional to how capable your tool set is.

Validate the token on every request, not just initialize. The handshake completing does not mean the caller is still authorized for subsequent tool calls. Token revocation must propagate within one request cycle, not one session.
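
A minimal sketch of per-request validation, assuming static bearer tokens whose SHA-256 digests are stored server-side and an in-memory revocation set. The names bearer_token, authorize, active_token_hashes, and revoked are hypothetical, and a real deployment would more likely verify signed tokens or call an introspection endpoint.

```python
import hashlib
import hmac
from typing import Optional


def bearer_token(headers: dict) -> Optional[str]:
    """Extract the token from an Authorization: Bearer header, if present."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def authorize(headers: dict, active_token_hashes: dict, revoked: set) -> Optional[str]:
    """Return the caller ID for a valid, non-revoked token, else None.

    Call this on every request, including tool calls after initialize,
    so that a revocation takes effect on the very next request.
    """
    token = bearer_token(headers)
    if token is None:
        return None
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    for caller_id, stored_digest in active_token_hashes.items():
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(digest, stored_digest):
            return None if caller_id in revoked else caller_id
    return None
```

Because the function returns the caller ID, the same value can key the rate limiter and the structured logs.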

If you are building for multiple tenants, the token must encode the tenant identity and your tool implementations must scope all data access by that identity. There is no implicit isolation from the MCP layer itself.

04 Rate Limiting: Protect Yourself From Your Own Agents

Agents loop. That is their entire value proposition. It is also how a single misconfigured prompt causes your MCP server to receive 3,000 tool calls per minute from one agent instance while your downstream API rate-limits you at 100 per minute and starts returning 429s to every other caller.

Rate limiting for MCP servers has three scopes you need to handle separately:

Per-caller: limit requests from a single authenticated identity (client ID or token)
Per-tool: some tools are cheap (lookup), some are expensive (generate image, run query). Apply different limits per tool name
Global: protect downstream dependencies with a circuit breaker, not just a counter

// Pseudocode: per-tool rate limit in middleware
const TOOL_LIMITS = {
  "search_database": { rps: 50, burst: 10 },
  "run_code":        { rps: 5,  burst: 2  },
  "get_config":      { rps: 200, burst: 50 },
};

function rateLimitMiddleware(toolName, callerId) {
  const limit = TOOL_LIMITS[toolName] ?? { rps: 20, burst: 5 };
  const key = `ratelimit:${callerId}:${toolName}`;
  if (!tokenBucket.allow(key, limit)) {
    return mcpError(-32000, "Rate limit exceeded", {
      retryAfterMs: tokenBucket.nextAllowedMs(key)
    });
  }
}

Return the retry delay in the error response. Agents that respect it will back off. Agents that do not respect it will keep hitting you, which is why the rate limiter also needs to escalate from per-tool limits to caller-level suspension after repeated violations.
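
The pseudocode above leans on a tokenBucket helper that it never defines. One way to fill that gap, sketched in Python with an injectable clock so it can be tested: the class name, the max_violations threshold, and the choice never to reset the violation count are all assumptions made for this illustration.

```python
import math
import time


class TokenBucketLimiter:
    """Per-(caller, tool) token buckets with caller-level suspension."""

    def __init__(self, clock=time.monotonic, max_violations: int = 3):
        self._clock = clock
        self._buckets = {}     # (caller_id, tool_name) -> (tokens, last_refill_time)
        self._violations = {}  # caller_id -> number of rejected calls
        self._max_violations = max_violations

    def allow(self, caller_id: str, tool_name: str, rps: float, burst: int):
        """Return (allowed, retry_after_ms); retry_after_ms is None once suspended."""
        if self._violations.get(caller_id, 0) >= self._max_violations:
            return False, None  # suspended: needs an operator or a cool-down policy
        now = self._clock()
        key = (caller_id, tool_name)
        tokens, last = self._buckets.get(key, (float(burst), now))
        # Refill at rps tokens per second, capped at the burst size.
        tokens = min(float(burst), tokens + (now - last) * rps)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True, 0
        self._buckets[key] = (tokens, now)
        self._violations[caller_id] = self._violations.get(caller_id, 0) + 1
        return False, math.ceil((1.0 - tokens) * 1000 / rps)
```

The retry_after_ms value is what goes into the retryAfterMs field of the JSON-RPC error. A production limiter would also let the violation count decay over time instead of suspending a caller for good.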

05 Observability: You Cannot Debug What You Cannot See

The default logging state of most MCP servers is: nothing. A tool call comes in, something happens, a response goes out. If the tool fails, you get a JSON-RPC error. You do not know which agent called it, what the input was, how long each step took, or what downstream systems were hit.

Correlation IDs are not optional

Every MCP request carries a JSON-RPC id field (notifications do not). That ID is only unique within one client session, so pair it with a session or caller identifier. The pair must flow through every log line, every downstream API call, and every error response your server produces. When an agent reports "tool X failed at 14:32," you should be able to grep one field and reconstruct the entire call chain in under 30 seconds.

// Structured log entry for every tool invocation
{
  "timestamp": "2026-06-16T14:32:01.443Z",
  "level": "info",
  "event": "tool_call",
  "mcp_request_id": "req-8a2f91",
  "tool_name": "search_database",
  "caller_id": "agent-prod-7f3a",
  "input_token_count": 142,
  "duration_ms": 287,
  "downstream_calls": [
    { "service": "postgres", "query_hash": "a3b9", "duration_ms": 241 }
  ],
  "status": "success"
}

The /health endpoint must actually check dependencies

A /health endpoint that returns 200 OK {"status":"ok"} because the HTTP server is running is worse than no health endpoint. It gives your load balancer confidence to route traffic to a server that cannot actually complete tool calls because its database connection pool is exhausted or its downstream API is timing out.

GET /health HTTP/1.1

# Response MUST reflect real dependency state
{
  "status": "degraded",         // healthy | degraded | unhealthy
  "checks": {
    "database": { "status": "healthy", "latency_ms": 4 },
    "cache":    { "status": "healthy", "latency_ms": 1 },
    "upstream_api": { "status": "degraded", "error": "p99 latency 2400ms" }
  },
  "version": "1.4.2",
  "uptime_seconds": 86401
}

Your monitoring system should distinguish between process-alive (is the port open?) and actually-functional (is the /health response status "healthy"?). Only route traffic to servers that pass the functional check.
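
One possible shape for the dependency-aware check, sketched with plain callables standing in for real probes. run_check, aggregate_health, and the 500 ms degraded threshold are assumptions for this illustration, as is the decision to answer 503 for anything other than "healthy".

```python
import time
from typing import Callable

_SEVERITY = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def run_check(probe: Callable[[], None], degraded_after_ms: float = 500.0) -> dict:
    """Run one dependency probe; an exception means unhealthy, slowness means degraded."""
    start = time.perf_counter()
    try:
        probe()
    except Exception as exc:
        return {"status": "unhealthy", "error": str(exc)}
    latency_ms = (time.perf_counter() - start) * 1000
    status = "degraded" if latency_ms > degraded_after_ms else "healthy"
    return {"status": status, "latency_ms": round(latency_ms)}


def aggregate_health(checks: dict) -> tuple:
    """Return (http_status, body); the overall status is the worst individual one."""
    worst = max(
        (check["status"] for check in checks.values()),
        key=_SEVERITY.__getitem__,
        default="healthy",
    )
    http_status = 200 if worst == "healthy" else 503
    return http_status, {"status": worst, "checks": checks}
```

Each probe should be a cheap real operation, such as SELECT 1 against the database or a HEAD request to the upstream API, run with its own short timeout so that the health endpoint itself cannot hang.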

Metrics that matter

mcp_tool_calls_total{tool,status} — error rates per tool name surface broken tools without noise
mcp_tool_duration_seconds{tool}, recorded as a histogram so that p50, p95, and p99 can be derived — latency regressions show up before agents start timing out
mcp_active_sessions — sudden drops indicate server restarts; sudden spikes indicate agent loops
mcp_downstream_errors_total{service} — distinguish your failures from your dependencies' failures
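
To make the first two metrics concrete, here is an in-memory sketch of what they measure. It is an illustration only: ToolMetrics is a hypothetical class, and a real server would export these through its metrics library instead of keeping raw durations in a list.

```python
from collections import Counter, defaultdict


class ToolMetrics:
    """Per-tool call counts and latencies, mirroring the metrics listed above."""

    def __init__(self):
        self.calls = Counter()              # (tool, status) -> count
        self.durations = defaultdict(list)  # tool -> list of durations in seconds

    def record(self, tool: str, ok: bool, seconds: float) -> None:
        self.calls[(tool, "success" if ok else "error")] += 1
        self.durations[tool].append(seconds)

    def error_rate(self, tool: str) -> float:
        errors = self.calls[(tool, "error")]
        total = errors + self.calls[(tool, "success")]
        return errors / total if total else 0.0

    def percentile(self, tool: str, p: int):
        """Nearest-rank percentile for an integer p in 1..100; None without data."""
        values = sorted(self.durations[tool])
        if not values:
            return None
        rank = max(1, (p * len(values) + 99) // 100)  # integer ceil(p * n / 100)
        return values[rank - 1]
```

Alerting on error_rate per tool is what separates one broken tool from a server-wide outage.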

06 Localhost Binding: The Right Default for the Wrong Reasons

Bind your MCP server process to 127.0.0.1, not 0.0.0.0. Let a reverse proxy (nginx, Caddy, Envoy) handle TLS termination, auth header forwarding, and external routing. This is the right architecture, but most people do it for the wrong reason ("it's more secure by default") and skip the reasoning, which means they undo it the first time something is hard to debug.

The correct mental model: your MCP server process is a trusted internal service. It should receive only pre-validated, pre-authenticated requests from the proxy. The proxy is the security boundary. If you open the process port directly to the network — even temporarily for debugging — you have eliminated the boundary entirely.

The debugging trap: "I'll just open port 8080 temporarily so I can curl the server directly." That port stays open. The firewall rule that was supposed to close it after the incident gets forgotten. Six months later, a port scan finds a zero-auth MCP tool executor exposed to the internal network. This scenario is not hypothetical.

Enforce the binding in your server configuration — not just your firewall. If the code binds to 0.0.0.0, no firewall rule will save you from a misconfiguration. Bind to 127.0.0.1 in code, and treat any deviation as a deployment error that blocks the rollout.
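
Enforcing the binding in code can be as small as a startup guard. The sketch below assumes the bind host arrives as a string from configuration; assert_loopback_bind is a hypothetical helper name.

```python
import ipaddress


def assert_loopback_bind(host: str) -> str:
    """Return host unchanged if it is a loopback address, else raise ValueError.

    Call this before the listening socket is created so that a
    misconfigured deployment fails at startup instead of exposing the port.
    """
    if host == "localhost":
        return host
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"bind host {host!r} is not an IP literal or 'localhost'") from None
    if not address.is_loopback:
        raise ValueError(f"refusing to bind to non-loopback address {host!r}")
    return host
```

A deployment that genuinely needs another interface, for example a container whose proxy runs in a different network namespace, should make that an explicit and reviewed override, never a silent default.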

07 Supply Chain Safety for Self-Updating Workers

Many MCP server deployments involve workers that fetch code or configuration at runtime — plugins, tool definitions, prompt templates stored remotely and pulled on startup. This is a supply-chain attack surface that the MCP ecosystem has not taken seriously yet.

The attack is straightforward: you have a worker that fetches its tool definitions from a remote URL. An attacker compromises that URL — DNS poisoning, CDN account takeover, S3 bucket policy misconfiguration. Your worker fetches the compromised definitions. Every agent using your server now executes attacker-controlled tools.

The hash-manifest pattern

At build time, generate a manifest of every remote resource your worker will load, with its expected SHA-256 hash. Bundle this manifest into your deployment artifact (not fetched remotely). At runtime, before loading any remote resource:

Fetch the resource
Compute its SHA-256
Compare against the pinned manifest
If the hash does not match: log the discrepancy with full context, refuse to load the resource, alert immediately, and continue operating with the last verified version

# manifest.json (committed to your deployment artifact)
{
  "resources": {
    "https://cdn.example.com/tools/v1.4.2/definitions.json": {
      "sha256": "e3b0c44298fc1c149afb...a495991b7852b855",
      "last_verified": "2026-06-15T10:00:00Z"
    }
  }
}

# Startup verification (pseudocode)
for url, entry in manifest.resources.items():
    content = fetch(url)
    actual_hash = sha256(content)
    if actual_hash != entry.sha256:
        alert(f"Hash mismatch for {url}: expected {entry.sha256}, got {actual_hash}")
        if not has_verified_cache(url):
            raise StartupError()            # nothing trustworthy to fall back to
        content = load_verified_cache(url)  # never load unverified content
    load(content)

Never pin to a mutable reference like latest or a branch name. Pin to a content-addressed digest. The update process should be: update the manifest, commit it, deploy — not auto-fetch whatever is newest.

The verifier and the updater must not share the same trust root. If an attacker can modify both the remote resource and your manifest fetch location, hash verification is pointless. The manifest must come from your build system, not from the same CDN you are verifying against.
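
A runnable version of the verification step, under stated assumptions: the manifest has already been loaded from the deployment artifact into a dict, fetch is any callable that returns bytes, cache maps each URL to the content that last passed verification, and alert is whatever paging hook you use. All four names are placeholders for this sketch.

```python
import hashlib
import hmac
from typing import Callable


def verify_resources(
    manifest: dict,
    fetch: Callable[[str], bytes],
    cache: dict,
    alert: Callable[[str], None],
) -> dict:
    """Return {url: verified_bytes}; content that fails its hash is never returned."""
    verified = {}
    for url, entry in manifest["resources"].items():
        content = fetch(url)
        actual = hashlib.sha256(content).hexdigest()
        if hmac.compare_digest(actual, entry["sha256"]):
            cache[url] = content  # remember the last content that passed
            verified[url] = content
            continue
        alert(f"Hash mismatch for {url}: expected {entry['sha256']}, got {actual}")
        if url not in cache:
            # Nothing trustworthy to fall back to, so refuse to start.
            raise RuntimeError(f"no verified copy of {url} available")
        verified[url] = cache[url]
    return verified
```

Note that a cached copy was verified against whichever manifest was current when it was stored. A stricter variant would re-hash the cached bytes against the current manifest before reusing them.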

The Pre-Ship Checklist

Server binds to 127.0.0.1, reverse proxy handles external traffic
nginx configured with proxy_buffering off (or Caddy with flush_interval -1) for the MCP route
SSE transport verified with curl -N against production URL
Auth required on every request, not just initialize
Per-tool rate limits defined and tested with burst traffic
/health endpoint checks real dependencies, not just process liveness
Structured JSON logs with mcp_request_id on every line
Per-tool error rate and latency metrics wired to alerting
Hash manifest for all remote resources, verified at startup
Pre-publish CI gate performs real MCP handshake against production
Scheduled connection test running every 5 minutes in production
Registry entry update blocked unless CI gate passes

Frequently Asked Questions

What is the difference between listing an MCP server in a registry and actually deploying it?

A registry entry is just metadata — a URL, a name, a description. The server process must be running, reachable, correctly routed, and passing health checks before any agent can use it. Servers listed in registries go dead for days because the listing and the deployment lifecycle are completely decoupled. Run a pre-publish connection-test gate that does a real MCP initialize handshake against your production URL before any registry entry is accepted or updated.

Why does Streamable HTTP transport fail silently for most developers?

Because the client sends an HTTP POST to the MCP endpoint with an Accept header listing both application/json and text/event-stream, and the server may answer with an SSE stream. If your server streams without setting Content-Type: text/event-stream, or sends a Content-Length instead of chunked output, the client gets a 200 with a body it cannot parse as SSE. Reverse proxy buffering is the most common culprit: nginx buffers upstream responses by default, and other proxies can do the same, turning a streaming SSE session into a single-shot response that never delivers tool results.

Should MCP servers be zero-auth by default?

Only for stdio servers, where process isolation is the boundary, and with caveats for HTTP servers bound to localhost. Any server exposed over HTTP on a network, even an internal one, should require authentication validated on every request, not just the initialize handshake. Zero-auth HTTP MCP servers are one misconfigured nginx rule away from becoming a public tool executor.

How do you prevent a self-updating MCP worker from being supply-chain compromised?

At startup, before loading any dynamically fetched code, verify its SHA-256 against a hash manifest that is pinned in your deployment artifact, not fetched from the same remote. If the hash does not match, refuse to load the resource, alert immediately, and fall back to the last verified version; refuse to start only if there is none. Pin to content-addressed digests, never mutable tags like "latest." The update pipeline and the verifier must not share the same trust root.

What is the minimum viable observability stack for an MCP server?

Three things: a /health endpoint that validates downstream dependencies (not just process liveness), structured JSON logs with a correlation ID that flows from the MCP request ID through every tool call, and metrics for tool-call latency and error rate broken down per tool name. Without per-tool error rates you cannot distinguish "the server is down" from "one tool is broken and poisoning agent reliability."