June 2, 2026 agent infrastructure x402 lessons

We almost charged agents for doing nothing

We built an agent-callable broker that takes payment in USDC on Base. Before we turned on billing, an audit caught a critical bug: every single paid operation would have charged real money and done absolutely nothing in return. Here is the exact failure, the fix, and the one principle we think every agent payment system needs.

What we built, and why

AgentBroker is a service that lets autonomous AI agents book appointments, capture leads, and send messages to the roughly 60 million small businesses that have no API. The business model is simple: agents pay per call in USDC on Base using the x402 protocol. No signup, no API key, no subscription — the agent attaches a signed payment header and the tool runs.

The transport is MCP over Streamable HTTP, with a Cloudflare Worker edge layer for discovery and a FastAPI origin on Render for state-changing operations. We have 14 tools, compliance gates for 22 jurisdictions, and the thing works. It is also, right now, producing exactly zero dollars in revenue. We have had three paid-tool attempts and zero completed payments. That is the honest picture.

Before enabling live billing, we ran a read-only audit of the entire payment path. What it found was worse than a billing bug. It was a trust bug.

The bug: “tool returned success” is not “real-world state changed”

The SMB directory currently holds twenty demo entries — placeholder businesses we seeded to make the service discoverable and testable. Every entry has is_demo=True. The service documentation, the response body from find_business, and the code comments all promised agents the same thing: bookings on demo entries short-circuit with a demo_smb_no_live_booking receipt. No real call is made, no real booking happens, no charge applies.

None of that short-circuit logic existed in any of the paid handlers.

Here is what would actually have happened when billing went live:

The actual charge path

Agent calls find_business, receives [DEMO] Cuts & Co. with is_demo=true in the response.

Response body explicitly says bookings short-circuit. Agent trusts the documented contract.

Agent calls capture_lead with a valid USDC payment attached.

x402 gate verifies the payment via Coinbase CDP, then calls the tool handler. Settlement waits on the handler's receipt.

capture_lead looks up the SMB, finds it, computes a UUID, returns status=success. No CRM write. No external call. Nothing durable.

The receipt is not a failure status, so the gate settles. Real USDC deducted. Agent paid $0.05 for a UUID computation and a dictionary lookup.

One hundred percent of the SMBs in the directory are demo entries. This bug would have charged every single paid call for nothing, and it would have kept doing so until someone noticed and complained.
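The sequence above can be reproduced in a few lines. This is a minimal sketch, not the production gate: `verify_payment`, `settle_payment`, `capture_lead_buggy`, `gated_call`, and the in-memory `DIRECTORY` and `SETTLED` are hypothetical stand-ins, and the real verify and settle steps go through Coinbase CDP rather than a Python list.

```python
import uuid

# Hypothetical stand-ins for the real SMB directory and the CDP settle call.
DIRECTORY = {"cuts-and-co": {"name": "[DEMO] Cuts & Co.", "is_demo": True}}
SETTLED = []  # every payment appended here counts as "really charged"

_FAILURE_STATUSES = frozenset({"failure", "failed", "error", "errored", "rejected"})


def _receipt_is_error(receipt):
    status = str(receipt.get("status", "")).strip().lower()
    return status in _FAILURE_STATUSES


def verify_payment(payment):
    # Assumption: the real check validates a signed x402 payment header.
    return payment.get("amount_usd", 0) > 0


def settle_payment(payment):
    SETTLED.append(payment)


def capture_lead_buggy(smb_id):
    # Pre-fix behaviour: no is_demo check and no durable write.
    smb = DIRECTORY.get(smb_id)
    if smb is None:
        return {"status": "failure", "reason_code": "smb_not_found"}
    return {"status": "success", "lead_id": str(uuid.uuid4())}


def gated_call(handler, smb_id, payment):
    if not verify_payment(payment):
        return {"status": "rejected", "reason_code": "payment_invalid"}
    receipt = handler(smb_id)
    # The gate only asks "did the handler report an error?"
    if not _receipt_is_error(receipt):
        settle_payment(payment)
    return receipt


receipt = gated_call(capture_lead_buggy, "cuts-and-co", {"amount_usd": 0.05})
print(receipt["status"], len(SETTLED))  # prints: success 1
```

The demo entry produces a success receipt and one settled payment, even though nothing durable happened. Nothing in the gate is wrong in isolation; the handler simply never says "this was a no-op".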

There was a second critical bug alongside it

schedule_appointment has an async path: when an SMB does not have a direct Cal.com API key, the booking gets dispatched to a Celery worker. The tool returns status=pending_async and the worker handles it later — or does not, silently, if the worker is down or the calendar rejects it.

The gate failure-status check looked like this:

_FAILURE_STATUSES = frozenset({
    "failure", "failed", "error", "errored", "rejected"
})

def _receipt_is_error(receipt):
    status = str(receipt.get("status", "")).strip().lower()
    return status in _FAILURE_STATUSES

"pending_async" is not in _FAILURE_STATUSES. So _receipt_is_error returns False. Settlement runs immediately. The booking may never complete. The agent has already paid.

Confirmed live on the running service:

python3 -c "from billing.x402_gate import _receipt_is_error; print(_receipt_is_error({'status': 'pending_async'}))"
Output: False

Why this failure mode is structural, not a one-off mistake

These bugs are not the result of sloppy engineering. The x402 gate is well-structured. The compliance gate is correctly positioned and non-bypassable. The unit tests pass. The MCP tool schemas are accurate. The settlement path follows the x402 spec.

The problem is a conceptual gap that is easy to miss when you are building quickly: a tool returning success is not the same thing as real-world state having changed. The payment gate asked “did the tool return without an error?” It should have asked “did something real actually happen that the caller paid for?”

This is the same failure mode that causes agent retry loops to generate phantom bookings, or causes agents to report task completion when they have received only an acknowledgment and not a confirmation. It comes up again and again in agentic systems because agents, by design, trust the structured output of their tools. When that output says success, the agent has no reason to doubt it.

When you add money to this picture, the cost of that misplaced trust is concrete and irreversible. x402 settlements on Base are final.

The fix, and the principle behind it

The immediate fix was mechanical. We added an is_demo guard at the top of each paid handler. If the SMB is a demo entry, the handler returns a FAILURE receipt with reason_code=demo_smb_no_live_booking before the x402 gate settles. No charge. We added "pending_async" to _FAILURE_STATUSES so the async appointment path no longer charges before the booking is confirmed. We wrote 17 regression tests covering both paths.

The guard in each paid handler looks like this:

if getattr(smb, "is_demo", False):
    return OutcomeReceipt(
        status=OperationStatus.FAILURE,
        reason_code="demo_smb_no_live_booking",
        cost=CostRecord(amount=0.0, currency="USD", basis="no_charge_demo"),
    )
# rest of handler only runs for real SMBs

When this receipt comes back, _receipt_is_error() returns True and the CDP SDK skips settlement entirely. No USDC moves.

The principle we are writing into how we design billing going forward:

Bill only on proven real-world state change.

A tool returning success is a claim. Settlement should happen on the evidence, not the claim. If the evidence requires an async confirmation — a webhook, a Celery result, a calendar event ID — then settlement should be deferred until that evidence arrives. Charging on pending_async is charging for a promise.
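One way to picture "settle on the evidence" is a hold that only converts into a charge when a confirmation arrives. This is a sketch under stated assumptions, not our implementation: `PendingCharge`, `DeferredSettlement`, and the evidence strings are hypothetical, the state lives in memory, and the sketch ignores how long a signed payment remains valid.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingCharge:
    payment_id: str
    amount_usd: float
    evidence: Optional[str] = None  # e.g. a calendar event ID or webhook ID
    state: str = "held"             # held -> settled | released


@dataclass
class DeferredSettlement:
    charges: Dict[str, PendingCharge] = field(default_factory=dict)

    def hold(self, payment_id: str, amount_usd: float) -> None:
        """Record a verified payment without settling it."""
        self.charges[payment_id] = PendingCharge(payment_id, amount_usd)

    def confirm(self, payment_id: str, evidence: str) -> bool:
        """Settle only when the async path supplies concrete evidence."""
        charge = self.charges[payment_id]
        if charge.state != "held" or not evidence:
            return False
        charge.evidence = evidence
        charge.state = "settled"  # the real settle call would go here
        return True

    def release(self, payment_id: str) -> bool:
        """Drop the hold when the booking fails; nothing is charged."""
        charge = self.charges[payment_id]
        if charge.state != "held":
            return False
        charge.state = "released"
        return True


ledger = DeferredSettlement()
ledger.hold("pay-1", 0.05)
ledger.hold("pay-2", 0.05)
confirmed = ledger.confirm("pay-1", evidence="cal-event-123")
unproven = ledger.confirm("pay-2", evidence="")  # no evidence, no charge
ledger.release("pay-2")
```

Here "pay-1" settles because a calendar event ID came back, while "pay-2" is released without a charge. A settled charge cannot be released afterwards, which mirrors the fact that settlement is final.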

This sounds obvious when stated directly. It is not obvious when you are wiring up the billing integration late at night trying to get the payment gate working.

What the x402 / agent-payments stack actually looks like from inside it

The x402 Foundation launched in September 2025 with backing from Coinbase, Anthropic, Cloudflare, Google, Visa, AWS, and Circle. By February 2026 the ecosystem had processed 161 million transactions and $43 million in settled volume. AWS Bedrock AgentCore launched native x402 wallet support for agents in May. Stripe and Visa have added x402 support. The protocol works and the adoption is real.

The hard part is not the protocol. The hard part is building the service layer correctly. x402 makes the payment handshake trivially easy to implement: ten lines of middleware, verified by Coinbase CDP, settled on Base. That ease creates a subtle risk. It is tempting to treat the payment gate as pure middleware and never ask what, exactly, it is charging for.

The audit also caught a third structural issue: we had two separate x402 implementations running in parallel, a hand-rolled verifier in the Cloudflare Worker edge and the standard CDP-backed gate on the Python origin. If both were active simultaneously, an agent could be charged twice for a single call. We had not caught this because each gate worked correctly in isolation. The failure only existed at the system boundary.
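One generic way to make this class of failure harmless, whichever gate stays enabled, is to settle against a shared ledger keyed by payment identifier. The sketch below is an illustration with hypothetical names (`SettlementLedger`, `settle_once`, the gate labels), and an in-memory dict stands in for whatever shared store the edge and origin would both need to reach.

```python
import threading


class SettlementLedger:
    """Hypothetical shared record so one payment is settled at most once."""

    def __init__(self):
        self._settled = {}
        self._lock = threading.Lock()

    def settle_once(self, payment_id, gate_name):
        """Return True only for the first gate that claims this payment."""
        with self._lock:
            if payment_id in self._settled:
                return False
            self._settled[payment_id] = gate_name
            return True

    def settled_by(self, payment_id):
        """Name of the gate that settled the payment, or None."""
        return self._settled.get(payment_id)


ledger = SettlementLedger()
edge = ledger.settle_once("pay-42", "edge_worker")
origin = ledger.settle_once("pay-42", "python_origin")
print(edge, origin, ledger.settled_by("pay-42"))  # prints: True False edge_worker
```

The second gate sees that the payment has already been claimed and skips its own settle call, so a single call cannot be charged twice even when both gates are live.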

None of these are exotic bugs. They are the predictable result of building an agent-native payment system fast without auditing the full call path end-to-end before enabling settlement.

Where we are now

The fix branch is code-complete, 85 tests passing, sitting in review. x402 billing is disabled on production until we merge and deploy. When it goes live, we will be billing only on real-world state changes: a confirmed appointment, a delivered message, an actual CRM write.

Revenue to date: $0. The infrastructure works. The payment protocol works. We caught the bug before it cost anyone anything. The next step is the first real paid call — which requires seeding real public booking URLs into the directory, since 100% of current entries are demo. That is this week.

If you are building agent tools with any kind of billing — USDC micropayments, token-gated APIs, usage metering — the demo-no-op check and the pending-async settlement gap are both worth auditing before you flip the billing switch. The failure mode is quiet and the trust damage is not.

We built a free MCP config auditor that checks your claude_desktop_config.json for schema bloat, missing tokens, and common reliability issues. It is at hatchloop.dev/tools/mcp-audit if you want to run it on your own config.