Cloudflare Collapsed 2,500 API Endpoints Into 2 MCP Tools. Token Economics Matter.
Cloudflare launched Code Mode for their MCP server on February 20th. The result: 2,500+ API endpoints collapsed into two tools that consume roughly 1,000 tokens of context.
That’s not a minor optimization. That’s a fundamental rethinking of how MCP servers should work.
Most MCP servers expose one tool per API endpoint. Need to list DNS records? That’s a tool. Create a DNS record? Another tool. Update, delete, query—each one takes context window space just to describe what it does.
The AI model has to read every tool description to know what’s available. With 2,500 endpoints, that’s thousands of tokens consumed before the model does anything useful. The context window—your most expensive resource—gets eaten by tool catalogs.
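The arithmetic is easy to sketch. Assuming a rough 80 tokens per tool description (my guess, not a published figure), the catalog cost scales linearly with tool count:

```python
# Back-of-envelope cost of advertising a tool catalog in the context
# window. The 80-tokens-per-description default is my own rough guess,
# not a Cloudflare number; real descriptions vary widely.

def catalog_cost(num_tools: int, tokens_per_tool: int = 80) -> int:
    """Tokens spent just describing tools, before any useful work happens."""
    return num_tools * tokens_per_tool

# One tool per endpoint: the catalog alone dwarfs most prompts.
per_endpoint = catalog_cost(2500)                 # 200,000 tokens
# Two tools at ~500 tokens each lands near the ~1,000-token figure above.
two_tool = catalog_cost(2, tokens_per_tool=500)   # 1,000 tokens
```

Even if the per-description estimate is off by half, the per-endpoint catalog still costs two orders of magnitude more than the two-tool design.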
Cloudflare’s approach is different. Code Mode exposes two tools: one that describes available operations, one that executes them. The model discovers what it can do on demand instead of loading everything upfront.
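Here is a minimal sketch of that describe/execute pattern. It assumes nothing about Cloudflare's actual implementation: the operation names, docs, and handlers below are hypothetical stand-ins.

```python
# Server-side registry: the full catalog lives here, not in the model's
# context. Operation names and handlers are hypothetical stubs.
OPERATIONS = {
    "dns.list":   {"doc": "List DNS records for a zone.",
                   "handler": lambda args: {"zone": args["zone"], "records": []}},
    "dns.create": {"doc": "Create a DNS record in a zone.",
                   "handler": lambda args: {"created": args["name"]}},
    "kv.get":     {"doc": "Read a value from a Workers KV namespace.",
                   "handler": lambda args: {"key": args["key"], "value": None}},
}

def describe(query: str) -> dict:
    """Tool 1: surface docs for matching operations, only when asked."""
    return {name: op["doc"] for name, op in OPERATIONS.items() if query in name}

def execute(name: str, args: dict):
    """Tool 2: run an operation the model discovered via describe()."""
    return OPERATIONS[name]["handler"](args)
```

The model's upfront context holds only these two tool descriptions; the 2,500-entry catalog stays server-side and gets paged in one query at a time.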
If you’re building MCP servers, the lesson isn’t “copy Cloudflare’s architecture.” It’s that tool design is token economics.
Every tool you expose costs context. Every parameter description, every enum value, every example—it all consumes the resource your user is paying for. The question isn’t “can I expose this endpoint?” It’s “is exposing this endpoint worth the context it consumes?”
Most MCP servers I’ve seen (including some of my own) over-expose. They give the model access to everything because it’s technically possible. Cloudflare demonstrated that restraint is a feature.
Expose the minimum surface area that enables the maximum useful work. Lazy-load descriptions. Group related operations. Let the model discover capabilities instead of front-loading them.
Your MCP server’s tool count isn’t a feature list. It’s a cost center. Design accordingly.