This is the piece to sit with this week. The authors — platform engineers at Cloudflare —
walk through the actual internal AI-engineering stack they use to develop Cloudflare
itself, and it happens to be built almost entirely on Workers, AI Gateway, Durable
Objects, Sandboxes, and D1. It is simultaneously an engineering memoir and a (very
polished) proof-of-dogfood.
The architectural spine is a single-proxy Worker pattern. Every LLM call inside
Cloudflare — from IDE completions to MR review agents — routes through one AI Gateway
Worker. That gives them per-user attribution without touching client configs, model
catalog enforcement, and a single place to slot new providers. Between the lines is the
usual lesson, learned the hard way: direct connections don't scale.
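To make that concrete, here is a minimal sketch of the pattern. This is not Cloudflare's actual Worker; the env bindings, header names, and model list are my assumptions, and only the overall shape comes from the piece:

```ts
// Hypothetical single-proxy Worker: all client LLM traffic lands here.
interface Env {
  UPSTREAM_URL: string; // assumed binding: the AI Gateway / provider endpoint
  UPSTREAM_KEY: string; // provider credential held server-side, never in clients
}

// Model catalog enforcement: one place decides what is callable at all.
const ALLOWED_MODELS = new Set(["kimi-k2.5", "frontier-large"]); // hypothetical IDs

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = (await request.json()) as { model?: string };
    if (!body.model || !ALLOWED_MODELS.has(body.model)) {
      return new Response("model not in catalog", { status: 403 });
    }

    // Per-user attribution without touching client configs: the proxy stamps
    // identity from the authenticated request, not from the client payload.
    // (Header name assumes Cloudflare Access sits in front of the Worker.)
    const user =
      request.headers.get("cf-access-authenticated-user-email") ?? "unknown";

    return fetch(env.UPSTREAM_URL, {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${env.UPSTREAM_KEY}`,
        "x-attributed-user": user, // hypothetical attribution header
      },
      body: JSON.stringify(body),
    });
  },
};
```

The point is the shape, not the details: clients hold no credentials, and attribution, catalog policy, and provider wiring all live in exactly one deployable.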
The second-most interesting idea is how they compress MCP tool schemas. At Cloudflare
scale (2,055 services, 375 teams, 3,900 repos), naïvely exposing every tool to every
agent blows out the context window. Their Code Mode portal collapses the full set of
per-service schemas into just two portal-level tools, shrinking context overhead from roughly 15,000 tokens to
near-zero per request. If you've ever watched your MCP-heavy agent sessions slow down
linearly with tool-catalog size, this is the pattern to study.
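The shape of the collapse looks roughly like this; the tool names and schemas below are hypothetical, since the piece doesn't publish the real surface:

```ts
// Two portal-level tools stand in for every per-service schema.
// The agent's context carries only these, no matter how many of the
// 2,055 services sit behind the portal.
type ToolSchema = {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema, as in an MCP tool listing
};

const portalTools: ToolSchema[] = [
  {
    name: "discover_api", // hypothetical name
    description:
      "Search the portal's generated TypeScript API for functions relevant to a task.",
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "run_code", // hypothetical name
    description:
      "Execute TypeScript against the discovered API inside a sandbox and return results.",
    inputSchema: {
      type: "object",
      properties: { code: { type: "string" } },
      required: ["code"],
    },
  },
];

// Context cost is now O(2) schemas instead of O(number of services).
console.log(`schemas in context: ${portalTools.length}`);
```

The trade is straightforward: discovery and execution move out of the prompt and into the sandbox, so per-request context cost stays flat as the catalog grows.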
“One thing we got right early: routing through a single proxy Worker from day one.
The proxy pattern gives you a control plane that direct connections don't.”
Two more details deserve flagging for senior engineers. First, Backstage-as-knowledge-graph:
they feed the service catalog, ownership metadata, and dependency graph into agents as
structured context — it turns “who owns this?” from a Slack question into a
tool call. Second, AGENTS.md files auto-generated per repo act as the bootstrap context
every agent reads before touching code; think of it as README.md for machines, versioned
alongside source.
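For the ownership piece, a hedged sketch of what that tool call might look like against a stock Backstage catalog endpoint follows; the host is invented and Cloudflare's internal wiring is surely richer:

```ts
// Hypothetical host; Backstage's catalog REST API path is standard, though.
const BACKSTAGE_URL = "https://backstage.example.internal";

// Resolve a service's owning team from the Backstage catalog.
// An agent exposes this as a tool, so "who owns this?" is answered from the
// same versioned catalog humans use instead of a Slack thread.
async function whoOwns(serviceName: string): Promise<string> {
  const res = await fetch(
    `${BACKSTAGE_URL}/api/catalog/entities/by-name/component/default/${serviceName}`,
  );
  if (!res.ok) throw new Error(`catalog lookup failed: ${res.status}`);
  const entity = (await res.json()) as { spec?: { owner?: string } };
  return entity.spec?.owner ?? "unowned";
}
```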
The cost-engineering note is quietly the most actionable: 51.47 billion input tokens per
month through Workers AI using Kimi K2.5, at roughly 77% lower cost than frontier models for the bulk
of non-critical work. Frontier models still handle the sharp-edged tasks. This is the
“routing is the feature” thesis made real at production scale — and it's
the kind of number that earns you a budget conversation with your CFO.
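A sketch of what that routing split might look like, assuming hypothetical model identifiers and a simplified Workers AI binding; only the cost ratio and the cheap/frontier split come from the piece:

```ts
// Simplified stand-in for the Workers AI binding's run() method.
interface Env {
  AI: { run(model: string, input: { prompt: string }): Promise<unknown> };
}

// Hypothetical identifiers; the real catalog names may differ.
const BULK_MODEL = "kimi-k2.5";
const FRONTIER_MODEL = "frontier-large";

// Route the long tail of non-critical work to the ~77%-cheaper hosted model,
// reserving frontier capacity for tasks tagged as sharp-edged.
async function complete(env: Env, prompt: string, critical: boolean) {
  const model = critical ? FRONTIER_MODEL : BULK_MODEL;
  return env.AI.run(model, { prompt });
}
```

The interesting design choice is that the criticality flag, however it's derived, becomes the single cheapest lever in the whole stack: one branch, tens of billions of tokens a month behind it.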
Read the full piece on blog.cloudflare.com →