№ 12 · Anthropic Engineering
Decoupling the brain from the hands.
Lance Martin, Gabe Cemaj, Michael Cohen · Anthropic Engineering · April 8, 2026
Anthropic's engineering blog laid out, in unusual detail, the architecture behind Managed Agents — the hosted Claude service for long-running, tool-using sessions. The thesis is simple to state and surprisingly load-bearing: treat the model's reasoning loop, its execution sandbox, and the session log as three independent services, connected only by stable interfaces.
The pre-Managed-Agents world is what the authors call "pet containers": a single VM that holds the model harness, the user's tools, the session state, and the credentials, all entangled. It works until it doesn't — and when it doesn't, you can't redeploy the harness without losing context, can't replace the sandbox without re-bootstrapping auth, and can't stream session events to anyone but the harness that wrote them. So Managed Agents tears the three apart. The harness becomes the orchestrator. The sandbox becomes a fungible execution surface that can crash, get replaced, and report tool-error semantics back upstream. The session is an append-only event log that lives outside Claude's context window, retrievable and transformable independently of the run that produced it.
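The post doesn't publish its interfaces, so here is a minimal sketch of the three-way split it describes, with all names invented: the harness sees the sandbox only as something that runs tools, and the session is an append-only event log that any consumer can replay independently of the run that produced it.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class SessionEvent:
    """One entry in the append-only session log (hypothetical shape)."""
    seq: int
    kind: str      # e.g. "tool_call", "tool_result", "model_turn"
    payload: dict

class Sandbox(Protocol):
    """Fungible execution surface: can crash, be replaced, and report
    errors back upstream without taking the session with it."""
    def run_tool(self, name: str, args: dict) -> dict: ...

class SessionLog:
    """Lives outside the model's context window; append and replay are
    the only operations, so readers never depend on the writing harness."""
    def __init__(self) -> None:
        self._events: list[SessionEvent] = []

    def append(self, kind: str, payload: dict) -> SessionEvent:
        ev = SessionEvent(seq=len(self._events), kind=kind, payload=payload)
        self._events.append(ev)
        return ev

    def replay(self) -> list[SessionEvent]:
        # Consumers get a copy; the log itself stays append-only.
        return list(self._events)
```

The point of the split is visible in the types: nothing in `SessionLog` knows about `Sandbox`, and nothing in either knows about the model harness, so any one of the three can be replaced behind its interface.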
"Failures stop being incidents — they become tool errors the harness can reason about."
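A sketch of the quoted idea, under assumed names: the harness catches a dead sandbox at the call boundary and records it as an ordinary structured tool-error event, something the model can read and react to on the next turn, rather than an infrastructure incident.

```python
class SandboxGone(Exception):
    """Raised (hypothetically) when the execution surface dies mid-call."""

def call_tool(sandbox, log: list, name: str, args: dict) -> dict:
    """Run one tool call, converting sandbox loss into a tool-error event."""
    try:
        value = sandbox(name, args)
        event = {"kind": "tool_result", "tool": name, "ok": True, "value": value}
    except SandboxGone as exc:
        # Not an incident: append a structured error the model can see;
        # the harness is free to provision a fresh sandbox before retrying.
        event = {"kind": "tool_result", "tool": name, "ok": False,
                 "error": f"sandbox lost: {exc}"}
    log.append(event)
    return event
```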
The payoff numbers are concrete. Removing upfront container provisioning cut time-to-first-token by roughly 60% at p50 and 90% at p95 — the long tail being the part that hurt most. Credentials never reach the sandbox; auth is bundled with each provisioned resource or fetched from an external vault, and Claude only ever sees tools through a secure proxy. The OS analogy the authors lean on — read() abstracting over wildly different storage hardware — is the right one. They're virtualizing agent components into stable interfaces that outlive any one model generation.
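The credential boundary can be sketched like this, with the vault and tool names invented for illustration: the sandbox addresses tools by name only, and auth is resolved and attached on the proxy side, so secrets never cross into the execution surface.

```python
class Vault:
    """Stand-in for an external secret store."""
    def __init__(self, secrets: dict) -> None:
        self._secrets = secrets

    def token_for(self, tool: str) -> str:
        return self._secrets[tool]

class ToolProxy:
    """Sits between the sandbox and real tool backends. The sandbox
    sends only (tool, args); credentials are injected here."""
    def __init__(self, vault: Vault, upstreams: dict) -> None:
        self._vault = vault
        self._upstreams = upstreams  # tool name -> callable(args, auth)

    def invoke(self, tool: str, args: dict) -> dict:
        # Auth is fetched on this side of the boundary; the sandbox
        # never held, saw, or forwarded a credential.
        auth = self._vault.token_for(tool)
        return self._upstreams[tool](args, auth)
```

The design choice this illustrates: because auth rides with the proxy rather than the sandbox, a sandbox can be destroyed and replaced without re-bootstrapping credentials, which is exactly the entanglement the "pet container" model couldn't escape.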
For anyone designing their own long-running agent infra, this is the architecture worth stealing from. The interesting question it leaves open: where, exactly, does the harness sit on the spectrum between "thin shell around a model" and "miniature Kubernetes for tool calls"? The piece is honest that the answer keeps shifting as models get better — which is itself the argument for why these boundaries need to be stable enough to survive that drift.
Read on Anthropic Engineering →