Fail Small — Morning Edition

Spread 01 · Cloud · Anthropic × AWS

Anthropic puts Claude on AWS without going through Bedrock.

Yesterday Anthropic launched the Claude Platform on AWS — a first-party offering that gives AWS customers native access to the full Claude API with IAM auth, CloudTrail audit logging, and unified billing through their AWS invoice. Unlike Claude on Bedrock, this one is operated by Anthropic, which means day-one access to features that normally lag on Bedrock by months: Managed Agents, code execution, web search, Skills, and the MCP connector.

The interesting architectural beat: enterprises now have a serious option that doesn't force them to choose between AWS-native governance and frontier-feature velocity. If you've been holding off on agentic workloads because your security team requires IAM-scoped access and CloudTrail audit trails, this changes the calculus.

Source · claude.com/blog

Spread 02 · Agentic runtimes · Anthropic

Claude agents now dream, grade themselves, and split into specialists.

At Code with Claude last week Anthropic shipped three features for Managed Agents that change the runtime shape, not just the model. Dreaming (research preview) reviews past sessions to extract patterns and self-improve. Outcomes wires in a separate evaluator that scores output against a rubric and tells the agent what to fix — a tight closed loop instead of one-shot generation. Multiagent Orchestration lets a lead agent decompose work, spawn specialists with their own models and tools, run them in parallel on a shared filesystem, and merge their context back.

Harvey reports a 6× jump in task completion using the new features. The product story is that Claude is no longer just a model but a workflow runtime, and the abstractions are starting to look more like Airflow than like a chat API.
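The generate, score, revise loop described for Outcomes can be pictured with a toy sketch. Nothing here is Anthropic's API: `Verdict`, `run_with_evaluator`, the keyword rubric, and both stub functions are hypothetical names invented for illustration, and the "agent" is a string builder rather than a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    score: float         # fraction of rubric items satisfied, 0.0 to 1.0
    feedback: List[str]  # rubric items the evaluator wants fixed


def run_with_evaluator(
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Verdict],
    task: str,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, List[Verdict]]:
    """Generate, score against a rubric, feed the critique back, repeat."""
    feedback: List[str] = []
    history: List[Verdict] = []
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        verdict = evaluate(draft)
        history.append(verdict)
        if verdict.score >= threshold:
            break  # good enough, stop revising
        feedback = verdict.feedback
    return draft, history


# Hypothetical rubric and stubs standing in for a real agent and evaluator.
RUBRIC = ["summary", "risks", "next steps"]


def toy_evaluate(draft: str) -> Verdict:
    missing = [item for item in RUBRIC if item not in draft]
    return Verdict(score=1 - len(missing) / len(RUBRIC), feedback=missing)


def toy_generate(task: str, feedback: List[str]) -> str:
    # The stub "fixes" whatever the evaluator flagged in the previous round.
    return task + ": " + ", ".join(["summary"] + feedback)


draft, history = run_with_evaluator(toy_generate, toy_evaluate, "report")
```

The point of the shape is that the evaluator is a separate component with its own rubric, so the generator never grades its own work.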

Source · anthropic.com/engineering

Spread 03 · Coding agents · OpenAI

Codex moves into Chrome where your signed-in session actually lives.

On May 7 OpenAI shipped a Codex Chrome extension on macOS and Windows. The framing is direct: after launching Computer Use in the Codex desktop app, OpenAI saw most workflows actually happening in the browser — LinkedIn, Salesforce, Gmail, internal admin tools — and they wanted Codex to share the tabs you're already authenticated into.

It runs across multiple tabs in parallel, uses DevTools, and can read context across tabs without taking the browser away from you. EU/UK availability is "coming." Codex is at 4M weekly active users — an 8× year-to-date jump that re-frames the IDE landscape entirely.

Source · developers.openai.com/codex

Spread 04 · Adoption · Coding agents

8×

Codex weekly actives · YTD growth

From a side bet to four million weekly users in five months.

OpenAI disclosed last week that Codex now serves 4 million weekly actives, up 8× since January. For comparison, GitHub Copilot took roughly three years to cross 4M monthly. The slope matters more than the absolute number: it suggests the constraint on coding-agent adoption is no longer model quality but distribution surface, and the Chrome extension is a deliberate move on surface, not on capability.

Source · openai.com/codex

Dual-use · Watch this line

Spread 05 · AI · offensive security

Google says a criminal crew used an LLM to find a real, exploitable software flaw.

Google's Threat Intelligence Group reported yesterday that a financially motivated actor used an AI assistant to discover a previously unknown vulnerability in widely deployed software, then build the exploit. It's not the first time AI has been used in offensive research — Project Zero has been publishing its own LLM-assisted findings for months — but it's an early confirmed case where attackers, not defenders, banked the advantage first.

The strategic concern that has been theoretical for a year is now operational: when offense and defense both use the same tools, the lead is measured in days of patch latency, not quarters. Plan your disclosure pipeline accordingly.

Source · NYT, May 11

Spread 06 · Agentic commerce · Cloudflare × Stripe

Stripe handles the payment; Cloudflare handles the discovery; the agent does the buying.

If the agent can find a service, negotiate access, and pay for it without a human in the loop, the API isn't an endpoint anymore — it's a marketplace participant.

Cloudflare and Stripe published a joint writeup on what they're calling agentic commerce: Stripe's CLI owns the transactional layer (auth, identity, subscriptions); Cloudflare's CLI owns service discovery (domain purchases, infra provisioning, agent-callable endpoints). The composition is the point — an agent finds a service, evaluates it, and pays for it as one continuous flow.

Source · blog.cloudflare.com · stripe.com/blog

Spread 07 · Neuroscience · UCLA

A small molecule that repairs stroke damage long after the stroke.

UCLA's stem cell program announced what it describes as the first compound to actively repair brain tissue damaged by stroke, as distinct from the clot-busters that limit damage in the first hours. In mouse models the molecule, administered weeks after the event, restored motor function by promoting cortical remyelination rather than replacing lost neurons.

Translation timelines are the usual long road. The reason it matters now: the mechanism is not "save the brain you have" but "rebuild what was lost," which is a category of intervention the field has spent a generation circling without landing.

Source · UCLA Stem Cell

Spread 08 · Research · Thinking Machines

Thinking Machines on the unit of work that should drive eval design.

Interaction models · Eval design · Agent loops

Mira Murati's Thinking Machines published a post arguing that the right primitive for thinking about evals is the interaction model — the shape of one full human-or-tool-or-agent exchange, including its retries, its self-corrections, and its terminal state — rather than the single token or the single response. Evals built on tokens measure fluency; evals built on interactions measure whether the loop actually finishes.

The piece is short and useful if you've been struggling to make agent benchmarks correlate with production behaviour. It also slots cleanly into Anthropic's Outcomes work (see Spread 02): the same insight from a different vendor in the same week.
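To make the distinction concrete, here is a small sketch that scores the same traces two ways. It is an illustration under assumptions, not the post's method: the `Step` and `Interaction` types, the terminal-state labels, and both scoring functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    kind: str  # "response", "retry", or "self_correction"
    ok: bool   # did this step look acceptable when judged in isolation?


@dataclass
class Interaction:
    steps: List[Step]
    terminal_state: str  # "completed", "abandoned", or "timed_out"


def response_level_score(interactions: List[Interaction]) -> float:
    """Fraction of individual steps that looked fine on their own."""
    steps = [s for i in interactions for s in i.steps]
    return sum(s.ok for s in steps) / len(steps)


def interaction_level_score(interactions: List[Interaction]) -> float:
    """Fraction of whole exchanges that reached a completed terminal state."""
    done = sum(i.terminal_state == "completed" for i in interactions)
    return done / len(interactions)


traces = [
    # Every step reads well, but the loop never finishes.
    Interaction(
        [Step("response", True), Step("response", True), Step("response", True)],
        "abandoned",
    ),
    # A bad first attempt, then a retry that lands.
    Interaction([Step("response", False), Step("retry", True)], "completed"),
]

per_response = response_level_score(traces)
per_interaction = interaction_level_score(traces)
```

On these two toy traces the per-step score is 0.8 while only half the interactions finish, which is the gap between fluent output and a loop that actually terminates.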

Source · thinkingmachines.ai/blog

Spread 09 · IDE · Microsoft · Visual Studio 2026

Visual Studio ships with agents that run inside the IDE, not next to it.

Microsoft has begun calling Visual Studio 2026 the "first intelligent IDE." Marketing aside, the substantive shift is that the C# and C++ agents are first-party components of the editor process, not Copilot-style sidecars. They live inside the language service, hold persistent context across solutions, and can drive long-running refactors as background tasks. The coordinated release of VS 2026, .NET 10, and the latest VS Code drop is the first time Microsoft has lined up a tooling trifecta this tightly.

For engineers in the .NET stack the honest read is: if you skipped Copilot because the loop felt superficial, VS 2026 with its on-by-default agent mode is a different proposition and worth evaluating. Enterprise, Professional and Community editions ship together: no staggered rollouts, no tier-gated agent features.

The risk most worth flagging is attention cost: an editor that can refactor your codebase in the background is also one that demands a new kind of code review discipline. The category boundary between "developer tool" and "co-developer" is gone.

Source · Visual Studio Magazine

Spread 10 · Commerce · Alibaba × Qwen

Four billion SKUs, one agentic shopper: Qwen lands inside Taobao.

Alibaba integrated Qwen across Taobao this week, turning the super-app into an agentic shopping surface that spans 4B+ products. The flagship features are conversational discovery ("find me running shoes for a humid trail under ¥600"), a virtual try-on layer, and 30-day price tracking that the agent uses to time purchases. It's the consumer-facing counterpart to Cloudflare and Stripe's developer-facing agentic commerce (Spread 06): same thesis, different audience.

For Western e-commerce platforms the comparison is unkind: Taobao now has an in-app agent shopping for 1B+ users while Amazon's Rufus is still effectively a Q&A overlay.

Source · Alibaba Newsroom

Hacker News.

Top five · last 24 hrs

01

Learning Software Architecture

82 points · 3 comments · matklad.github.io

matklad — author of rust-analyzer — argues that real software architecture is learned in the wild, not in coursework, and that incentive structures (publish-or-perish, quarterly OKRs) determine code quality more than developer skill. The piece reframes Conway's Law not as observation but as design constraint: when you can't change the org, design modular features with isolated failures and accept that "there's never a time to do a thing properly."

→ HN discussion

02

Postmortem: TanStack NPM supply-chain compromise

875 points · 356 comments · tanstack.com

Between 19:20 and 19:26 UTC on May 11, attackers published 84 malicious versions across 42 @tanstack/* packages by chaining three CI flaws: a misused pull_request_target, a poisonable GitHub Actions cache shared between PR and prod, and OIDC token extraction from runner process memory. The malware ran on npm install and harvested AWS, GCP, Vault, GitHub, and npm credentials. External researchers caught it before TanStack's own alerting did. Read this one even if you don't ship public npm packages.
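If you want a quick first pass over your own repositories, a heuristic along these lines can flag workflows that combine `pull_request_target` with a checkout of the PR head or with the shared Actions cache. This is a rough sketch, not a reconstruction of TanStack's configuration: the function name and the sample workflow are hypothetical, the matching is plain text rather than real YAML parsing, and a clean result proves nothing.

```python
import re
from typing import List


def risky_findings(workflow_text: str) -> List[str]:
    """Flag common pull_request_target hazards with plain-text heuristics."""
    findings: List[str] = []
    uses_prt = re.search(r"\bpull_request_target\b", workflow_text) is not None
    checks_out_pr_head = (
        re.search(r"github\.event\.pull_request\.head\.(sha|ref)", workflow_text)
        is not None
    )
    uses_cache = re.search(r"uses:\s*actions/cache@", workflow_text) is not None

    if uses_prt and checks_out_pr_head:
        findings.append("pull_request_target workflow checks out untrusted PR head")
    if uses_prt and uses_cache:
        findings.append("pull_request_target workflow uses the shared Actions cache")
    return findings


# Hypothetical workflow used only to exercise the checks.
SAMPLE = """
on: pull_request_target
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: deps
"""

sample_findings = risky_findings(SAMPLE)
```

Treat any hit as a prompt to read the workflow by hand, since the danger comes from running untrusted code with the base repository's privileges.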

→ HN discussion

03

Screenshots of Old Desktop OSes

223 points · 70 comments · typewritten.org

A meticulously curated visual archive of desktop operating systems from the 1980s through the early 2000s — NeXTSTEP, BeOS, OS/2, the entire arc of pre-Aqua Mac UI, RISC OS, AmigaOS. The comments turn into a UI nostalgia thread quickly, but the artefacts themselves are worth scrolling through if only to notice how much screen real estate used to be reserved for system chrome and how little for content.

→ HN discussion

04

Toxicity on Social Media — The Noisy Room

48 points · 19 comments · thenoisyroom.com

An essay framing social-media toxicity as a structural property of large rooms rather than a moral failing of individual posters: the same comment that lands fine in a 50-person Discord reads as an attack at 50,000. The author's prescription is amplification design, not content moderation — slow the speed at which strangers can see each other, and the volume problem partially resolves itself.

→ HN discussion

05

They Live (1988) inspired Adblocker

273 points · 91 comments · github.com

An adblocker that doesn't hide ads — it replaces them with "OBEY" / "CONSUME" / "SLEEP" overlays cribbed from the Carpenter film. It's a joke project that's also a small piece of cultural commentary: making the ad disappear lets you forget it was ever there; making it shout the subliminal makes the surveillance economy feel exactly as loud as it actually is.

→ HN discussion

Architecture in the Wild

Cloudflare shipped "Fail Small" — a two-quarter rewrite of how the edge handles its own mistakes.

blog.cloudflare.com / Jeremy Hartman / May 1, 2026

After two global outages in November and December 2025 (one caused by a Bot Management ML classifier that shipped without a graceful fallback, the other by a single invalid global configuration flag), Cloudflare put the company into what it called "Code Orange" and spent two quarters re-engineering the blast radius of its own changes. The May 1 wrap-up by Jeremy Hartman is the most useful internal-discipline writeup the company has published in years, and it reads like a manifesto for every team running a large fleet.

The centrepiece is Snapstone, a system that promotes configuration changes the same way software releases get promoted: progressive cohorts, real-time health checks, automated rollback before customer impact. Workers runtime now ships to free-tier customers first; free-tier traffic is the canary, not a coupon tier. Critical services were redesigned with explicit failure modes: "fail open," "fail close," and the genuinely useful "fail stale" (keep the last known good config when the new one is unreadable, rather than crashing).

The Engineering Codex codifies rules pulled directly from postmortems, including the now-famous "do not use .unwrap() outside of tests and build.rs", and an AI reviewer enforces them on every merge. Two hundred engineers ran company-wide incident drills. Eighteen critical services got backup authorization pathways for when normal control planes are themselves the failure.

The architectural lesson is not new (limit blast radius, fail gracefully), but the implementation discipline is unusually concrete. This is the post to send to anyone on your team who thinks "resilience" is a virtue rather than a budget line.
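"Fail stale" is easy to say and easy to get wrong, so a minimal sketch may help. This is not Cloudflare's code: `FailStaleConfig`, the JSON format, and the `threshold` validator are all assumptions made for illustration, and treating an invalid value the same as unreadable input is a choice of this sketch.

```python
import json
from typing import Callable, Optional


class FailStaleConfig:
    """Keep serving the last known-good config when a new one is unusable."""

    def __init__(self, validate: Callable[[dict], None]) -> None:
        self._validate = validate
        self._last_good: Optional[dict] = None

    def load(self, raw: str) -> dict:
        try:
            candidate = json.loads(raw)  # JSONDecodeError is a ValueError
            if not isinstance(candidate, dict):
                raise ValueError("config must be a JSON object")
            self._validate(candidate)
        except (ValueError, KeyError, TypeError) as exc:
            if self._last_good is None:
                # Nothing stale to fall back to, so surface the failure.
                raise RuntimeError("no known-good config available") from exc
            return self._last_good  # fail stale instead of crashing
        self._last_good = candidate
        return candidate


def validate_threshold(cfg: dict) -> None:
    # Hypothetical rule: a "threshold" key must exist and sit in [0, 1].
    if not 0 <= cfg["threshold"] <= 1:
        raise ValueError("threshold out of range")


store = FailStaleConfig(validate_threshold)
first = store.load('{"threshold": 0.5}')
after_garbage = store.load("not json")
after_bad_value = store.load('{"threshold": 7}')
```

The one case this cannot paper over is a bad config on first load, which is why the sketch raises there rather than inventing defaults.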

"Snapstone allows teams to dynamically define any unit of configuration that needs health mediation." — Jeremy Hartman, Cloudflare
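The promotion pattern itself (progressive cohorts, a health check after each step, rollback on the first failure) fits in a few lines. This is a generic sketch, not Snapstone's interface: `promote`, the cohort names, and the in-memory `state` dict are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence


def promote(
    cohorts: Sequence[str],
    apply: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Roll a change out cohort by cohort; undo everything on the first failure."""
    applied: List[str] = []
    for cohort in cohorts:
        apply(cohort)
        applied.append(cohort)
        if not healthy(cohort):
            # Unwind in reverse order so the widest cohort reached reverts first.
            for done in reversed(applied):
                rollback(done)
            return False
    return True


# Toy fleet: each cohort maps to the config version it is currently running.
state: Dict[str, str] = {"canary": "v1", "small": "v1", "global": "v1"}


def apply_v2(cohort: str) -> None:
    state[cohort] = "v2"


def rollback_v1(cohort: str) -> None:
    state[cohort] = "v1"


def health_check(cohort: str) -> bool:
    return cohort != "small"  # simulate a regression that shows up in "small"


ok = promote(["canary", "small", "global"], apply_v2, health_check, rollback_v1)
```

In this run the change reaches the first two cohorts, fails its check in the second, and is reverted before the global cohort is ever touched.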

50+

Workers runtime deploys · 7 days

200+

Engineers in company-wide drill

18

Backup authorization pathways

100%

AI-enforced Codex coverage

Read it · blog.cloudflare.com/code-orange-fail-small-complete