Morning Edition · Vol. I · No. 14 · Tuesday, May 5, 2026 · For Aziz

The Memory Persists.

Twelve spreads on AI getting stickier. Anthropic, Goldman, and Blackstone form a $1.5B services firm to embed Claude inside mid-market companies; Memory and Security ship in public beta; Cerebras files for a $26.6B Nasdaq listing on the back of a $20B OpenAI compute pact; and Cloudflare's new inference engine splits prefill from decode to fit a trillion-parameter model on eight GPUs. Architecture anchor: how OpenAI's voice team shrank WebRTC's public footprint to a thin Go transceiver — and refused to ship an SFU.

Issue: No. 14
Spreads: 12
For You stamps: 6
Window: Last 72 hours
01
The Consulting Beat FOR YOU

Anthropic, Goldman, and Blackstone are starting an AI services firm. They want McKinsey's job.

On May 4 Anthropic announced a new enterprise services company with Blackstone, Hellman & Friedman, and Goldman Sachs, backed by ~$1.5B in committed capital. Anthropic, Blackstone, and H&F each contribute around $300M; Goldman ~$150M; Apollo, General Atlantic, Leonard Green, GIC, and Sequoia round out the consortium. The mandate: embed Anthropic engineers and Claude directly into the operating cores of mid-sized companies — explicitly a competitor to the world's largest consulting firms.

Read it as the second domino, not the first. OpenAI is reportedly building a near-identical structure with TPG, Brookfield, Advent, and Bain. A bake-off between two AI services firms — each with a model lab as the prime contractor and PE money as the carry — is now the shape of corporate AI in 2026. If you architect for these companies, your future stakeholder isn't a CIO. It's a partner-track engineer who has equity in a JV.

~/markets/ipo $
02
The Silicon Beat

$ cerebras --price-range 115..125 --shares 28M --target $26.6B

> Cerebras filed an updated S-1 on May 4, selling 28M shares at $115–$125 to raise up to $3.5B at a $26.6B valuation. Q4 revenue: $510M, +76% YoY. Books are reportedly oversubscribed at $10B against the $3.5B offer; bankers expect to price above range.

> The OpenAI tie is the real story: a January 2026 Master Relationship Agreement worth $20B+ over three years for 750MW of Cerebras compute (250MW/yr through 2028), plus a $1B 6% loan from OpenAI to fund Cerebras' data-center buildout, plus warrants for 33.4M Class N shares. Translation: OpenAI is both customer and equity holder in the chipmaker that ships its inference.

> If Nvidia's moat is scale and CUDA, the bet here is that wafer-scale + a single committed hyperscaler is enough to carve out a meaningful inference niche. Mid-May listing.

03
The Agent Beat FOR YOU

Memory, in public beta, lets agents learn across sessions — and share what they learned.

Anthropic shipped Memory to public beta for Claude Managed Agents. Agents can now persist learnings across sessions and pass those learnings to other agents in the same fleet. The notable design choice: memories are stored as files the developer can read, edit, and delete — a deliberate rejection of the "opaque vector store" paradigm that has dominated agent memory for two years.

For anyone running long-horizon agents, this is the missing primitive. Until now you wrote your own scratchpad layer, your own dedupe, your own promotion rules from short-term to long-term. The trade-off is inverted: instead of trusting the model's recall, you trust the filesystem and audit it. That is the first agent-memory story in a while that an enterprise security team can actually approve.
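A minimal sketch of the pattern this unlocks, assuming a hypothetical layout of one JSON file per learning with a hit-count promotion rule; Anthropic's actual schema and API are not what is shown here:

import json, time
from pathlib import Path

SHORT = Path("agent_memory/short_term")   # hypothetical layout, not Anthropic's schema
LONG = Path("agent_memory/long_term")

def remember(key: str, learning: str) -> None:
    # Each memory is a plain file the developer can read, edit, and delete.
    SHORT.mkdir(parents=True, exist_ok=True)
    path = SHORT / f"{key}.json"
    if path.exists():
        record = json.loads(path.read_text())
        record["hits"] += 1               # dedupe: bump the count instead of duplicating
    else:
        record = {"learning": learning, "hits": 1, "created": time.time()}
    path.write_text(json.dumps(record, indent=2))

def promote(min_hits: int = 3) -> None:
    # Promotion rule: learnings that keep recurring graduate to long-term memory.
    LONG.mkdir(parents=True, exist_ok=True)
    for path in SHORT.glob("*.json"):
        if json.loads(path.read_text())["hits"] >= min_hits:
            path.rename(LONG / path.name)

Because every memory is a file, auditing is grep and deletion is rm, which is exactly the property that makes the approval conversation possible.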

PUBLIC BETA
04
The Security Beat FOR YOU

Claude Security ships: scheduled scans, proposed fixes, and Opus 4.7 looking at your repo.

Claude Security entered public beta for Enterprise customers this week — code vulnerability scanning with proposed fixes generated by Opus 4.7, plus scheduled scans, targeted scans, triage tracking, and exports to whatever ticket system you already pay for. It is not a Snyk replacement; it is a model-generated review layer on top of your existing scanners.

The interesting wrinkle: "proposed fixes" means a model is now writing PRs against your security findings. The reviewer-in-the-loop pattern that emerged for refactors is now arriving at the security boundary, where the cost of a bad merge is meaningfully higher. Expect the next twelve months of platform debate to be about exactly which classes of finding get a model-authored fix and which still demand a human author.
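To make the coming debate concrete, here is one hypothetical shape such a policy could take; the finding classes and routing rules are illustrative, not Claude Security's actual taxonomy:

# Hypothetical policy gate: which finding classes may get a model-authored fix.
MODEL_FIX_ALLOWED = {
    "dependency-bump": True,    # mechanical, and CI can verify the result
    "hardcoded-secret": True,   # the fix is rote: rotate the key, move it to a vault
    "sql-injection": False,     # the right fix depends on surrounding data flow
    "auth-logic": False,        # a bad merge here is an incident, not a bug
}

def route_finding(finding_class: str) -> str:
    # Anything unclassified defaults to a human author.
    if MODEL_FIX_ALLOWED.get(finding_class, False):
        return "model-authored PR, human reviewer required"
    return "human-authored PR"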

05
The IDE Beat FOR YOU

VS Code 1.116 lands. The Copilot CLI now answers your phone.

The May VS Code release ships agent debug logs, terminal agent tools, Copilot CLI thinking-effort controls, and built-in GitHub Copilot — the biggest agent-workflow drop since the editor first absorbed chat in 2024.

The headline feature is Remote Control for Copilot CLI. A long-running CLI session keeps churning on your laptop while you check progress, approve actions, and steer the work from github.com or the GitHub mobile app. The primitive matters: agents are no longer tethered to the editor window that started them.

Token economics shifted as well. Repeated context for Anthropic models in Copilot now bills at roughly one-tenth the old rate, a quiet but consequential change for anyone running multi-turn agent loops where the same project tree gets re-read a dozen times per session.
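A back-of-envelope sketch with invented numbers (the actual Copilot rates are not in the release notes): a 200K-token project tree re-read over a twelve-turn session.

# Illustrative arithmetic only; the price and context size are assumptions.
PRICE_PER_MTOK = 3.00        # hypothetical dollars per million input tokens
CONTEXT_TOKENS = 200_000     # project tree re-sent on every turn
TURNS = 12

before = TURNS * CONTEXT_TOKENS / 1e6 * PRICE_PER_MTOK                # $7.20
after = (CONTEXT_TOKENS / 1e6 * PRICE_PER_MTOK                        # first read, full rate
         + (TURNS - 1) * CONTEXT_TOKENS / 1e6 * PRICE_PER_MTOK / 10)  # repeats, ~10x cheaper
print(f"before: ${before:.2f}  after: ${after:.2f}")                  # before: $7.20  after: $1.26

Under those assumptions the session's context bill drops by roughly 5–6×, which is the difference between watching the meter and letting the loop run.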

Worth noting: GitHub announced on April 27 that Copilot moves to usage-based billing on June 1. The next month is the cheap window to learn how your team's actual token shape maps onto the new meter.

06
The Local-AI Beat FOR YOU

Anaconda Desktop, in public beta, finally puts model discovery, local inference, and conda envs in one window.

Anaconda has been the obvious place to put a local LLM workbench for two years and has not built one. That changed with the Anaconda Desktop public beta: a single application that unifies model discovery, local inference, and conda environment management. The pitch is unsexy and exactly right: stop juggling Ollama, llama.cpp, LM Studio, and whatever you used to spin up the Python kernel that calls them. The one tool that already owns your scientific Python environment now also owns the model.

For dev teams whose ML work is half "is the right Python in scope" and half "is the right model loaded," consolidating both into a single installable should remove a meaningful percentage of meeting-time-spent-debugging-environments. If it nails offline use and reproducible env+model bundles, it becomes the path of least resistance for prototyping anything that touches a local LLM.

07
The Inference Beat FOR YOU

Cloudflare's new engine fits Kimi K2.5 — a trillion-parameter model — on eight H100s.

22%
weight reduction with no accuracy loss · "Unweight" compression in production

Cloudflare published the architecture of its production LLM stack: a custom inference engine called Infire that splits each request across two hardware-optimized stages — prefill (compute-bound input processing) and decode (memory-bound generation) — and a compression system, Unweight, hitting 15–22% weight reduction without measurable accuracy loss.

Concrete deployments: Llama 4 Scout running on two H200s; Kimi K2.5 (560GB, 1T+ parameters) running on eight H100s while still leaving adequate KV cache headroom. Multi-GPU is handled with pipeline + tensor parallelism designed specifically for short startup time, because cold-start matters at edge scale.
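The memory arithmetic that makes the Kimi K2.5 claim legible, assuming 80 GB H100s and treating the 560 GB figure as the uncompressed weight footprint (which side of Unweight that figure sits on is our assumption, not InfoQ's):

# Back-of-envelope HBM budget for a 560 GB model on eight H100s.
GPUS, HBM_PER_GPU = 8, 80                       # 80 GB H100s assumed
WEIGHTS_GB = 560

total_hbm = GPUS * HBM_PER_GPU                            # 640 GB across the fleet
headroom_raw = total_hbm - WEIGHTS_GB                     # 80 GB for KV cache and activations
headroom_unweight = total_hbm - WEIGHTS_GB * (1 - 0.22)   # ~203 GB at 22% reduction
print(headroom_raw, round(headroom_unweight))             # 80 203

Uncompressed, the KV budget is a thin 80 GB; Unweight's 15–22% is what turns "it fits" into "it fits with adequate cache headroom."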

Source InfoQ
08
The Quantum Beat

Oxford pulls off "quadsqueezing": a fourth-order quantum effect, on a single trapped ion.

On May 1 the University of Oxford reported the first demonstration of quadsqueezing, a fourth-order quantum interaction generated by carefully engineering progressively more complex forms of squeezing on a single trapped ion. Squeezing — second-order — has been a staple of quantum sensing for decades. Third-order has been demonstrated. Fourth-order had not, until now.

Why it matters in plain language: each higher-order interaction unlocks a richer toolkit for engineering quantum states. The Oxford result is the kind of foundational experimental capability that quietly seeds the next decade of quantum sensing, simulation, and computing — without a press release saying so.

Source ScienceDaily
09
The Biology Beat

UChicago turned a protein found in living cells into a working qubit.

Researchers at UChicago PME demonstrated a biological qubit — a protein, native to living cells, configured to function as the foundation of a quantum bit. The result is novel because most quantum platforms (trapped ions, superconducting circuits, nitrogen-vacancy centers) require exotic substrates and conditions; this one runs in the kind of biochemical environment a cell already provides.

Practical consequence: a quantum sensor that can sit inside a cell and detect minute chemical or magnetic shifts, rather than imaging the cell from outside. If the platform generalises, it changes what we can ask cells about — at a resolution biology has not previously had access to. Quantum computing is not the headline. Quantum biosensing is.

Source UChicago News
10
The Networking-Of-Atoms Beat

A photon's quantum state was teleported between two independent quantum dots — over 270 metres of open air.

Two devices that have never been entangled. One photon's state, intact, on the other side.

Researchers reported teleporting a photon's quantum state between two independent quantum dots across a 270-metre open-air free-space link. Independence matters: until now, most teleportation demos relied on entangled-source pairs from the same lab apparatus. This one shows the trick working between separately fabricated emitters — the precondition for a quantum network.

The thing to actually internalise: 270m of open air. Atmospheric noise is the hard part of free-space quantum links, and getting fidelity over a useful distance with non-paired devices is the gap between "lab demo" and "infrastructure." The next benchmark will be doing it across a city.

Hacker News · last 24 hours
Top five by rank, with editor's notes.
01

Bun is being ported from Zig to Rust.

▲ 546 · 381 comments · May 4

The Bun team merged a "Phase-A porting guide" into the repo on May 4 — not a rewrite announcement, a porting guide, with batch tooling alongside it. After years of being the loudest argument for Zig in production, the project is conceding that Zig's tooling and ecosystem aren't where it needs them to be. The HN thread is the better read than the commit: half the comments are Rust converts, half are "this is exactly what we said three years ago."

02

Google Chrome silently installs a 4 GB AI model on your device without consent.

▲ 293 · 296 comments · May 5

The author documents that recent Chrome stable releases download a ~4 GB on-device model (the "Gemini Nano" / Built-In AI bundle) in the background, without an explicit consent prompt, on devices that meet the disk and RAM thresholds. The post is mostly measurement: where the bytes go, which flag toggles it, and why "experimental" features that sit behind no UI are a meaningful policy line. The HN thread is the predictable browser-as-OS argument re-litigated, but with receipts this time.

03

Async Rust never left the MVP state.

▲ 179 · 84 comments · May 5

Tweede Golf argues async Rust still emits bloated state machines that waste binary size and cycles — particularly painful on embedded — and that the standard "trust LLVM" answer is the wrong layer to fix it. Their proposal: do the work in the compiler — eliminate unreachable panic states, skip the state machine entirely for await-free futures, and inline futures to flatten the generated code. Concrete, unromantic, and the kind of post that nudges a language project from "we shipped it" to "we're maintaining it."

04

Empty Screenings — finds AMC movie screenings with few or no tickets sold.

▲ 166 · 131 comments · May 5

A small site that scrapes AMC's public ticketing pages and surfaces showings with zero or near-zero sold seats, near you. It is mostly funny — book the 11pm screening of a six-week-old release and you have your own private theater for $8 — but it is also a tidy demonstration of how much consumer behaviour data is one well-formed query away from a public API. The HN thread is largely "wait, this works for X chain too?"

05

Hand-drawn QR codes, by Seth Larson.

▲ 122 · 22 comments · May 5

Larson drew a working QR code by hand on grid paper — a version-1 code encoding a URL — and confirmed it scans reliably as long as the paper sits flat or hangs vertically. Minor pixel-fill errors did not break it; the format's error correction did its job. It is a delightful one-evening project and a quiet reminder that a lot of the magic in everyday tech is well-designed redundancy, not precision.
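If you want to see how much slack the format leaves before trying your own graph-paper version, the Python qrcode library can pin the same parameters; the payload below is a placeholder, not Larson's URL:

import qrcode

# Version 1 is the 21x21-module grid; error correction level M tolerates
# roughly 15% module damage, which is why pen wobble didn't break the scan.
qr = qrcode.QRCode(version=1, error_correction=qrcode.constants.ERROR_CORRECT_M)
qr.add_data("example.com")   # placeholder; version 1 at level M fits ~14 bytes
qr.make(fit=False)           # fail loudly rather than silently growing the grid
qr.print_ascii()             # prints the module grid to the terminal

Every dark square in that output is one pen stroke, and level M means roughly one in seven of them can be botched before the scan fails.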

Architecture in the Wild · this week's anchor

How OpenAI delivers low-latency voice AI at scale.

OpenAI's Realtime team published the architecture behind their voice product, and the headline is what they refused to build. Most "voice AI at scale" stacks reach for a Selective Forwarding Unit because that is how the videoconferencing world has solved this problem for a decade. OpenAI argues — with receipts — that the SFU model is wrong for point-to-point, latency-sensitive sessions where the "other peer" is a model, not a human. So they built a thin layer instead: a WebRTC transceiver that terminates the client connection at the edge and immediately translates the media and signalling into simpler internal protocols for the inference, transcription, TTS, tool-use, and orchestration services behind it. The transceiver is the only component that owns WebRTC session state — ICE checks, DTLS handshake, SRTP keys, lifecycle — which means the inference layer never has to behave like a WebRTC peer.

"A narrow Go implementation with careful use of SO_REUSEPORT, thread pinning, and low-allocation parsing was sufficient for our workload." — OpenAI engineering, on the transceiver

The unsexy infrastructure win is the small, fixed UDP port surface. WebRTC at scale traditionally exposes thousands of UDP ports per node, which is hostile to Kubernetes, hostile to load balancers, and hostile to anyone trying to write a sane firewall rule. By encoding routing metadata into a protocol-native field, OpenAI gets deterministic first-packet routing — the load balancer can place a session on the right backend before any handshake begins — with a small public UDP footprint that fits cleanly into k8s networking primitives. The ingress can be placed close to users worldwide without reserving large public port ranges.
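A toy sketch of the two ingredients that paragraph describes, in Python for brevity (OpenAI's transceiver is Go, and its wire format is not public; the 4-byte session prefix here is invented for illustration): several workers share one public UDP port via SO_REUSEPORT, and every datagram carries enough metadata to pick a backend before any handshake.

import socket, struct

# One small public UDP port, shared by multiple workers via SO_REUSEPORT,
# with routing metadata readable from the very first datagram.
BACKENDS = ["inference-a", "inference-b", "inference-c"]

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # Linux-only option
sock.bind(("0.0.0.0", 3478))    # one fixed port: firewall- and k8s-friendly

while True:
    packet, client = sock.recvfrom(2048)
    if len(packet) < 4:
        continue
    session_id = struct.unpack("!I", packet[:4])[0]    # invented 4-byte prefix
    backend = BACKENDS[session_id % len(BACKENDS)]     # deterministic first-packet routing
    # hand the session to `backend` here; only this layer ever owns WebRTC state

The load balancer's decision is a pure function of the first datagram, so no handshake, connection table, or reserved port range has to exist before placement.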

The architectural lesson generalises beyond voice. When the workload looks superficially like a known problem (real-time media → SFU), the standard answer pulls a heavy stack into your system that exists to solve a problem you do not have (multi-party fanout). The cheaper, more honest design is the one that reads its own constraints — point-to-point, latency-bound, model-on-the-other-end — and builds the thinnest layer that satisfies them. For senior engineers staring at agentic infrastructure in 2026, this is the template: write the transceiver, not the SFU.