Google rewrites the cap table at Anthropic, OpenAI ships open weights for the first time since GPT-2, Cognition courts a $25B mark, and Cloudflare publishes the most candid LLM-in-CI deep dive yet — twelve spreads on the day capital and infrastructure both moved.
Bloomberg confirmed Friday that Alphabet plans to invest up to $40B in Anthropic — $10B upfront, with another $30B unlocked against performance milestones. It’s the largest single check ever written into an AI lab, and it lands the same week Anthropic shipped its Claude Code postmortem and Google rolled Gemini 3.1 Ultra to general availability. The two companies are no longer hedging — they’re paired.
Vercel disclosed Thursday that it has identified a second batch of customer accounts showing signs of compromise — separate from, but in the wake of, the April Context.ai OAuth supply-chain incident. The original entry vector dates back to February: Lumma Stealer malware hit Context.ai, the attackers harvested Google Workspace OAuth tokens, and used them to pivot into Vercel’s internal systems. Two months of dwell time. Environment variables not explicitly flagged as sensitive were exposed, and those included API keys, signing keys, and DB credentials. If you ship serverless on Vercel, today is a credential-rotation day.
Read the bulletin →
For the first time since the original GPT-2 release, OpenAI has dropped open-weight models under a license you can actually use commercially: gpt-oss-120b and gpt-oss-20b, both Apache 2.0. The 20B variant is tuned for consumer hardware; the 120B targets a single H100 with offloading. Strong reasoning, native tool use, and a stated benchmark lead over similarly-sized open models. After two years of Llama and DeepSeek setting the open frontier, the incumbent has rejoined the table.
Read the announcement →
A doubled mark in a single quarter, while category leaders consolidate around agents-in-CI.
Cognition AI — the company behind Devin, the autonomous software-engineer agent that demoed and overpromised in equal measure last year — is now in talks with investors for a round that would land its valuation at $25B, more than doubling its previous mark. The pitch has shifted: less “Devin replaces engineers,” more “Devin runs in your CI alongside humans.”
The valuation matters less than the signal. After Cursor’s $2B raise and Anthropic’s new $40B tranche, the market is putting roughly $70B into the “agentic coding” layer of the stack. Either the productivity gains land, or the vintage of 2026 funds the messiest AI correction yet. Place your bets.
Three weeks after the Cursor 3 ship — parallel agents, Design Mode, Composer 2 — the team rolled an end-of-week update focused on the boring middle: actually fixing bugs that don’t reproduce. The new /debug command tells the agent to generate hypotheses, drop log statements at runtime, gather evidence from the running process, and only then propose a targeted fix. The CLI continues to close the parity gap with the IDE.
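Cursor hasn't published /debug's internals, but the loop the update describes (hypothesize, instrument, observe, only then fix) is a general technique. A minimal sketch under assumed names, none of which are Cursor's actual API:

```python
# Illustrative hypothesis-driven debug loop, loosely modeled on the
# workflow described for Cursor's /debug. All names are hypothetical.

def buggy_mean(values):
    # The bug under investigation: integer division truncates the mean.
    return sum(values) // len(values)

# Each hypothesis pairs a description with a probe that returns True
# when the code behaves correctly (i.e., the hypothesis is rejected).
hypotheses = [
    ("empty input crashes", lambda: buggy_mean([]) is not None),
    ("result is truncated", lambda: buggy_mean([1, 2]) == 1.5),
]

def gather_evidence():
    """Run each probe against the live code and record what happened."""
    evidence = {}
    for name, probe in hypotheses:
        try:
            evidence[name] = "confirmed" if not probe() else "rejected"
        except Exception as exc:  # a crash is evidence too
            evidence[name] = f"confirmed ({type(exc).__name__})"
    return evidence

evidence = gather_evidence()
# Only once evidence is in does the agent propose a targeted fix.
confirmed = [name for name, verdict in evidence.items()
             if verdict.startswith("confirmed")]
print(confirmed)
```

The point of the ordering is that the fix is scoped to confirmed hypotheses rather than to whatever the model guessed first.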
Microsoft drops a seven-package toolkit mapping agent telemetry to EU AI Act, HIPAA, and SOC2.
Meta is preparing roughly 8,000 cuts; Microsoft is offering buyouts to about 7% of its U.S. workforce. Both are simultaneously expanding capex on AI infrastructure and AI-focused hires. The pattern is not subtle: the headcount that funded the cloud era is being recycled into the GPU era. Senior engineers in non-AI orgs should read this as a signal about where org charts are headed, not just a quarterly cost line.
Ahead of the U.S. midterms, Anthropic shipped both the safeguards and — more interestingly — the methodology and dataset behind its political-neutrality scoring. Two recent Claude models scored 95% and 96% on its in-house neutrality benchmark. Whether the test is rigorous enough is a fair debate. What’s harder to argue with is that the lab is showing its work.
“Showing the eval set is the part that lets you actually argue with the score.”
Read the writeup →
Gemini 3.1 Ultra is now generally available at the 2M-token context that Google previewed in March. The pitch is native multimodality without transcription middlemen — text, image, audio, and video stream into the same context window, no preprocessing required. On benchmarks it shares the top of the table with GPT-5.4 Pro at 57 on the Artificial Analysis Intelligence Index. For codebase-wide refactors it’s now genuinely competitive with Claude on context capacity.
Read the launch post →
Rust 1.95.0 shipped this month, and the routine point release reads as a quiet victory lap. The language now powers parts of the Linux kernel, Firefox’s rendering engine, and Discord’s backend in production — three different organizations with three very different risk profiles all relying on the same toolchain. Async closures, stabilized in the prior release, made async Rust dramatically more ergonomic; the certification program for regulated industries lands the boring final mile.
Add it to the “languages you can pick without a memo” list. The decade-long argument is over.
Read the release notes →
Five stories ranked highest on HN this morning, with a brief editor’s note. Anthropic’s $40B headline appeared at #2 but is already this issue’s lead, so we slide one rung down.
Jeff Geerling benchmarks a new generation of bus-powered USB-C 10 GbE NICs that finally don’t double as space heaters. Sub-$80, smaller than a credit card, and cool enough to leave clipped to a laptop while sustaining line rate. The home-lab and on-set-video crowds were waiting for exactly this part to commoditize.
Kevin Lynagh on the specific mode of failure where you’re too good at seeing how things could be — and that competence becomes the enemy of shipping. The structural-diffing bit is especially sharp: it’s when you keep rewriting in your head rather than committing what you have. Worth reading the morning before any planning meeting.
The author noticed open port 22 on their RØDECaster Duo, logged in over the LAN, and found a full embedded Linux box behind a music gadget. It’s a great write-up about modern consumer hardware quietly being root-by-default — and a reminder that “just an audio interface” is increasingly a misnomer.
A demo that swaps a real IBM quantum back-end with a dummy that just returns randomness from /dev/urandom — and shows the noisy quantum hardware is, on many small benchmarks, statistically indistinguishable. It’s a pointed comment on quantum’s current signal-to-noise ratio, not an indictment of the field.
An affectionate, slightly prickly defense of plain text as the format that has outlasted every “rich” replacement. The piece reads less like nostalgia and more like an inventory: what survives on a 30-year horizon, and what doesn’t. Useful framing if you’re picking a format to bet your future tooling on.
Cloudflare didn’t bolt a single LLM onto its CI and call it AI review. Skidmore’s deep-dive walks through a deliberately decomposed system built on OpenCode: up to seven specialized reviewer agents — security, performance, documentation, types, tests, ergonomics, dependencies — each given a tight prompt about what to flag and what to ignore. A coordinator agent dedupes their findings and makes the final approve/reject call. The piece is unusually candid about the production economics: 131,246 reviews in 30 days across 48,095 merge requests, median completion time of 3 minutes 39 seconds, and an honest cost ledger that runs $0.20 for typo fixes and $1.68 for complex refactors. Risk-tiered prompting is treated as a first-class lever, not an afterthought. The system uses prompt caching aggressively — an 85.7% cache hit rate saves “five figures monthly” — and ships circuit breakers and provider-failover chains so that a single LLM-vendor outage doesn’t stall every CI pipeline at Cloudflare. Engineers used the “break glass” override to bypass the agent 0.6% of the time, which is the most useful single number in the entire piece: it’s the empirical false-positive rate, and it’s tiny.
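The post doesn't share code for the circuit breakers or the failover chain, but the pattern it names is standard. A generic sketch, with all names assumed rather than taken from Cloudflare's system:

```python
import time

class CircuitBreaker:
    """Trips open after `threshold` consecutive failures; allows a
    retry (half-open) once `cooldown` seconds have elapsed."""

    def __init__(self, threshold=3, cooldown=60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self):
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def review_with_failover(prompt, providers):
    """Try each (name, call, breaker) in order, skipping providers
    whose breaker is tripped, so one vendor outage doesn't stall CI."""
    for name, call, breaker in providers:
        if not breaker.available():
            continue
        try:
            result = call(prompt)
            breaker.record(ok=True)
            return name, result
        except Exception:
            breaker.record(ok=False)
    raise RuntimeError("all providers unavailable")
```

In use, a timing-out primary provider trips its breaker and subsequent reviews flow to the next provider in the chain until the cooldown lets the primary be retried.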
“Telling an LLM what not to do is where the actual prompt engineering value resides.” — Ryan Skidmore, Cloudflare
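The decomposed shape, specialized reviewers with tight scopes feeding a coordinator that dedupes and decides, can be sketched generically. The reviewer names, issue keys, and thresholds below are illustrative assumptions, not Cloudflare's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    issue: str      # normalized issue key, so overlapping findings collapse
    severity: int   # 0 = nit .. 3 = blocker

def security_reviewer(diff):
    # Each reviewer flags only its own category and ignores everything else.
    return [Finding("auth.py", 12, "hardcoded-secret", 3)] if "SECRET=" in diff else []

def types_reviewer(diff):
    # A second reviewer may surface the same spot at a different severity.
    return [Finding("auth.py", 12, "hardcoded-secret", 1)] if "SECRET=" in diff else []

REVIEWERS = [security_reviewer, types_reviewer]

def coordinate(diff, block_at=2):
    """Fan out to every reviewer, dedupe overlapping findings (keeping
    the highest severity per location+issue), then make the final
    approve/reject call."""
    merged = {}
    for reviewer in REVIEWERS:
        for f in reviewer(diff):
            key = (f.file, f.line, f.issue)
            if key not in merged or f.severity > merged[key].severity:
                merged[key] = f
    findings = sorted(merged.values(), key=lambda f: -f.severity)
    verdict = "reject" if any(f.severity >= block_at for f in findings) else "approve"
    return verdict, findings
```

Keying the dedupe on (file, line, issue) rather than on raw finding text is what lets seven independently prompted agents produce one coherent review instead of seven overlapping ones.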