3 Major Developments:
Anthropic claims Claude Mythos Preview surpasses "all but the most skilled humans at finding and exploiting software vulnerabilities." It is deployed via Project Glasswing and gated to enterprise customers. Ben Thompson's analysis frames the release as both a genuine capability advance and a strategic compute-allocation move: limiting supply preserves pricing power and prevents distillation by Chinese labs.
DeepSeek V4 (1.6T-parameter MoE, MIT license), Meta Muse Spark (the first model from Meta Superintelligence Labs), GPT-5.5 (2x pricing, new prompting paradigm), and Qwen3.6-27B (a 27B dense model that beats its 397B MoE predecessor). Four significant announcements in a compressed window signal that the pace of releases has not slowed.
Anthropic's Automated Alignment Researchers (9 Claude Opus 4.6 instances) achieved 0.97 PGR on weak-to-strong supervision, versus a 0.23 PGR baseline for human researchers (see the PGR sketch below). Total cost was $18k, against roughly $200/hr for human labor. This is the first concrete demonstration of AI accelerating its own alignment research.
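PGR (performance gap recovered) is the standard weak-to-strong generalization metric: the fraction of the gap between a weak supervisor's performance and the strong model's ceiling that weak supervision actually recovers. A minimal sketch of the computation, with illustrative accuracy numbers that are assumptions rather than figures from the Anthropic report:

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, ceiling: float) -> float:
    """PGR = (weak_to_strong - weak) / (ceiling - weak).

    1.0 means weak supervision recovers the full gap to the strong model's
    ceiling; 0.0 means no gain over the weak supervisor alone.
    """
    return (weak_to_strong - weak) / (ceiling - weak)

# Illustrative (assumed) accuracies showing how 0.97 and 0.23 PGR can arise:
weak = 0.60      # weak supervisor's task accuracy
ceiling = 0.90   # strong model trained on ground-truth labels
print(performance_gap_recovered(weak, weak_to_strong=0.89, ceiling=ceiling))  # ~0.97
print(performance_gap_recovered(weak, weak_to_strong=0.67, ceiling=ceiling))  # ~0.23
```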
Who published what, where, and how often
| Thinker | Papers | Posts | Key Topic |
|---|---|---|---|
| Anthropic (Amodei) | 1 | 3 | Cybersecurity alignment, compute strategy |
| OpenAI (Altman) | — | 3 | Pricing strategy, compute build-out |
| Nathan Lambert | — | 2 | Open/closed model dynamics |
| Ben Thompson | — | 2 | AI business economics |
| Simon Willison | — | 6 | Hands-on model testing, ecosystem |
| Chelsea Finn | 3 | — | Test-time compute, RL, robotics |
| Meta (LeCun/Zuckerberg) | — | 1 | Muse Spark — multimodal reasoning |
| DeepSeek | — | 1 | V4 open-weight frontier release |
| Qwen/Alibaba | — | 1 | Efficient dense models (27B > 397B) |
Notable absences this cycle: Karpathy (no blog since Dec 2024), Hinton (no Mythos commentary), Sutskever (SSI opaque), Chollet (no ARC-AGI evaluations), Hassabis/DeepMind (quiet), LeCun (no individual statements), Musk/xAI (no Grok news).
"Claude Mythos surpasses all but the most skilled humans at finding and exploiting software vulnerabilities."
"The people do not yearn for automation."
"It's surprising that top closed models did NOT show a growing capability margin over open models."
"The opportunity cost — not the marginal cost — is the real constraint on serving AI at scale."
"We are 3-6 months behind GPT-5.4 and Gemini 3.1 Pro."
"Treat GPT-5.5 as a new model family, not a drop-in replacement."
9 Claude Opus 4.6 instances achieved 0.97 PGR on weak-to-strong supervision (human baseline: 0.23). Total cost: $18k for 800 hours, roughly $22/hr versus $200+/hr for human researchers; the economics favor dramatic scaling.
Chelsea Finn group. Traces the performance gains of test-time scaling back to earlier denoising steps, achieving the same performance with substantially reduced compute. arXiv:2604.19730
Chelsea Finn group. Optimizes for collective accuracy plus diversity of reasoning strategies, improving pass@k coverage via set RL (see the pass@k sketch after these summaries). arXiv:2604.17654
Jointly adapts where compute is spent and how generation is performed. Warm-up phase identifies easy queries; concentrates compute on hard ones. arXiv:2604.21018
33-46.6pp gap between agreement-based and policy-grounded moderation metrics. Could fundamentally change how AI governance is evaluated. arXiv:2604.20972
Physical Intelligence's next-gen robotic foundation model. Steerable control + emergent capabilities across diverse manipulation tasks. arXiv:2604.15483
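For context on the set-RL item above: pass@k is the probability that at least one of k sampled solutions is correct, and the standard unbiased estimator computes it from n samples of which c pass. A minimal sketch, not tied to the paper's specific implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    Probability that a random size-k subset of the n samples
    contains at least one of the c correct samples.
    """
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 30 of them correct
print(pass_at_k(n=200, c=30, k=1))   # 0.15
print(pass_at_k(n=200, c=30, k=10))  # ~0.81
```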
| Product | Lab | Significance |
|---|---|---|
| Claude Mythos Preview | Anthropic | Frontier model gated to enterprise; cybersecurity focus |
| Claude Opus 4.7 | Anthropic | Updated prompt; PowerPoint/Chrome agents; less verbosity |
| GPT-5.5 | OpenAI | New model family; 2x pricing; Codex-first rollout |
| GPT-5.5 Pro | OpenAI | Ultra-premium: $30/$180 per M tokens |
| DeepSeek V4 Pro/Flash | DeepSeek | MIT license; 1.6T/49B MoE; 1M context; $0.14–$3.48 per M tokens |
| Meta Muse Spark | Meta | First from Meta Superintelligence Labs; natively multimodal |
| Qwen3.6-27B | Alibaba | 27B dense beats 397B MoE on coding; 25 tok/s quantized |
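To make the pricing rows concrete: per-million-token pricing converts to per-request cost as tokens / 1e6 times the rate, summed over input and output. A rough sketch using the GPT-5.5 Pro figures above; treating $30/$180 as an input/output split and the request size are assumptions for illustration only:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """Cost of one request under per-million-token pricing."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# Assumptions: $30/$180 per M tokens is an input/output split, and a single
# agentic session consumes ~50k input and ~10k output tokens (illustrative).
print(request_cost_usd(50_000, 10_000, input_per_m=30, output_per_m=180))  # $3.30
```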
OpenAI: Massive compute infrastructure is the moat. Thompson: Owning demand trumps owning supply. Meta: Consumer distribution + ad monetization is the path. This is the central strategic disagreement in AI right now, yet each lab argues from its own structural position.
Lambert (Apr 15): closed models did not show a growing capability margin over open models. Lambert (Apr 20): but closed models dominate in robustness and agentic quality, which is hard to benchmark yet commercially critical. DeepSeek: 3-6 months behind. Qwen: a 27B dense model beating a 397B MoE shows efficiency gains favor dense models. The gap is stable on benchmarks but diverging on real-world robustness, which amounts to a measurement crisis.
Anthropic: Industrial-scale illicit distillation. Thompson: Also about protecting pricing power. Lambert: Helps but not determinative. The question is whether enforcement accelerates Chinese self-sufficiency or merely slows it down.
| Hyped | Ignored |
|---|---|
| Mythos capabilities (security risk) | Environmental cost of training Mythos-scale models |
| GPT-5.5 pricing (market strategy) | Accessibility gap from doubling API prices |
| Open-closed model debate | Public resistance to AI automation |
| Alignment research automation | Labor implications for AI safety researchers |
| Chinese distillation | Chinese counter-strategies (indigenous innovation) |
| Compute build-out arms race | Compute allocation for non-frontier applications |