1. DeepSeek V4 drops as the largest open-weight model ever – a 1.6T-parameter MoE (49B active), MIT-licensed, priced at 10-27% of comparable frontier models, with a 1M-token context whose KV cache is 10% the size of V3.2's. Self-assessed as 3-6 months behind GPT-5.4/Gemini 3.1 Pro, but the pricing disruption is seismic.
2. Anthropic enters its "Mythos" moment – Claude Mythos (claimed to be its most powerful model, withheld from public release for safety), Automated Alignment Researchers (Claude autonomously doing alignment research, achieving 0.97 PGR vs a 0.23 human baseline), Project Deal (Claude negotiating in a marketplace), and a new monthly Economic Index Survey of 81K Claude users. Anthropic is dominating the narrative across capability, safety, and economic research simultaneously.
3. The open vs. closed gap is fracturing into specialized domains – Nathan Lambert's analysis reveals the gap is no longer a single number. Closed labs dominate agentic coding via online RL from user feedback; open models excel at repetitive automation. Chinese labs lead in open-weight size (DeepSeek V4, Qwen3.6-27B) but face funding constraints. The "distillation panic" may be overblown – RL environments are the new moat.
4. GPT-5.5 arrives with a new prompting paradigm – OpenAI recommends writing prompts from scratch rather than migrating them from earlier GPT versions. GPT-5.4 in Codex is described by Lambert as the first OpenAI agent "that feels like it can do a lot of random things." The agent wars are now a four-way race between Anthropic (Claude Code), OpenAI (Codex), Cursor (potential $60B SpaceX deal), and Google (Gemini 3).
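The KV-cache claim in point 1 is easiest to appreciate with the standard cache-size formula. The sketch below uses purely hypothetical layer/head/dim numbers (this summary doesn't give V4's real architecture); only the 1M-token context and the ~10x reduction come from the item above.

```python
# Back-of-envelope KV-cache sizing. The layer/head/dim numbers below are
# purely hypothetical -- they are NOT DeepSeek V4's actual architecture.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Cache size for one sequence; the leading 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config: 64 layers, 8 KV heads of dim 128, fp16, 1M-token context.
baseline = kv_cache_bytes(64, 8, 128, 1_000_000)
print(f"baseline cache: {baseline / 2**30:.0f} GiB per sequence")

# A 10x cache reduction (e.g. via compressed/latent attention) would leave:
print(f"reduced cache:  {baseline / 10 / 2**30:.0f} GiB per sequence")
```

At million-token contexts the cache, not the weights, tends to dominate per-request serving memory, which is why a 10x reduction is a pricing story as much as a capability story.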
| Post | Date | Significance |
| ------ | ------ | ------------- |
| DeepSeek V4 – almost on the frontier, a fraction of the price | Apr 24 | Major: 1.6T Pro / 284B Flash MoE, MIT license. Flash: $0.14/M input (cheaper than GPT-5.4 Nano). Pricing table shows 10-20x cost advantage vs Opus/Sonnet. |
| Reading today's open-closed performance gap | Apr 20 | Deep analysis: Benchmark scores don't capture real-world agentic performance. Gemini 3 has incredible benchmarks but "remarkable irrelevance" in agent deployment. RL environments are the new moat – data is becoming as capital-intensive as chip fabs. |
| My bets on open models, mid-2026 | Apr 15 | Key bets: Closed models didn't widen the gap in H2 2025. Chinese labs focus more on benchmark scores than robustness. Online RL from user feedback is the first clear technical advantage for closed labs. Open models will dominate repetitive automation. Chinese labs face funding difficulties by end of 2026. |
| Automated Alignment Researchers | Apr 14 | Major: 9 copies of Claude Opus 4.6 autonomously performed alignment research, achieving 0.97 PGR (vs a 0.23 human baseline) at $18K total compute cost. Demonstrates AI contributing to its own alignment – a milestone for scalable oversight. |
- Last post: Dec 16, 2024 (about an LLM writing experiment)
- Blog has no RSS feed; requires manual checking
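Since the blog exposes no RSS feed, a watcher has to poll the page and diff it. A minimal, dependency-free sketch of the change-detection half (fetching and storage are left to the caller):

```python
# Minimal change detection for a blog with no RSS feed: fingerprint the
# page and compare with the previous poll's fingerprint. Fetching is
# intentionally left out -- pass in the HTML however you retrieved it.
import hashlib
import re

def page_fingerprint(html: str) -> str:
    # Collapse whitespace so cosmetic reflows don't look like new posts.
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_new_content(previous_fingerprint: str, html: str) -> bool:
    return page_fingerprint(html) != previous_fingerprint
```

A real watcher would also want to scope the hash to the post-list region of the page so sidebar or footer churn doesn't trigger false positives.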
| Post | Date | Significance |
|---|---|---|
| [GPT-5.5 prompting guide](https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/) | Apr 25 | OpenAI releases GPT-5.5 prompting guidance; recommends starting from scratch rather than migrating old prompts |
| [The people do not yearn for automation](https://simonwillison.net/2026/Apr/24/the-people-do-not-yearn-for-automation/) | Apr 24 | Links Nilay Patel's essay on "software brain" – the gap between AI builders and the general public. Highlights growing AI-backlash sentiment despite ChatGPT usage growth. |
| [Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model](https://simonwillison.net/2026/Apr/22/qwen36-27b/) | Apr 22 | 27B dense model beats previous 397B MoE across all coding benchmarks. 55.6GB vs 807GB. Runs locally on consumer hardware at 25 tok/s. |
| [Is Claude Code going to cost $100/month?](https://simonwillison.net/2026/Apr/22/claude-code-confusion/) | Apr 22 | Pricing confusion around Claude Code vs. Codex plans |
| [Changes between Claude Opus 4.6 and 4.7 system prompts](https://simonwillison.net/2026/Apr/18/opus-system-prompt/) | Apr 18 | Tracked system prompt changes – valuable for prompt engineers |
| [A pelican for GPT-5.5 via semi-official Codex backdoor](https://simonwillison.net/2026/Apr/23/gpt-5-5/) | Apr 23 | Accessing GPT-5.5 through Codex API before official release |
| [An update on recent Claude Code quality reports](https://simonwillison.net/2026/Apr/24/recent-claude-code-quality-reports/) | Apr 24 | Responding to concerns about Claude Code output quality |
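The 55.6GB vs 807GB figures in the Qwen3.6-27B entry line up with simple size arithmetic, assuming roughly 2 bytes per parameter (bf16/fp16 weights); the function below is just that arithmetic, not a real sizing tool.

```python
# Rough checkpoint sizing: parameters x bytes per parameter. Assumes
# ~2 bytes/param (bf16/fp16 weights); real files add embeddings and
# metadata, which is roughly the slack vs the reported numbers.

def checkpoint_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    return params_billions * bytes_per_param  # billions x bytes = decimal GB

print(checkpoint_gb(27))   # 54.0 -- in line with the reported 55.6GB
print(checkpoint_gb(397))  # 794.0 -- in line with the reported 807GB
```

The same arithmetic explains the "runs locally" claim: 54GB of weights fits a high-end workstation, while ~800GB requires a multi-GPU server.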
| Post | Date | Significance |
| ------ | ------ | ------------- |
| [Claude Mythos and misguided open-weight fearmongering](https://www.interconnects.ai/p/claude-mythos-and-misguided-open) | Apr 9 | Defends open-weight ecosystem against post-Mythos backlash. Argues the 6-18 month delay between closed and open capabilities is a "blessing" for safety. Estimates Mythos as ~2x larger than Opus 4.6 (6-10T parameters), 5x pricing driven by both size and inference-time compute. |
| [GPT 5.4 is a big step for Codex](https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex) | Mar 18 | GPT 5.4 is the first OpenAI agent without "death by a thousand cuts." Claude still has more "charm"; GPT 5.4 is mechanical and precise. Different philosophies: Claude models intent, GPT follows instructions literally. Predicts agent UX will evolve to look like Slack (multi-agent coordination). |
| [The inevitable need for an open model consortium](https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model) | Apr 11 | Proposes formal consortium structure for open model development |
| [Lossy self-improvement](https://www.interconnects.ai/p/lossy-self-improvement) | Mar 22 | Analyzes limits of self-improvement loops in LLMs |
| [Gemma 4 and what makes an open model succeed](https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model) | Apr 3 | Google's Gemma 4 as case study for open model strategy |
| Post | Date | Significance |
| ------ | ------ | ------------- |
| [He Came, He Saw, He Cooked](https://stratechery.com/2026/he-came-he-saw-he-cooked/) | Apr 24 | Weekly roundup: Tim Cook stepping down (end of an era), SpaceX-Cursor deal ($60B option), US-China decoupling |
| [An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment](https://stratechery.com/2026/an-interview-with-google-cloud-ceo-thomas-kurian-about-the-agentic-moment/) | Apr 23 | Google's enterprise agent strategy |
| [OpenAI's Memos, Frontier, Amazon and Anthropic](https://stratechery.com/2026/openais-memos-frontier-amazon-and-anthropic/) | Apr 14 | OpenAI internal memo about taking on Anthropic in enterprise; Amazon-Anthropic relationship |
| [Anthropic's New TPU Deal, Computing Crunch, The Anthropic-Google Alliance](https://stratechery.com/2026/anthropics-new-tpu-deal-anthropics-computing-crunch-the-anthropic-google-alliance/) | Apr 14 | Anthropic's compute strategy with Google TPUs |
| [Anthropic's New Model, The Mythos Wolf, Glasswing and Alignment](https://stratechery.com/2026/anthropics-new-model-the-mythos-wolf-glasswing-and-alignment/) | Apr 10 | Mythos is real – the "wolf" finally came. But if Anthropic is right, that's deeply concerning. |
| [TSMC Earnings, New N3 Fabs, The Nvidia Ramp](https://stratechery.com/2026/tsmc-earnings-new-n3-fabs-the-nvidia-ramp/) | Apr 20 | TSMC leadership not fully bought into AI growth story |
| [OpenAI Buys TBPN, Tech and the Token Tsunami](https://stratechery.com/2026/openai-buys-tbpn-tech-and-the-token-tsunami/) | Apr 10 | OpenAI's acquisition of a podcast network – "makes no sense" |
| Publication | Date | Significance |
| ------------- | ------ | ------------- |
| [Project Deal](https://www.anthropic.com/features/project-deal) | Apr 24 | Claude autonomously buying, selling, and negotiating in an employee marketplace. Real-world multi-agent economic simulation. |
| [Anthropic Economic Index Survey](https://www.anthropic.com/research/economic-index-survey-announcement) | Apr 22 | New monthly survey of Claude users on AI's economic impact. Complements the 81K user study. |
| [What 81,000 people told us about the economics of AI](https://www.anthropic.com/research/81k-economics) | Apr 22 | Largest qualitative study of AI economic impact. Connects user concerns with Claude usage data. |
| [Trustworthy agents in practice](https://www.anthropic.com/research/trustworthy-agents) | Apr 9 | Framework for ensuring AI agent trustworthiness as agent usage explodes |
| [Emotion concepts and their function in a large language model](https://www.anthropic.com/research/emotion-concepts-function) | Apr 2 | Interpretability team investigates why LLMs act like they have emotions |
| Post | Date | Significance |
| ------ | ------ | ------------- |
| [Introducing Muse, Spark, MSL](https://ai.meta.com/blog/introducing-muse-spark-msl/) | Apr 8 | New Meta AI framework |
| [Scaling How We Build and Test Our Most Advanced AI](https://ai.meta.com/blog/scaling-how-we-build-test-advanced-ai/) | Apr 8 | Meta's infrastructure scaling |
| [SAM 3.1: Faster Video Detection with Multiplexing](https://ai.meta.com/blog/segment-anything-model-3/) | Mar 27 | Real-time video segmentation |
| [Four MTIA Chips in Two Years](https://ai.meta.com/blog/meta-mtia-scale-ai-chips-for-billions/) | Mar 11 | Meta's custom AI silicon progress |
| [TRIBE v2: Predictive Brain Foundation Model](https://ai.meta.com/blog/tribe-v2-brain-predictive-foundation-model/) | Mar 26 | Neuroscience-AI crossover |
| [Canopy Height Maps v2](https://ai.meta.com/blog/world-resources-institute-dino-canopy-height-maps-v2/) | Mar 10 | Environmental AI applications |
No recent papers found by tracked authors (Karpathy, LeCun, Chollet, Liang, Finn, Levine, Abbeel, Sutskever – no arXiv submissions in the last week under their standard arXiv names).
Anthropic, OpenAI, Cursor/SpaceX, and Google are in a four-way battle for developer mindshare. Lambert observes that GPT-5.4 in Codex is finally competitive with Claude Code after months of frustration. The key battlegrounds:
- Pricing confusion: Claude Code pricing uncertainty ($100/mo rumors) vs Codex's clear fast-mode subscription
- Model personality: Claude has "charm and warmth"; GPT is "meticulous and cold" – different user segments prefer different experiences
- Multi-agent futures: Lambert predicts UX will evolve to look like Slack for multi-agent coordination
DeepSeek V4's pricing is the biggest story: 10-20x cheaper than Anthropic/OpenAI frontier models. Qwen3.6-27B shows you can match a 397B MoE with a well-trained 27B dense model. Lambert's thesis: open models will dominate repetitive automation, while closed models lead in knowledge-work assistance, where robustness and broad capability matter more than cost.
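The only concrete price in these notes is the $0.14/M Flash input rate; the frontier figure below is a hypothetical 20x multiple, used purely to show how the gap compounds over token-hungry agent workloads.

```python
# Input-token cost over an agent-scale workload. $0.14/M is the reported
# DeepSeek V4 Flash input price; the $2.80/M "frontier" price is a
# hypothetical 20x multiple for illustration only.

def input_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    return tokens / 1e6 * price_per_million_usd

tokens = 500_000_000  # e.g. a month of heavy agentic traffic (illustrative)
flash_cost = input_cost_usd(tokens, 0.14)      # ~$70
frontier_cost = input_cost_usd(tokens, 2.80)   # ~$1,400 (hypothetical price)
print(flash_cost, frontier_cost, frontier_cost / flash_cost)  # ratio ~20x
```

For repetitive automation, where each task burns millions of tokens and the output bar is "good enough," this is exactly the regime where a 20x price gap outweighs a modest capability gap.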
Claude Mythos triggered a new wave of anti-open-weight sentiment. Lambert's counterargument: we've been here before (GPT-2 in 2019, GPT-4 in 2023). The 6-18 month delay between closed and open is a feature, not a bug. Chinese labs can "fast-follow" but regulation of distillation or RL environments could reshape the landscape.
Anthropic's Automated Alignment Researchers paper is a milestone: Claude Opus 4.6 autonomously generated alignment research ideas, ran experiments, and achieved 4x the human baseline. At $18K total cost, this massively reduces the cost of alignment research. The implications: if AI can do alignment research, the "alignment tax" argument (that safety slows capabilities) weakens, and the timeline to superintelligence alignment solutions shortens.
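For readers unfamiliar with the metric: in scalable-oversight work, PGR usually means "performance gap recovered" – the fraction of the gap between a weak baseline and a strong ceiling that a method closes. Whether Anthropic's number uses exactly this definition is an assumption, and the benchmark scores below are invented for illustration.

```python
# "Performance gap recovered" as commonly defined in scalable-oversight
# work: the fraction of the weak-baseline-to-ceiling gap a method closes.
# Whether Anthropic's PGR uses exactly this definition is an assumption,
# and the scores below are invented for illustration.

def pgr(method_score: float, weak_score: float, ceiling_score: float) -> float:
    return (method_score - weak_score) / (ceiling_score - weak_score)

# Hypothetical scores on some alignment-research benchmark:
print(round(pgr(0.96, weak_score=0.40, ceiling_score=0.98), 2))  # ~0.97
print(round(pgr(0.53, weak_score=0.40, ceiling_score=0.98), 2))  # ~0.22
```

Under this reading, 0.97 vs 0.23 means the automated researchers recovered nearly all of the gap to the ceiling, while humans recovered under a quarter of it.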
Anthropic's Economic Index Survey and the 81K-user study represent a shift from speculation to data. Monthly tracking of how AI changes work, hiring, and productivity expectations. This is the most ambitious attempt yet to measure AI's economic impact at scale.
- Lambert: The 6-18 month gap protects us. Past scares (GPT-2, GPT-4) didn't materialize. Open-weight models are net positive for safety monitoring.
- Critics (McKay Wrigley, others on X): Claude Mythos-level cyber capabilities in open weights would be catastrophic. This time is different because cyber is tangible, not hypothetical.
- Reality check: Neither side has full info on Mythos's actual capabilities.
- Lambert's heretical claim: "I'm at a relative minimum in my personal confidence in benchmarks." Gemini 3 scores incredibly but is irrelevant in real agent deployment.
- Implication: The entire open-vs-closed leaderboard discussion may be misleading. Benchmarks need to capture speed, cost, ease of use, and robustness – not just correctness.
- Ben Thompson's framing: "The part of the 'Boy Cries Wolf' myth everyone forgets is that the wolf did come in the end." Suggests Mythos's withholding may be legitimate.
- Skeptics counter that Anthropic has a history of safety posturing that conveniently aligns with its competitive interests, and that Mythos's 5x pricing makes it commercially available (at a price) – which undercuts the "too dangerous to release" narrative.
- Lambert's estimate: Mythos is ~2x Opus 4.6 in parameters, served inefficiently. The 5x price reflects inference cost, not just capability.
- Thompson: OpenAI buying TBPN "makes no sense." Internal memos show scramble to compete with Anthropic in enterprise. The Sam Altman New Yorker profile reveals a leader surrounded by questions.
- Counterpoint (from product evidence): GPT-5.4/5.5 releases are hitting well, Codex is improving fast, and the API pricing is competitive. The product execution may be stronger than the narrative suggests.
1. Andrej Karpathy – No blog posts since Dec 2024. His active presence on X/Twitter is not captured by our sources. No recent arXiv papers. Consider adding direct social-media scraping for high-signal but blog-silent figures.
2. Yann LeCun, François Chollet, Andrew Ng, Fei-Fei Li, Geoffrey Hinton – No significant new blog posts, papers, or public statements captured this week. This may be because they're primarily active on X/Twitter rather than blogs.
3. Ilya Sutskever (SSI) – No public output detected. Safe Superintelligence Inc. remains opaque.
4. Elon Musk / xAI – No public Grok announcements captured. The SpaceX-Cursor deal ($60B) is a Stratechery story but not directly xAI.
5. Google AI – While Thomas Kurian was interviewed on Stratechery, the Google AI blog had no new posts detected (feed not configured/accessible).
6. DeepMind, OpenAI, Stability AI blogs – RSS feeds not configured (sites use JavaScript-heavy rendering that resists feed auto-discovery). Need manual scraping or explicit feed URLs.
1. Watch for: Anthropic releasing more details about Claude Mythos's actual capabilities (or lack thereof). The gap between narrative and evidence may widen.
2. Watch for: Chinese open-weight lab funding difficulties. Lambert predicts a funding crunch by end of 2026 – early signs would be capability-trajectory divergence from the frontier.
3. Watch for: Online RL from user feedback as the next capability moat. Cursor's real-time RL for Composer is the proof of concept. If Anthropic/OpenAI successfully leverage their massive user bases for RL training data, the open-weight gap will widen.
4. Watch for: GPT-5.5 adoption patterns. If developers follow OpenAI's advice to "start from scratch" on prompts, prompt engineering as a craft may need to be reinvented.
5. Watch for: The SpaceX-Cursor deal. If it closes at $60B, it creates a new vertically integrated AI player with unique compute resources (Starlink data centers?).
*Generated: 2026-04-25 08:14 UTC | Sources: blogwatcher-cli (60 new articles scanned), direct source extraction, arXiv API*