📅 25 April 2026 📊 15 signals

AI Thinkers Intelligence Scan — 2026-04-25


Executive Summary

1. DeepSeek V4 drops as the largest open-weight model ever — 1.6T parameter MoE (49B active), MIT licensed, priced at 10-27% of comparable frontier models. 1M-token context at 10% of V3.2's KV cache. Self-assessed as 3-6 months behind GPT-5.4/Gemini 3.1 Pro, but the pricing disruption is seismic.

2. Anthropic enters its "Mythos" moment — Claude Mythos (claimed most powerful model, withheld from public release for safety), Automated Alignment Researchers (Claude autonomously doing alignment research, achieving 0.97 PGR vs 0.23 human baseline), Project Deal (Claude negotiating in a marketplace), and a new monthly Economic Index Survey of 81K Claude users. Anthropic is dominating the narrative across capability, safety, and economic research simultaneously.

3. The open vs. closed gap is fracturing into specialized domains — Nathan Lambert's analysis reveals the gap is no longer a single number. Closed labs dominate agentic coding via online RL from user feedback; open models excel at repetitive automation. Chinese labs lead open-weight size (DeepSeek V4, Qwen3.6-27B) but face funding constraints. The "distillation panic" may be overblown — RL environments are the new moat.

4. GPT-5.5 arrives with a new prompting paradigm — OpenAI recommends starting from scratch with prompts rather than migrating from earlier GPT versions. GPT-5.4 in Codex is described by Lambert as the first OpenAI agent "that feels like it can do a lot of random things." The agent wars are now a four-way race between Anthropic (Claude Code), OpenAI (Codex), Cursor (potential $60B SpaceX deal), and Google (Gemini 3).


Signals by Source

Simon Willison (simonwillison.net) — Very High Activity

| Post | Date | Significance |
| --- | --- | --- |
| DeepSeek V4 — almost on the frontier, a fraction of the price | Apr 24 | Major: 1.6T Pro / 284B Flash MoE, MIT license. Flash: $0.14/M input (cheaper than GPT-5.4 Nano). Pricing table shows a 10-20x cost advantage vs Opus/Sonnet. |
| [GPT-5.5 prompting guide](https://simonwillison.net/2026/Apr/25/gpt-5-5-prompting-guide/) | Apr 25 | OpenAI releases GPT-5.5 prompting guidance; recommends starting from scratch rather than migrating old prompts. |
| [The people do not yearn for automation](https://simonwillison.net/2026/Apr/24/the-people-do-not-yearn-for-automation/) | Apr 24 | Links Nilay Patel's essay on "software brain" — the gap between AI builders and the general public. Highlights growing AI backlash sentiment despite ChatGPT usage growth. |
| [An update on recent Claude Code quality reports](https://simonwillison.net/2026/Apr/24/recent-claude-code-quality-reports/) | Apr 24 | Responding to concerns about Claude Code output quality. |
| [A pelican for GPT-5.5 via semi-official Codex backdoor](https://simonwillison.net/2026/Apr/23/gpt-5-5/) | Apr 23 | Accessing GPT-5.5 through the Codex API before official release. |
| [Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model](https://simonwillison.net/2026/Apr/22/qwen36-27b/) | Apr 22 | 27B dense model beats the previous 397B MoE across all coding benchmarks. 55.6GB vs 807GB. Runs locally on consumer hardware at 25 tok/s. |
| [Is Claude Code going to cost $100/month?](https://simonwillison.net/2026/Apr/22/claude-code-confusion/) | Apr 22 | Pricing confusion around Claude Code vs. Codex plans. |
| [Changes between Claude Opus 4.6 and 4.7 system prompts](https://simonwillison.net/2026/Apr/18/opus-system-prompt/) | Apr 18 | Tracked system prompt changes — valuable for prompt engineers. |

Nathan Lambert / Interconnects (interconnects.ai) — Very High Activity

| Post | Date | Significance |
| --- | --- | --- |
| Reading today's open-closed performance gap | Apr 20 | Deep analysis: Benchmark scores don't capture real-world agentic performance. Gemini 3 has incredible benchmarks but "remarkable irrelevance" in agent deployment. RL environments are the new moat — data is becoming as capital-intensive as chip fabs. |
| My bets on open models, mid-2026 | Apr 15 | Key bets: Closed models didn't widen the gap in H2 2025. Chinese labs focus more on benchmark scores than robustness. Online RL from user feedback is the first clear technical advantage for closed labs. Open models will dominate repetitive automation. Chinese labs face funding difficulties by end of 2026. |
| [The inevitable need for an open model consortium](https://www.interconnects.ai/p/the-inevitable-need-for-an-open-model) | Apr 11 | Proposes a formal consortium structure for open model development. |
| [Claude Mythos and misguided open-weight fearmongering](https://www.interconnects.ai/p/claude-mythos-and-misguided-open) | Apr 9 | Defends the open-weight ecosystem against post-Mythos backlash. Argues the 6-18 month delay between closed and open capabilities is a "blessing" for safety. Estimates Mythos as ~2x larger than Opus 4.6 (6-10T parameters), with 5x pricing driven by both size and inference-time compute. |
| [Gemma 4 and what makes an open model succeed](https://www.interconnects.ai/p/gemma-4-and-what-makes-an-open-model) | Apr 3 | Google's Gemma 4 as a case study for open model strategy. |
| [Lossy self-improvement](https://www.interconnects.ai/p/lossy-self-improvement) | Mar 22 | Analyzes the limits of self-improvement loops in LLMs. |
| [GPT 5.4 is a big step for Codex](https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex) | Mar 18 | GPT 5.4 is the first OpenAI agent without "death by a thousand cuts." Claude still has more "charm"; GPT 5.4 is mechanical and precise. Different philosophies: Claude models intent, GPT follows instructions literally. Predicts agent UX will evolve to look like Slack (multi-agent coordination). |

Ben Thompson / Stratechery (stratechery.com) — Very High Activity

| Post | Date | Significance |
| --- | --- | --- |
| [He Came, He Saw, He Cooked](https://stratechery.com/2026/he-came-he-saw-he-cooked/) | Apr 24 | Weekly roundup: Tim Cook stepping down (end of an era), the SpaceX-Cursor deal ($60B option), US-China decoupling. |
| [An Interview with Google Cloud CEO Thomas Kurian About the Agentic Moment](https://stratechery.com/2026/an-interview-with-google-cloud-ceo-thomas-kurian-about-the-agentic-moment/) | Apr 23 | Google's enterprise agent strategy. |
| [TSMC Earnings, New N3 Fabs, The Nvidia Ramp](https://stratechery.com/2026/tsmc-earnings-new-n3-fabs-the-nvidia-ramp/) | Apr 20 | TSMC leadership not fully bought into the AI growth story. |
| [OpenAI's Memos, Frontier, Amazon and Anthropic](https://stratechery.com/2026/openais-memos-frontier-amazon-and-anthropic/) | Apr 14 | OpenAI internal memo about taking on Anthropic in enterprise; the Amazon-Anthropic relationship. |
| [Anthropic's New TPU Deal, Computing Crunch, The Anthropic-Google Alliance](https://stratechery.com/2026/anthropics-new-tpu-deal-anthropics-computing-crunch-the-anthropic-google-alliance/) | Apr 14 | Anthropic's compute strategy with Google TPUs. |
| [Anthropic's New Model, The Mythos Wolf, Glasswing and Alignment](https://stratechery.com/2026/anthropics-new-model-the-mythos-wolf-glasswing-and-alignment/) | Apr 10 | Mythos is real — the "wolf" finally came. But if Anthropic is right, that's deeply concerning. |
| [OpenAI Buys TBPN, Tech and the Token Tsunami](https://stratechery.com/2026/openai-buys-tbpn-tech-and-the-token-tsunami/) | Apr 10 | OpenAI's acquisition of a podcast network — "makes no sense". |

Anthropic Research (anthropic.com/research) — Very High Activity

| Publication | Date | Significance |
| --- | --- | --- |
| Automated Alignment Researchers | Apr 14 | Major: 9 copies of Claude Opus 4.6 autonomously performed alignment research. Achieved 0.97 PGR (vs a 0.23 human baseline) at $18K total compute cost. Demonstrates AI contributing to its own alignment — a milestone for scalable oversight. |
| [Project Deal](https://www.anthropic.com/features/project-deal) | Apr 24 | Claude autonomously buying, selling, and negotiating in an employee marketplace. A real-world multi-agent economic simulation. |
| [Anthropic Economic Index Survey](https://www.anthropic.com/research/economic-index-survey-announcement) | Apr 22 | New monthly survey of Claude users on AI's economic impact. Complements the 81K user study. |
| [What 81,000 people told us about the economics of AI](https://www.anthropic.com/research/81k-economics) | Apr 22 | Largest qualitative study of AI's economic impact. Connects user concerns with Claude usage data. |
| [Trustworthy agents in practice](https://www.anthropic.com/research/trustworthy-agents) | Apr 9 | Framework for ensuring AI agent trustworthiness as agent usage explodes. |
| [Emotion concepts and their function in a large language model](https://www.anthropic.com/research/emotion-concepts-function) | Apr 2 | The interpretability team investigates why LLMs act like they have emotions. |

Meta AI Blog (ai.meta.com/blog) — Moderate Activity

| Post | Date | Significance |
| --- | --- | --- |
| [Introducing Muse, Spark, MSL](https://ai.meta.com/blog/introducing-muse-spark-msl/) | Apr 8 | New Meta AI framework. |
| [Scaling How We Build and Test Our Most Advanced AI](https://ai.meta.com/blog/scaling-how-we-build-test-advanced-ai/) | Apr 8 | Meta's infrastructure scaling. |
| [SAM 3.1: Faster Video Detection with Multiplexing](https://ai.meta.com/blog/segment-anything-model-3/) | Mar 27 | Real-time video segmentation. |
| [TRIBE v2: Predictive Brain Foundation Model](https://ai.meta.com/blog/tribe-v2-brain-predictive-foundation-model/) | Mar 26 | Neuroscience-AI crossover. |
| [Four MTIA Chips in Two Years](https://ai.meta.com/blog/meta-mtia-scale-ai-chips-for-billions/) | Mar 11 | Meta's custom AI silicon progress. |
| [Canopy Height Maps v2](https://ai.meta.com/blog/world-resources-institute-dino-canopy-height-maps-v2/) | Mar 10 | Environmental AI applications. |

Andrej Karpathy (karpathy.ai/blog) — No new activity

• Last post: Dec 16, 2024 (about an LLM writing experiment)

• Blog has no RSS feed; requires manual checking

arXiv — Recent Papers (cs.LG, Apr 23 2026)

No recent papers found by tracked authors (Karpathy, LeCun, Chollet, Liang, Finn, Levine, Abbeel, Sutskever): no arXiv submissions in the last week under their standard arXiv names.


Theme Analysis

1. The Agent Wars Intensify

Anthropic, OpenAI, Cursor/SpaceX, and Google are in a four-way battle for developer mindshare. Lambert observes that GPT-5.4 in Codex is finally competitive with Claude Code after months of frustration. The key battlegrounds:

- Pricing confusion: Claude Code pricing uncertainty ($100/mo rumors) vs Codex's clear fast-mode subscription

- Model personality: Claude has "charm and warmth"; GPT is "meticulous and cold" — different user segments prefer different experiences

- Multi-agent futures: Lambert predicts UX will evolve to look like Slack for multi-agent coordination
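The Slack analogy implies a specific architecture: agents coordinate through a shared, addressable channel rather than pairwise request/response calls. A toy sketch of that shape (the agent names, message format, and routing rules are all hypothetical, not any shipped product):

```python
# Toy model of Slack-style multi-agent coordination: agents subscribe to a
# shared channel, react to messages addressed to them, and post replies.
# Purely illustrative; not based on any real agent framework.
from collections import deque
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    recipient: str   # agent name, or "*" for broadcast
    text: str

class Channel:
    def __init__(self):
        self.queue = deque()
        self.log = []

    def post(self, msg: Message):
        self.queue.append(msg)
        self.log.append(msg)

    def run(self, agents: dict, max_steps: int = 100):
        """Deliver queued messages until the channel drains or the step budget runs out."""
        steps = 0
        while self.queue and steps < max_steps:
            msg = self.queue.popleft()
            for name, handler in agents.items():
                if msg.recipient in (name, "*") and name != msg.sender:
                    reply = handler(msg)
                    if reply is not None:
                        self.post(reply)
            steps += 1

# Two toy agents: a planner that delegates, a coder that reports back.
def planner(msg):
    if msg.sender == "user":
        return Message("planner", "coder", f"implement: {msg.text}")
    return None

def coder(msg):
    return Message("coder", "user", f"done: {msg.text}")

ch = Channel()
ch.post(Message("user", "planner", "add retry logic"))
ch.run({"planner": planner, "coder": coder})
```

The point of the channel-plus-log shape is that every agent's output is observable by every other participant, which is exactly what a Slack-like UX would surface.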

2. Open-Weight Models: Cost Disruption Meets Capability Plateau

DeepSeek V4's pricing is the biggest story: 10-20x cheaper than Anthropic/OpenAI frontier models. Qwen3.6-27B shows you can match a 397B MoE with a well-trained 27B dense model. Lambert's thesis: open models will dominate repetitive automation, while closed models lead in knowledge-work assistance, where robustness and broad capability matter more than cost.
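The cost multiple is easy to sanity-check from per-token rates. A minimal sketch using the $0.14/M input price quoted for DeepSeek Flash; the frontier rate below is an illustrative assumption, not a quoted price:

```python
# Sanity-check the rough cost multiple between an open-weight API and a
# frontier API. Only the $0.14/M figure comes from this scan; the frontier
# rate is an assumed placeholder for comparison.
FLASH_INPUT_PER_M = 0.14      # DeepSeek V4 Flash, $ per 1M input tokens (from scan)
FRONTIER_INPUT_PER_M = 2.50   # assumed frontier rate, $ per 1M input tokens

def cost(tokens: int, rate_per_m: float) -> float:
    """Dollar cost for a number of input tokens at a per-million-token rate."""
    return tokens / 1_000_000 * rate_per_m

monthly_tokens = 500_000_000  # e.g. an automation pipeline consuming 500M input tokens/month
flash = cost(monthly_tokens, FLASH_INPUT_PER_M)
frontier = cost(monthly_tokens, FRONTIER_INPUT_PER_M)
print(f"open-weight: ${flash:,.2f}  frontier: ${frontier:,.2f}  multiple: {frontier / flash:.1f}x")
```

At these assumed rates the multiple is about 18x, inside the 10-20x range the scan reports; the ratio is independent of volume, so it holds at any token count.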

3. The "Mythos Backlash" and Open-Weight Regulation Fears

Claude Mythos triggered a new wave of anti-open-weight sentiment. Lambert's counterargument: we've been here before (GPT-2 in 2019, GPT-4 in 2023). The 6-18 month delay between closed and open is a feature, not a bug. Chinese labs can "fast-follow" but regulation of distillation or RL environments could reshape the landscape.

4. AI Alignment Research Is Becoming Automated

Anthropic's Automated Alignment Researchers paper is a milestone: Claude Opus 4.6 autonomously generated alignment research ideas, ran experiments, and achieved roughly 4x the human baseline (0.97 PGR vs 0.23). At $18K total compute cost, this massively reduces the cost of alignment research. The implications: if AI can do alignment research, the "alignment tax" argument (that safety slows capabilities) weakens, and the timeline to superintelligence alignment solutions shortens.
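The PGR figures are easier to read with the metric written out. A sketch assuming the standard "performance gap recovered" definition (the fraction of the weak-baseline-to-strong-ceiling gap that a method closes); the example numbers are illustrative, not from the paper:

```python
def pgr(result: float, weak_baseline: float, strong_ceiling: float) -> float:
    """Performance gap recovered: 0.0 = no better than the weak baseline,
    1.0 = matches the strong ceiling. Values between are a fraction of the gap."""
    if strong_ceiling == weak_baseline:
        raise ValueError("degenerate gap: ceiling equals baseline")
    return (result - weak_baseline) / (strong_ceiling - weak_baseline)

# Illustrative only: a method scoring 80 where the weak baseline scores 50
# and the strong ceiling scores 90 recovers 75% of the gap.
example = pgr(80, 50, 90)   # 0.75
```

Under this definition, a 0.97 PGR means the automated researchers recovered nearly all of the measured gap, while the 0.23 human baseline recovered under a quarter of it.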

5. Economic Impact Moving to Quantitative Measurement

Anthropic's Economic Index Survey and the 81K-user study represent a shift from speculation to data. Monthly tracking of how AI changes work, hiring, and productivity expectations. This is the most ambitious attempt yet to measure AI's economic impact at scale.


Conflict/Debate Points

Open-Weight Safety: Lambert vs. "Cybersecurity Doomers"

- Lambert: The 6-18 month gap protects us. Past scares (GPT-2, GPT-4) didn't materialize. Open-weight models are net positive for safety monitoring.

- Critics (McKay Wrigley, others on X): Claude Mythos-level cyber capabilities in open weights would be catastrophic. This time is different because cyber is tangible, not hypothetical.

- Reality check: Neither side has full info on Mythos's actual capabilities.

Model Measurement: Lambert vs. Benchmark Consensus

- Lambert's heretical claim: "I'm at a relative minimum in my personal confidence in benchmarks." Gemini 3 scores incredibly but is irrelevant in real agent deployment.

- Implication: The entire open-vs-closed leaderboard discussion may be misleading. Benchmarks need to capture speed, cost, ease-of-use, and robustness — not just correctness.

Anthropic's Mythos: Strategic Move or Genuine Concern?

- Ben Thompson's framing: "The part of the 'Boy Cries Wolf' myth everyone forgets is that the wolf did come in the end." Suggests Mythos's withholding may be legitimate.

- Skeptics point to Anthropic's history of safety posturing that conveniently aligns with its competitive interests. And Mythos's 5x pricing makes it commercially available (at a price) — which undercuts the "too dangerous to release" narrative.

- Lambert's estimate: Mythos is ~2x Opus 4.6 in parameters, served inefficiently. The 5x price reflects inference cost, not just capability.

OpenAI's Strategy: Coherent or Chaotic?

- Thompson: OpenAI buying TBPN "makes no sense." Internal memos show scramble to compete with Anthropic in enterprise. The Sam Altman New Yorker profile reveals a leader surrounded by questions.

- Counterpoint (from product evidence): GPT-5.4/5.5 releases are hitting well, Codex is improving fast, and the API pricing is competitive. The product execution may be stronger than the narrative suggests.


Notable Absences

1. Andrej Karpathy — No blog posts since Dec 2024. His active presence on X/Twitter is not captured by our sources. No recent arXiv papers. Consider adding direct social media scraping for high-signal but blog-silent figures.

2. Yann LeCun, François Chollet, Andrew Ng, Fei-Fei Li, Geoffrey Hinton — No significant new blog posts, papers, or public statements captured this week. This may be because they're primarily active on X/Twitter rather than blogs.

3. Ilya Sutskever (SSI) — No public output detected. Safe Superintelligence Inc. remains opaque.

4. Elon Musk / xAI — No public Grok announcements captured. The SpaceX-Cursor deal ($60B) is a Stratechery story but not directly xAI.

5. Google AI — While Thomas Kurian was interviewed on Stratechery, the Google AI blog had no new posts detected (feed not configured/accessible).

6. DeepMind, OpenAI, Stability AI blogs — RSS feeds not configured (sites use JavaScript-heavy rendering that resists feed auto-discovery). Need manual scraping or explicit feed URLs.
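When feed auto-discovery fails, the usual explicit fallback is to scan the fetched page's `<link rel="alternate">` tags for RSS/Atom entries. A minimal stdlib-only sketch (no specific site assumed; fetching the page is left out):

```python
# Minimal feed auto-discovery: scan a page's HTML for <link rel="alternate">
# entries pointing at RSS or Atom feeds. A sketch of the explicit-feed-URL
# fallback suggested above; pair it with any HTTP client to fetch the page.
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link"
                and "alternate" in (a.get("rel") or "").split()
                and (a.get("type") or "").lower() in FEED_TYPES
                and a.get("href")):
            self.feeds.append(a["href"])

def discover_feeds(html: str) -> list:
    """Return the href of every advertised RSS/Atom feed in an HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds

page = '<head><link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>'
found = discover_feeds(page)  # ["/feed.xml"]
```

For JavaScript-heavy sites this only works on the server-rendered HTML, which is exactly why some of the blogs above still need manually configured feed URLs.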


Forward Indicators

1. Watch for: Anthropic releasing more details about Claude Mythos's actual capabilities (or lack thereof). The gap between narrative and evidence may widen.

2. Watch for: Chinese open-weight lab funding difficulties. Lambert predicts a funding crunch by end of 2026 — early signs would be capability trajectory divergence from the frontier.

3. Watch for: Online RL from user feedback as the next capability moat. Cursor's real-time RL for Composer is the proof of concept. If Anthropic/OpenAI successfully leverage their massive user bases for RL training data, the open-weight gap will widen.

4. Watch for: GPT-5.5 adoption patterns. If developers follow OpenAI's advice to "start from scratch" on prompts, prompt engineering as a craft may need to be reinvented.

5. Watch for: The SpaceX-Cursor deal. If it closes at $60B, it creates a new vertically integrated AI player with unique compute resources (Starlink data centers?).


*Generated: 2026-04-25 08:14 UTC | Sources: blogwatcher-cli (60 new articles scanned), direct source extraction, arXiv API*