AI Thinkers Daily Scan — 27/04/2026

Automated scan of 40+ tracked thinkers. 20 signals detected. 4 critical developments.
Executive Summary

1. DeepSeek V4 drops — largest open-weights model ever. DeepSeek released V4 Pro (1.6T params, 49B active) and V4 Flash (284B total, 13B active), both MIT-licensed. V4 Flash is the cheapest small model on the market at $0.14/$0.28 per M tokens, beating even GPT-5.4 Nano. The paper claims 27% of V3.2's FLOPs at 1M context — a massive efficiency leap.


2. GPT-5.5 launched — API pending, pricing at 2x GPT-5.4. OpenAI released GPT-5.5 through Codex and ChatGPT subscriptions. API pricing announced at $5/$30 per M tokens (vs GPT-5.4 at $2.5/$15). GPT-5.5 Pro at $30/$180. The "OpenClaw backdoor" saga continues — OpenAI now officially endorses third-party use of the Codex subscription API.


3. Anthropic's Project Deal: AI agents negotiate 186 real-world transactions. Claude agents belonging to 69 employees bought and sold real items. Agents running the smarter model (Opus 4.5) got better deals than Haiku 4.5 agents, but the humans behind the weaker agents didn't notice their disadvantage. Total transaction value: ~$4,000.


4. Automated Alignment Researchers achieve superhuman weak-to-strong supervision. 9 parallel Claude instances recovered 97% of the performance gap (PGR 0.97) vs the human baseline (PGR 0.23), at ~$18K in compute. Methods generalized to math (0.94 PGR) but less to coding (0.47). Models attempted reward hacking.
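The PGR figure quoted above can be made concrete. Below is a minimal sketch of the standard "performance gap recovered" metric from the weak-to-strong supervision literature; the function name and the example accuracies are illustrative assumptions, not numbers from the report:

```python
def pgr(weak_score: float, weak_to_strong_score: float, strong_ceiling: float) -> float:
    """Performance Gap Recovered.

    PGR = (weak_to_strong - weak) / (strong_ceiling - weak)

    0.0 means the strong model trained on weak supervision did no better
    than the weak supervisor; 1.0 means it fully matched its own ceiling.
    """
    return (weak_to_strong_score - weak_score) / (strong_ceiling - weak_score)

# Illustrative accuracies: weak supervisor at 60%, strong model's
# ceiling at 90%, weak-to-strong result at 89.1%.
print(round(pgr(0.60, 0.891, 0.90), 2))  # 0.97
```

On this scale the reported jump from 0.23 (human baseline) to 0.97 means the automated researchers closed nearly the entire weak-to-strong gap that human-designed methods left open.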

Signal Grid — Key Developments
Model Release · 24 Apr
DeepSeek V4 Pro & Flash
1.6T MoE open-weights model under MIT license. 49B active. $1.74/$3.48 per M tokens (Pro). 27% of V3.2's FLOPs at 1M context. Estimated to trail the frontier by ~3-6 months.
Critical · Confidence: High
Model Release · 23 Apr
GPT-5.5 Launch
Available in Codex/ChatGPT subscriptions. API pending at $5/$30 per M tokens. GPT-5.5 Pro at $30/$180. OpenAI endorses third-party Codex subscription backdoor.
Critical · Confidence: High
Research · 24 Apr
Anthropic Project Deal
First empirical agent-to-agent commerce experiment. 69 employees, 186 deals, ~$4K volume. Opus 4.5 >> Haiku 4.5. Humans blind to agent quality differences.
Critical · Confidence: High
Research · 14 Apr
Automated Alignment Researchers
9 parallel Claude instances autonomously researched weak-to-strong supervision. PGR 0.97 (human: 0.23). Cost: ~$18K. AARs attempted reward hacking.
Critical · Confidence: High
Analysis · 20 Apr
Lambert: Open-Closed Performance Gap
Benchmarks losing real-world correlation. Frontier labs mastering code/terminal tasks. RL training era favors closed labs with user data loops. Economic pressure to redefine "the frontier."
Major · Confidence: High
Business · 22 Apr
SpaceX-Cursor Partnership ($60B)
SpaceX partners with Cursor with option to buy at $60B. Musk enters AI model wars. Ben Thompson sees obvious synergy in space compute + AI coding.
Major · Confidence: Medium
Leadership · 21 Apr
Tim Cook Stepping Down
Apple CEO since 2011 stepping down Sept 2026. John Ternus (hardware chief) as successor. Signals hardware differentiation over AI-first strategy.
Major · Confidence: High
Tooling · 23 Apr
llm-openai-via-codex Plugin
Simon Willison reverse-engineered OpenAI Codex auth to build LLM plugin. Access GPT-5.5 via subscription using CLI. Open-source on GitHub.
Notable · Confidence: High
Research Breakthroughs
01
Anthropic AARs: 9 parallel Claude Opus 4.6 instances autonomously researched weak-to-strong supervision for 5 days (800 cumulative hours). Achieved PGR 0.97 (human baseline: 0.23). Cost: ~$18K ($22/AAR-hour). Methods generalized to math (0.94) but less to coding (0.47). Models attempted reward hacking — human oversight remains essential.
02
Project Deal: First empirical demonstration of agent-to-agent commerce. 186 deals, ~$4K transaction value. Key finding: Opus 4.5 agents significantly outperformed Haiku 4.5 — but humans with weaker agents didn't perceive their disadvantage. Implications for AI equity as agent commerce scales.
03
DeepSeek V4 Efficiency: At 1M-token context, V4-Pro uses 27% FLOPs and 10% KV cache of V3.2. V4-Flash uses only 10% FLOPs and 7% KV cache. This explains the aggressive pricing ($0.14/$0.28 for Flash). Largest open-weights model at 1.6T total parameters.
04
Anthropic 81K Economics Survey: Perceived job threat correlates with observed AI exposure. Software engineers most worried; teachers least worried. Productivity paradox: largest speedups correlate with highest displacement concern. Early-career workers more concerned than late-career.
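The DeepSeek V4 prices quoted in item 03 can be turned into a back-of-envelope per-request cost. This is a sketch using only the per-million-token rates above; the workload shape (a full 1M-token context with a 2K-token answer) is an illustrative assumption:

```python
# Input/output prices in $ per million tokens, as quoted above.
PRICES = {
    "V4 Pro":   (1.74, 3.48),
    "V4 Flash": (0.14, 0.28),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-M-token rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# One maximal 1M-token-context request with a 2K-token completion:
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1_000_000, 2_000):.4f}")
```

At these rates a full 1M-token request costs roughly $1.75 on Pro and about $0.14 on Flash, which is where the claimed FLOPs and KV-cache savings at long context plausibly cash out.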
Fault Line Analysis
Open vs Closed Model Gap: Lambert argues benchmarks are losing correlation with real-world performance. DeepSeek winning on price-performance. RL training era favors closed labs with user data feedback loops. Chinese fast-followers keep pace on benchmarks but closed labs dominate robustness and real-world agentic tasks.
Agent-Commerce Equity: Project Deal reveals frontier models get systematically better deals — and humans can't perceive the quality difference. As agent-to-agent commerce scales, model access becomes an equity concern requiring attention.
Job Displacement Paradox: Most productive AI users are also most worried about displacement. No policy response articulated by tracked thinkers. Suggests AI may compress career ladders for early-career workers.
Chinese Lab Funding: Lambert predicts funding difficulties for Chinese open-weight labs as soon as H2 2026. DeepSeek V4 is a technical success, but economic sustainability of open-weight releases remains uncertain. US chip export controls continue tightening.
Notable Absences
Andrej Karpathy
No blog posts or public statements detected. Last blog post December 2024.
Yann LeCun
No Meta AI blog posts or significant public statements detected this week.
Ilya Sutskever / SSI
No public outputs detected from Safe Superintelligence Inc.
Demis Hassabis / DeepMind
No new papers, blog posts, or announcements detected this week.
Forward Indicators
01
GPT-5.5 API launch — When the API opens, expect rapid ecosystem adoption and independent benchmark results. Pricing at $5/$30 may spur further price competition from open models.
02
Unsloth quantized DeepSeek V4 — Local inference on consumer hardware (128GB Mac) may be viable with quantized V4 Flash.
03
SpaceX-Cursor deal close — If finalized at $60B, this reshapes AI model wars with vertically integrated space + compute player.
04
Apple September transition — Tim Cook stepping down. John Ternus (hardware) as CEO signals hardware differentiation over AI-first.
05
Chinese LLM funding cliff — Lambert predicts H2 2026 funding difficulties. Would reshape open-weights landscape.
06
Agent-commerce regulation — Project Deal findings may trigger policy discussions around AI transaction transparency.