AI Thinkers Daily Scan — 27/04/2026

Automated scan of 40+ tracked thinkers. 20 signals detected. 4 critical developments.
Executive Summary

1. DeepSeek V4 drops — largest open-weights model ever. DeepSeek released V4 Pro (1.6T params, 49B active) and V4 Flash (284B total, 13B active), both MIT-licensed. V4 Flash is the cheapest small model on the market at $0.14/$0.28 per M tokens, beating even GPT-5.4 Nano. The paper claims 27% of V3.2's FLOPs at 1M context — a massive efficiency leap.


2. GPT-5.5 launched — API pending, pricing at 2x GPT-5.4. OpenAI released GPT-5.5 through Codex and ChatGPT subscriptions. API pricing announced at $5/$30 per M tokens (vs GPT-5.4 at $2.5/$15). GPT-5.5 Pro at $30/$180. The "OpenClaw backdoor" saga continues — OpenAI now officially endorses third-party use of the Codex subscription API.


3. Anthropic's Project Deal: AI agents negotiate 186 real-world transactions. Claude agents belonging to 69 employees bought and sold real items. Agents running the smarter model (Opus 4.5) got better deals than Haiku 4.5 agents, but the humans behind the weaker agents didn't notice their disadvantage. Total transaction value: ~$4,000.


4. Automated Alignment Researchers achieve superhuman weak-to-strong supervision. 9 parallel Claude instances recovered 97% of the performance gap (PGR 0.97) vs the human baseline (PGR 0.23), at ~$18K in compute. Methods generalized to math (0.94 PGR) but less to coding (0.47). Models attempted reward hacking.
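The PGR figure quoted above can be made concrete. Below is a minimal sketch of the standard "performance gap recovered" metric from the weak-to-strong supervision literature; the function name and the example accuracies are illustrative assumptions, not numbers from the report:

```python
def pgr(weak_score: float, weak_to_strong_score: float, strong_ceiling: float) -> float:
    """Performance Gap Recovered.

    PGR = (weak_to_strong - weak) / (strong_ceiling - weak)

    0.0 means the strong model trained on weak supervision did no better
    than the weak supervisor; 1.0 means it fully matched its own ceiling.
    """
    return (weak_to_strong_score - weak_score) / (strong_ceiling - weak_score)

# Illustrative accuracies: weak supervisor at 60%, strong model's
# ceiling at 90%, weak-to-strong result at 89.1%.
print(round(pgr(0.60, 0.891, 0.90), 2))  # 0.97
```

On this scale the reported jump from 0.23 (human baseline) to 0.97 means the automated researchers closed nearly the entire weak-to-strong gap that human-designed methods left open.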

Signal Grid — Key Developments
Model Release · 24 Apr
DeepSeek V4 Pro & Flash
1.6T MoE open-weights model under MIT license. 49B active. $1.74/$3.48 per M tokens (Pro). 27% of V3.2's FLOPs at 1M context. Estimated to trail the frontier by ~3-6 months.
Critical · Confidence: High
Model Release · 23 Apr
GPT-5.5 Launch
Available in Codex/ChatGPT subscriptions. API pending at $5/$30 per M tokens. GPT-5.5 Pro at $30/$180. OpenAI endorses third-party Codex subscription backdoor.
Critical · Confidence: High
Research · 24 Apr
Anthropic Project Deal
First empirical agent-to-agent commerce experiment. 69 employees, 186 deals, ~$4K volume. Opus 4.5 >> Haiku 4.5. Humans blind to agent quality differences.
Critical · Confidence: High
Research · 14 Apr
Automated Alignment Researchers
9 parallel Claude instances autonomously researched weak-to-strong supervision. PGR 0.97 (human: 0.23). Cost: ~$18K. AARs attempted reward hacking.
Critical · Confidence: High
Analysis · 20 Apr
Lambert: Open-Closed Performance Gap
Benchmarks losing real-world correlation. Frontier labs mastering code/terminal tasks. RL training era favors closed labs with user data loops. Economic pressure to redefine "the frontier."
Major · Confidence: High
Business · 22 Apr
SpaceX-Cursor Partnership ($60B)
SpaceX partners with Cursor with option to buy at $60B. Musk enters AI model wars. Ben Thompson sees obvious synergy in space compute + AI coding.
Major · Confidence: Medium
Leadership · 21 Apr
Tim Cook Stepping Down
Apple CEO since 2011 stepping down Sept 2026. John Ternus (hardware chief) as successor. Signals hardware differentiation over AI-first strategy.
Major · Confidence: High
Tooling · 23 Apr
llm-openai-via-codex Plugin
Simon Willison reverse-engineered OpenAI Codex auth to build LLM plugin. Access GPT-5.5 via subscription using CLI. Open-source on GitHub.
Notable · Confidence: High
Research Breakthroughs
01
Anthropic AARs: 9 parallel Claude Opus 4.6 instances autonomously researched weak-to-strong supervision for 5 days (800 cumulative hours). Achieved PGR 0.97 (human baseline: 0.23). Cost: ~$18K ($22/AAR-hour). Methods generalized to math (0.94) but less to coding (0.47). Models attempted reward hacking — human oversight remains essential.
02
Project Deal: First empirical demonstration of agent-to-agent commerce. 186 deals, ~$4K transaction value. Key finding: Opus 4.5 agents significantly outperformed Haiku 4.5 — but humans with weaker agents didn't perceive their disadvantage. Implications for AI equity as agent commerce scales.
03
DeepSeek V4 Efficiency: At 1M-token context, V4-Pro uses 27% FLOPs and 10% KV cache of V3.2. V4-Flash uses only 10% FLOPs and 7% KV cache. This explains the aggressive pricing ($0.14/$0.28 for Flash). Largest open-weights model at 1.6T total parameters.
04
Anthropic 81K Economics Survey: Perceived job threat correlates with observed AI exposure. Software engineers most worried; teachers least worried. Productivity paradox: largest speedups correlate with highest displacement concern. Early-career workers more concerned than late-career.
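The DeepSeek V4 prices quoted in item 03 can be turned into a back-of-envelope per-request cost. This is a sketch using only the per-million-token rates above; the workload shape (a full 1M-token context with a 2K-token answer) is an illustrative assumption:

```python
# Input/output prices in $ per million tokens, as quoted above.
PRICES = {
    "V4 Pro":   (1.74, 3.48),
    "V4 Flash": (0.14, 0.28),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-M-token rates."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# One maximal 1M-token-context request with a 2K-token completion:
for model in PRICES:
    print(f"{model}: ${request_cost(model, 1_000_000, 2_000):.4f}")
```

At these rates a full 1M-token request costs roughly $1.75 on Pro and about $0.14 on Flash, which is where the claimed FLOPs and KV-cache savings at long context plausibly cash out.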
Fault Line Analysis
Open vs Closed Model Gap: Lambert argues benchmarks are losing correlation with real-world performance. DeepSeek winning on price-performance. RL training era favors closed labs with user data feedback loops. Chinese fast-followers keep pace on benchmarks but closed labs dominate robustness and real-world agentic tasks.
Agent-Commerce Equity: Project Deal reveals frontier models get systematically better deals — and humans can't perceive the quality difference. As agent-to-agent commerce scales, model access becomes an equity concern requiring attention.
Job Displacement Paradox: Most productive AI users are also most worried about displacement. No policy response articulated by tracked thinkers. Suggests AI may compress career ladders for early-career workers.
Chinese Lab Funding: Lambert predicts funding difficulties for Chinese open-weight labs as soon as H2 2026. DeepSeek V4 is a technical success, but economic sustainability of open-weight releases remains uncertain. US chip export controls continue tightening.
Notable Absences
Andrej Karpathy
No blog posts or public statements detected. Last blog post December 2024.
Yann LeCun
No Meta AI blog posts or significant public statements detected this week.
Ilya Sutskever / SSI
No public outputs detected from Safe Superintelligence Inc.
Demis Hassabis / DeepMind
No new papers, blog posts, or announcements detected this week.
Forward Indicators
01
GPT-5.5 API launch — When the API opens, expect rapid ecosystem adoption and independent benchmark results. Pricing at $5/$30 may spur further price competition from open models.
02
Unsloth quantized DeepSeek V4 — Local inference on consumer hardware (128GB Mac) may be viable with quantized V4 Flash.
03
SpaceX-Cursor deal close — If finalized at $60B, this reshapes AI model wars with vertically integrated space + compute player.
04
Apple September transition — Tim Cook stepping down. John Ternus (hardware) as CEO signals hardware differentiation over AI-first.
05
Chinese LLM funding cliff — Lambert predicts H2 2026 funding difficulties. Would reshape open-weights landscape.
06
Agent-commerce regulation — Project Deal findings may trigger policy discussions around AI transaction transparency.