AI Capabilities (ECI)
Epoch Capabilities Index — composite score across 40+ benchmarks
METR Task Horizon
How long can AI agents reliably work on a task? (hours)
AA Intelligence — Cost-Adjusted
AAII composite index − 15·log₁₀(GPQA cost). Frontier = capability per dollar.
AA Intelligence — Compute-Adjusted
AAII composite index − 15·log₁₀(GPQA compute in TFLOP). Provider-independent: capability against actual edge compute, not API pricing.
Hardware Cost Trend
TFLOPS × Memory GB per $1K at launch. Bubble size = VRAM.
Model Release Timeline
Frontier and open-weights releases
Recent Events
All Model Releases
Epoch Capabilities Index (ECI)
Composite benchmark score across 40+ evaluations, by model accessibility. Source: Epoch AI
METR Task Horizon
50th-percentile time horizon for autonomous task completion. Source: METR
AA Intelligence Index — Cost-Adjusted
Composite AAII index − α·log₁₀(cost to run GPQA in USD), with α≈14.5 refit on the Pareto frontier. The cost axis uses each model's near-launch input/output prices via Gundlach et al., so release-date ordering carries real temporal signal (unlike the cost column in AAII's own XLSX, which is recomputed at current prices). The capability axis is AAII's current-suite index, so read the chart as "for this level of capability as measured today, what did access cost at release?"
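A minimal sketch of the cost-adjusted score, assuming the formula exactly as stated above with α passed in rather than refit on the frontier. The function name and the two example models are hypothetical. The log term means each 10× increase in GPQA cost subtracts α points.

```python
import math


def cost_adjusted_score(aaii: float, gpqa_cost_usd: float, alpha: float = 14.5) -> float:
    """AAII minus alpha * log10(USD cost of a GPQA run at near-launch prices)."""
    if gpqa_cost_usd <= 0:
        raise ValueError("GPQA cost must be positive")
    return aaii - alpha * math.log10(gpqa_cost_usd)


# Hypothetical models: (name, AAII score, USD cost to run GPQA at launch prices)
models = [("model-a", 60.0, 100.0), ("model-b", 50.0, 1.0)]
for name, score, cost in models:
    print(name, cost_adjusted_score(score, cost))
```

Here model-b ranks higher even though model-a scores 10 points more on AAII, because model-a costs 100× as much: two decades of cost × 14.5 = 29 points.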
GPQA Diamond — Cost-Adjusted (Over Time)
GPQA Diamond − 15·log₁₀(cost to run GPQA). Cost uses input/output prices observed at or near each model's first sample on AAII — not current prices — so release-date ordering reflects real capability-per-dollar progress. Source: Gundlach et al., "The Price of Progress" through Oct 2025, plus our own GPQA runs via bin/eval_gpqa for later frontier models.
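The cost term can be sketched as the run's token counts priced at near-launch rates, assuming prices quoted in USD per million tokens with input and output billed separately. The helper names, token counts, and prices below are hypothetical and not taken from the Gundlach et al. data or from bin/eval_gpqa.

```python
import math


def gpqa_run_cost_usd(input_tokens, output_tokens, input_usd_per_m, output_usd_per_m):
    """Cost of one GPQA run at fixed per-million-token prices (input and output billed separately)."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def gpqa_cost_adjusted(gpqa_pct, cost_usd, alpha=15.0):
    """GPQA Diamond accuracy (%) minus alpha * log10(run cost in USD)."""
    return gpqa_pct - alpha * math.log10(cost_usd)


# Hypothetical run: 0.5M input tokens at $2/M, 0.9M output tokens at $10/M, priced at launch
cost = gpqa_run_cost_usd(500_000, 900_000, input_usd_per_m=2.0, output_usd_per_m=10.0)
print(cost)                               # 10.0
print(gpqa_cost_adjusted(70.0, cost))     # 70 - 15 * 1 = 55.0
```

Freezing prices at launch matters because re-pricing the same run at a later 10× cheaper rate would lift the point by 15, making old models look more cost-efficient than they were at release.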
AA Intelligence Index — Compute-Adjusted
AAII composite index − 15·log₁₀(GPQA compute in TFLOP), where compute ≈ 2·active_params·tokens FLOP for the full 198-question run. Unlike the cost-adjusted chart, this axis is provider-independent: it measures capability against the actual compute cost of running the model at the edge, cutting through subsidised API pricing. Restricted to models with reliable active-parameter counts, mostly open-weights; most closed MoE models are excluded because their sizing is only rumored. α=15 is pinned because n is too small to fit it. Source: Gundlach et al.
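A worked sketch of the compute estimate, assuming `tokens` counts all input and output tokens processed over the 198 questions and that compute is reported as total teraFLOP (10^12 floating-point operations). The model size and token count are hypothetical.

```python
import math


def gpqa_compute_tflop(active_params: float, tokens: float) -> float:
    """Approximate total compute for one GPQA run: 2 FLOP per active parameter per token."""
    return 2.0 * active_params * tokens / 1e12  # FLOP -> TFLOP (10^12 operations)


def compute_adjusted_score(aaii: float, compute_tflop: float, alpha: float = 15.0) -> float:
    """AAII minus alpha * log10(total GPQA compute in TFLOP); alpha pinned at 15."""
    return aaii - alpha * math.log10(compute_tflop)


# Hypothetical open-weights MoE: 10B active parameters, 5M tokens over the whole run
tflop = gpqa_compute_tflop(active_params=10e9, tokens=5e6)
print(tflop)                                # 100000.0
print(compute_adjusted_score(60.0, tflop))  # about -15 (60 - 15 * 5)
```

Negative values are harmless: the score only shifts by a constant if the compute unit changes, so ordering between models is what the chart conveys. Only active parameters enter the estimate, which is why MoE models need reliable active-parameter counts.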
Compute per Dollar (TFLOPS/$1K)
Inference compute per $1K at launch. Bubble size = VRAM.
Bandwidth per Dollar (GB/s/$1K)
Memory bandwidth per $1K at launch. Bubble size = VRAM.
Composite: TFLOPS × Memory / $1K
Combined compute × memory capacity per $1K, capturing both throughput and working-set size. In the Realized view, compute is scaled by MLPerf/SemiAnalysis utilization factors; chips without a measured factor are hidden.
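A sketch of the composite metric, assuming the utilization factor scales only the TFLOPS term (not memory) and that "hidden" means the chip gets no value in the realized view. The `Chip` type, both chips, and their numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chip:
    name: str
    tflops: float              # peak inference TFLOPS
    memory_gb: float           # VRAM capacity in GB
    launch_price_usd: float
    utilization: Optional[float] = None  # measured utilization factor, if one exists


def composite_per_1k(chip: Chip, realized: bool = False) -> Optional[float]:
    """TFLOPS x memory GB per $1K of launch price; None means hidden in the realized view."""
    tflops = chip.tflops
    if realized:
        if chip.utilization is None:
            return None  # no measured factor: chip is hidden
        tflops *= chip.utilization
    return tflops * chip.memory_gb / (chip.launch_price_usd / 1000.0)


chips = [
    Chip("chip-x", tflops=1000.0, memory_gb=80.0, launch_price_usd=25_000.0, utilization=0.5),
    Chip("chip-y", tflops=500.0, memory_gb=48.0, launch_price_usd=8_000.0),
]
for chip in chips:
    print(chip.name, composite_per_1k(chip), composite_per_1k(chip, realized=True))
```

Multiplying compute by memory rewards chips that are strong on both axes; a chip with huge TFLOPS but little VRAM, or the reverse, scores lower than a balanced one at the same price.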
Hardware Releases
Events
Agent releases, geopolitical events, regulatory actions, and ecosystem moves.
ECI Capability Frontier
Best model score at each date (any accessibility) with 5-year projection. Open-weights shown as green overlay. Bands: 50/80/95% prediction intervals.
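"Best model score at each date" is a running maximum over releases. The sketch below, with hypothetical scores and a hypothetical function name, keeps only the dates where the frontier rises; applying the same function to the open-weights subset gives the green overlay. The projection and prediction bands are not modeled here.

```python
from datetime import date


def capability_frontier(releases):
    """Return the step points where the best score to date rises.

    releases: iterable of (release_date, score) pairs in any order.
    """
    frontier = []
    best = float("-inf")
    for released, score in sorted(releases):
        if score > best:  # new state of the art on this date
            best = score
            frontier.append((released, best))
    return frontier


# Hypothetical ECI scores: (release date, score, open_weights?)
models = [
    (date(2024, 1, 15), 120.0, False),
    (date(2024, 3, 1), 115.0, True),
    (date(2024, 6, 10), 130.0, False),
    (date(2024, 9, 5), 125.0, True),
]
all_models = capability_frontier((d, s) for d, s, _ in models)
open_only = capability_frontier((d, s) for d, s, ow in models if ow)
print(all_models)
print(open_only)
```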
METR Task Horizon
Frontier autonomous task completion horizon with 5-year projection (log scale). Open-weights overlay in green.
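One common way to draw a trend on a log scale is an ordinary least-squares fit of log₂(horizon) against time, whose slope gives doublings per year. This is a sketch under that assumption only: the dashboard's actual fitting method and its prediction bands may differ, and the data points are hypothetical, not METR measurements.

```python
import math


def fit_log2_trend(points):
    """OLS fit of log2(horizon_hours) = a + b * t_years.

    points: list of (t_years, horizon_hours) with horizon_hours > 0
    and at least two distinct times. Returns (a, b); b is doublings per year.
    """
    n = len(points)
    xs = [t for t, _ in points]
    ys = [math.log2(h) for _, h in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def project_horizon(a, b, t_years):
    """Point projection in hours from the fitted trend (no prediction interval)."""
    return 2 ** (a + b * t_years)


# Hypothetical horizons: 1 h, 4 h, 16 h at years 0, 1, 2
a, b = fit_log2_trend([(0.0, 1.0), (1.0, 4.0), (2.0, 16.0)])
print(f"doubling time: {12 / b:.1f} months")
print(f"projected horizon at year 3: {project_horizon(a, b, 3.0):.0f} h")
```

Fitting in log space keeps a straight line on the log-scale chart, and the reciprocal of the slope reads directly as a doubling time.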
TFLOPS × Memory / $1K
Composite compute × memory per $1K trend with 5-year projection (log scale).
Trend Summary
Indicators
Structured forecasting questions derived from trend projections. Updated daily.
Calibration
Backtest results: how well do prediction intervals match actual outcomes when tested on held-out data?
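A minimal sketch of the calibration check, assuming each backtest yields a predicted interval and a held-out actual: the empirical coverage of, say, the 80% band should be close to 80%. The function name and numbers are hypothetical.

```python
def interval_coverage(backtests):
    """Fraction of held-out actuals inside their predicted [lower, upper] interval.

    backtests: list of (lower, upper, actual) tuples, all for one nominal band level.
    """
    if not backtests:
        raise ValueError("need at least one backtest")
    hits = sum(1 for lower, upper, actual in backtests if lower <= actual <= upper)
    return hits / len(backtests)


# Hypothetical 80% intervals scored against held-out outcomes
band_80 = [(1.0, 3.0, 2.0), (1.0, 3.0, 2.5), (2.0, 4.0, 5.0), (0.0, 1.0, 0.5), (3.0, 6.0, 4.0)]
coverage = interval_coverage(band_80)
print(f"nominal 80%, empirical {coverage:.0%}")  # 4 of 5 inside
```

Coverage well below nominal means the bands are too narrow (overconfident); coverage well above means they are too wide.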
Market Context
Prediction market signals relevant to AI capability trends. Loaded from the Markets tab data.
Track Record
Scored predictions from trend-based forecasting. Includes retrodicted predictions (fit on past data, scored against actuals) and live predictions. Brier score: lower is better (0 = perfect, 0.25 = uninformed).
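The Brier score is the mean squared difference between forecast probabilities and binary outcomes. The sketch below (function name hypothetical) shows where the two reference points come from: perfect forecasts score 0, and always answering 50% scores 0.25 regardless of outcomes.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and binary outcomes (0 or 1)."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


print(brier_score([(1.0, 1), (0.0, 0)]))  # perfect forecasts -> 0.0
print(brier_score([(0.5, 1), (0.5, 0)]))  # always 50% -> 0.25
print(brier_score([(0.9, 1), (0.2, 0)]))  # confident and mostly right
```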
Market Forecasting
LLM price forecasts with trading simulation. Amber dots show implied E[price] at each forecast cutoff. Green/red shading shows simulated positions.
Prediction Market Signals
Biggest probability movers from Polymarket and Manifold. Sparklines show the last 90 calendar days.
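One plausible way to rank movers, assuming a "move" is the change from the first to the last observed probability inside the 90-day window. The definition, function name, and market data are hypothetical; the dashboard may measure moves differently.

```python
from datetime import date, timedelta


def biggest_movers(history, today, window_days=90, top_n=3):
    """Rank markets by absolute probability change over the trailing window.

    history: {market_name: [(date, probability), ...]}
    Change = last observation in window minus first observation in window.
    """
    start = today - timedelta(days=window_days)
    moves = []
    for name, series in history.items():
        window = sorted((d, p) for d, p in series if start <= d <= today)
        if len(window) < 2:
            continue  # not enough points in the window to measure a move
        moves.append((name, window[-1][1] - window[0][1]))
    moves.sort(key=lambda move: abs(move[1]), reverse=True)
    return moves[:top_n]


# Hypothetical markets and prices
history = {
    "market-a": [(date(2025, 1, 1), 0.9), (date(2025, 4, 15), 0.2), (date(2025, 6, 30), 0.6)],
    "market-b": [(date(2025, 5, 1), 0.7), (date(2025, 6, 29), 0.6)],
    "market-c": [(date(2025, 6, 1), 0.5)],
}
for name, delta in biggest_movers(history, today=date(2025, 6, 30)):
    print(f"{name}: {delta:+.2f}")
```

The January point for market-a falls outside the window, so its move is measured from April; market-c has a single in-window point and is skipped.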
All Tracked Markets
AI-related prediction markets with historical price data. Click column headers to sort.
Activity Log
Collector runs, curator updates, and data changes
System Status
Infrastructure health, external API state, last collector runs.