Global Situational Awareness Dashboard

AI Capabilities (ECI)

Epoch Capabilities Index — composite score across 40+ benchmarks

METR Task Horizon

How long can AI agents reliably work on a task? (hours)

AA Intelligence — Cost-Adjusted

AAII composite index − 15·log₁₀(GPQA cost). Frontier = capability per dollar.

AA Intelligence — Compute-Adjusted

AAII composite index − 15·log₁₀(GPQA compute in TFLOP). Provider-independent: reflects compute cost at the edge, not API pricing.

Hardware Cost Trend

TFLOPS × Memory GB per $1K at launch. Bubble size = VRAM. Toggle: Paper / Realized η.

Model Release Timeline

Frontier and open-weights releases

Recent Events

All Model Releases

Epoch Capabilities Index (ECI)

Composite benchmark score across 40+ evaluations, by model accessibility. Source: Epoch AI

METR Task Horizon

50th-percentile time horizon for autonomous task completion. Source: METR
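A 50th-percentile horizon can be read off a fitted success curve. The sketch below assumes a logistic model of success probability against log₂ of task length in hours; the function names and the `intercept` and `slope` parameters are hypothetical and are not taken from the dashboard or from METR's code.

```python
import math

def success_probability(task_hours: float, intercept: float, slope: float) -> float:
    """Assumed model: logistic in log2(task length in hours)."""
    z = intercept + slope * math.log2(task_hours)
    return 1.0 / (1.0 + math.exp(-z))

def horizon_hours(intercept: float, slope: float, target: float = 0.5) -> float:
    """Task length at which the assumed curve crosses `target` success."""
    logit = math.log(target / (1.0 - target))  # 0 when target is 0.5
    return 2.0 ** ((logit - intercept) / slope)
```

With a negative slope (longer tasks are harder), `horizon_hours(3.0, -1.0)` gives the length at which the modelled success rate falls to one half.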

AA Intelligence Index — Cost-Adjusted

Composite AAII index − α·log₁₀(cost to run GPQA in USD), α≈14.5 refit on the Pareto frontier. Cost axis uses each model's near-launch input/output prices via Gundlach et al., so release-date ordering carries real temporal signal (unlike AAII's own XLSX cost column, which is re-baked at current prices). Capability axis is AAII's current-suite index — read the chart as "for this level of capability-as-measured-today, what did access cost at release?"
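The adjustment in this caption is a one-line formula. A minimal sketch follows, assuming prices are quoted per million tokens; the function names and the token-count inputs are illustrative, not the dashboard's own code.

```python
import math

def gpqa_run_cost_usd(input_tokens: float, output_tokens: float,
                      price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Cost of one GPQA run, assuming prices are USD per million tokens."""
    return (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1e6

def cost_adjusted_score(aaii_index: float, cost_usd: float,
                        alpha: float = 14.5) -> float:
    """AAII index minus alpha * log10(cost in USD), as in the caption."""
    if cost_usd <= 0:
        raise ValueError("cost must be positive")
    return aaii_index - alpha * math.log10(cost_usd)
```

Under this formula, a tenfold increase in run cost lowers the adjusted score by α points at fixed capability.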

GPQA Diamond — Cost-Adjusted (Over Time)

GPQA Diamond − 15·log₁₀(cost to run GPQA). Cost uses input/output prices observed at or near each model's first sample on AAII — not current prices — so release-date ordering reflects real capability-per-dollar progress. Source: Gundlach et al., "The Price of Progress" through Oct 2025, plus our own GPQA runs via bin/eval_gpqa for later frontier models.

AA Intelligence Index — Compute-Adjusted

AAII composite index − 15·log₁₀(GPQA compute in TFLOP), where compute ≈ 2·active_params·tokens for the full 198-question run. Unlike the cost-adjusted chart, this axis is provider-independent: it measures capability against the actual compute cost of running the model at the edge, cutting through subsidised API pricing. Open-weights models only, since they have reliable active-parameter data; most closed MoE models are excluded because their sizes are only rumored. α=15 is pinned (n is too small to fit). Source: Gundlach et al.
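The compute estimate and the adjustment can be sketched as below. This assumes the result of 2·active_params·tokens is a count of floating-point operations, divided by 10¹² to express it in tera-units; the function names are illustrative.

```python
import math

def gpqa_compute_tera(active_params: float, total_tokens: float) -> float:
    """Approximate run compute, 2 * active_params * tokens, in units of 1e12 ops."""
    return 2.0 * active_params * total_tokens / 1e12

def compute_adjusted_score(aaii_index: float, compute_tera: float,
                           alpha: float = 15.0) -> float:
    """AAII index minus alpha * log10(compute); alpha pinned at 15 per the caption."""
    if compute_tera <= 0:
        raise ValueError("compute must be positive")
    return aaii_index - alpha * math.log10(compute_tera)
```

For example, a model with 10 billion active parameters that processes 5 million tokens over the run comes to 10⁵ tera-operations, which subtracts 75 points from its index.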

Compute per Dollar (TFLOPS/$1K)

Inference compute per $1K at launch. Bubble size = VRAM.

Bandwidth per Dollar (GB/s/$1K)

Memory bandwidth per $1K at launch. Bubble size = VRAM.

Composite: TFLOPS × Memory / $1K

Combined compute × memory capacity per $1K. Captures both throughput and working-set size. Toggle: Paper / Realized η (Realized scales paper specs by MLPerf/SemiAnalysis utilization; chips without a measured factor are hidden in the Realized view).
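A minimal sketch of the composite metric follows. It assumes the Realized view multiplies paper TFLOPS by a utilization factor η between 0 and 1, and that a chip with no measured factor yields no value; the function name and the `None` convention are illustrative.

```python
from typing import Optional

def composite_per_kusd(tflops: float, memory_gb: float, launch_price_usd: float,
                       eta: Optional[float] = 1.0) -> Optional[float]:
    """TFLOPS x memory GB per $1K at launch.

    eta=1.0 is the paper-spec view. Passing a measured utilization gives the
    realized view (assumed here to scale TFLOPS). eta=None means no measured
    factor exists, so the chip is hidden and None is returned.
    """
    if eta is None:
        return None
    return eta * tflops * memory_gb / (launch_price_usd / 1000.0)
```

A chip with 100 TFLOPS and 80 GB at a $20,000 launch price scores 400 on paper, and 200 if its measured utilization is 0.5.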

Hardware Releases

Events

Agent releases, geopolitical events, regulatory actions, and ecosystem moves.

Toggle: Tuned / Pre-tuning

ECI Capability Frontier

Best model score at each date (any accessibility) with 5-year projection. Open-weights shown as green overlay. Bands: 50/80/95% prediction intervals.

METR Task Horizon

Frontier autonomous task completion horizon with 5-year projection (log scale). Open-weights overlay in green.

TFLOPS × Memory / $1K

Composite compute × memory per $1K trend with 5-year projection (log scale).

Trend Summary

Indicators

Structured forecasting questions derived from trend projections. Updated daily.

Calibration

Backtest results: how well do prediction intervals match actual outcomes when tested on held-out data?

Market Context

Prediction market signals relevant to AI capability trends. Loaded from the Markets tab data.

Track Record

Scored predictions from trend-based forecasting. Includes retrodicted predictions (fit on past data, scored against actuals) and live predictions. Brier score: lower is better (0 = perfect; 0.25 = an uninformed forecast of 50% on every question).
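For binary questions, the Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. A minimal sketch with an illustrative function name:

```python
from typing import Sequence

def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

Forecasting 0.5 on every question yields 0.25 whatever the outcomes, which is why that value serves as the baseline.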

Market Forecasting

LLM price forecasts with trading simulation. Amber dots show implied E[price] at each forecast cutoff. Green/red shading shows simulated positions.

Prediction Market Signals

Biggest probability movers from Polymarket and Manifold. Sparklines show the last 90 calendar days.
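The caption does not define "mover". The sketch below assumes it means the largest absolute change in probability between the first and last observation inside the 90-day window; the data layout and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

def biggest_movers(histories: Dict[str, List[Tuple[date, float]]],
                   as_of: date, window_days: int = 90,
                   top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank markets by absolute probability change within the window.

    `histories` maps a market name to (date, probability) pairs sorted by date.
    """
    start = as_of - timedelta(days=window_days)
    moves = []
    for name, series in histories.items():
        window = [p for d, p in series if start <= d <= as_of]
        if len(window) < 2:
            continue  # not enough data in the window to measure a move
        moves.append((name, window[-1] - window[0]))
    moves.sort(key=lambda item: abs(item[1]), reverse=True)
    return moves[:top_n]
```

Signed changes are kept in the result so that a display can distinguish rising from falling markets.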

All Tracked Markets

AI-related prediction markets with historical price data. Click column headers to sort.

Activity Log

Collector runs, curator updates, and data changes

System Status

Infrastructure health, external API state, last collector runs.