Methodology & track record

How the model thinks, and how it's doing — scored honestly against what actually happened and against the market's closing line.

── how it works

A causal model, rated against reality

At the core is a surface-aware Elo: one rating for overall current strength plus a separate rating for each of hard, clay and grass, all updated after every match and adjusted for margin of victory and time off tour. Ratings are fed by lower-tier (challenger and qualifying) results too, so risers and thin-record players are placed honestly.
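A minimal sketch of what one surface-aware update could look like. The source does not publish its constants or formulas, so everything numeric here is an assumption for illustration: the 1500 starting rating, the 50/50 blend of overall and surface rating, the base K of 24, the `margin` multiplier and the layoff rule in `k_factor` are all hypothetical.

```python
from dataclasses import dataclass, field

SURFACES = ("hard", "clay", "grass")


@dataclass
class Player:
    overall: float = 1500.0  # assumed starting rating
    by_surface: dict = field(
        default_factory=lambda: {s: 1500.0 for s in SURFACES}
    )


def expected_score(r_a, r_b):
    """Standard Elo win expectation for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def blended(player, surface, w=0.5):
    """Mix overall and surface rating; the weight w is an assumption."""
    return (1.0 - w) * player.overall + w * player.by_surface[surface]


def k_factor(base_k, days_off):
    """Hypothetical layoff rule: the longer the absence, the faster the rating moves."""
    return base_k * (1.0 + min(days_off, 365) / 365.0)


def update(winner, loser, surface, margin=1.0, days_off=(0, 0), base_k=24.0):
    """Apply one result. margin > 1 would mark a lopsided win (hypothetical scale)."""
    p_win = expected_score(blended(winner, surface), blended(loser, surface))
    surprise = 1.0 - p_win  # big when the winner was the underdog
    for player, sign, rest in ((winner, 1.0, days_off[0]), (loser, -1.0, days_off[1])):
        delta = sign * k_factor(base_k, rest) * margin * surprise
        player.overall += delta
        player.by_surface[surface] += delta  # other surfaces are left untouched
    return p_win
```

With two fresh players, `update(a, b, "clay")` returns 0.5 and moves both the overall and the clay rating by 12 points in opposite directions, while hard and grass stay at 1500.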

On top sits a serve/return decomposition (a serve rating and a return rating per player that drive a point → game → set → match simulator) and a stacked ensemble that blends the Elo anchor with a gradient-boosting model. Tournament title chances come from a Monte Carlo simulation of the draw that samples each player's rating uncertainty, so the percentages carry honest credible bands (wide for unproven players, tight for established ones).
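To make the point → game step concrete, here is a small sketch of how a per-point serve probability turns into a game-win probability, once by simulation and once by the standard closed form for a game with deuce. How the serve and return ratings map to the point probability `p` is not described in the source, so `p` is simply taken as an input; the function names are illustrative.

```python
import random


def game_win_prob(p):
    """Closed-form chance the server wins a game, given point-win probability p."""
    q = 1.0 - p
    before_deuce = p**4 * (1 + 4 * q + 10 * q * q)  # win to love, 15 or 30
    reach_deuce = 20 * p**3 * q**3                  # three points each
    win_from_deuce = p * p / (p * p + q * q)        # two clear points first
    return before_deuce + reach_deuce * win_from_deuce


def simulate_game(p, rng):
    """Play one game point by point; True if the server wins it."""
    server, returner = 0, 0
    while True:
        if rng.random() < p:
            server += 1
        else:
            returner += 1
        if server >= 4 and server - returner >= 2:
            return True
        if returner >= 4 and returner - server >= 2:
            return False


def estimate_game_win_prob(p, n=20000, seed=0):
    """Monte Carlo estimate of the same quantity, seeded for reproducibility."""
    rng = random.Random(seed)
    return sum(simulate_game(p, rng) for _ in range(n)) / n
```

A server who wins 60% of points wins roughly 74% of games under this formula, which shows how small point-level edges compound. Sets and matches stack the same way, one level up each time.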

Every number is earned on held-out, walk-forward data — the model only ever trains on the past and is scored on the future, never a shuffled split — and a final slice of history stays sealed as a one-time exam. The model is file-based and reproducible; it never reads this warehouse.
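A walk-forward split with a sealed final slice can be sketched as below. This is an illustration under stated assumptions, not the model's actual harness: matches are assumed to be sorted oldest to newest, and the fold count and sealed-slice size (`n_folds`, `n_sealed`) are hypothetical parameters.

```python
def walk_forward(n_matches, n_folds, n_sealed):
    """Return (folds, sealed) as index ranges over matches sorted oldest first.

    Each fold trains on everything before its test block, so training data
    always precedes the data it is scored on. The last n_sealed matches are
    kept out of every fold.
    """
    sealed_start = n_matches - n_sealed
    fold_size = sealed_start // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough matches for the requested folds")

    folds = []
    for i in range(1, n_folds + 1):
        train = range(0, i * fold_size)
        # The final fold absorbs the remainder up to the sealed slice.
        test_end = sealed_start if i == n_folds else (i + 1) * fold_size
        test = range(i * fold_size, test_end)
        folds.append((train, test))

    sealed = range(sealed_start, n_matches)
    return folds, sealed
```

For 100 matches, 3 folds and 10 sealed, the test blocks are 22–43, 44–65 and 66–89, each trained only on earlier indices, and matches 90–99 stay untouched until the one-time exam.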

── held-out accuracy

How sharp, how calibrated

metric	value	note
log-loss	0.5907	lower is sharper
calibration error	0.0074	ECE · |pred − actual|
market closing line	0.575	the benchmark
matches scored	32,091	held-out

Log-loss rewards being both right and appropriately confident; the market's closing line is the sharpest public benchmark there is. A calibration error near zero means a stated 70% really wins about 70% of the time — shown in the curve below.
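Both metrics are simple to compute from a list of predicted probabilities and 0/1 outcomes. The sketch below uses natural-log loss and an equal-width, ten-bin ECE; the source does not state its binning scheme, so the bin count and bin edges are assumptions.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes (natural log)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean of |mean prediction - actual win rate| across bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_pred - win_rate)
    return ece
```

A coin-flip forecast scores ln 2 ≈ 0.693 on log-loss, which is the natural reference point for the figures above. Ten predictions of 70% with seven wins give an ECE of zero.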

── backtest · 2012-2026

By model arm

arm	log-loss	ECE
Elo anchor	0.5934	0.0164
Gradient boosting	0.5935	0.0130
Ensemble (production)	0.5907	0.0074

── calibration curve

Do the percentages mean it?

predicted % (x) vs actual win-rate (y) · dashed = perfect

── the scoreboard over time

Track record

as of	log-loss	vs market	ECE	matches
2026-06-07	0.5907	+0.0157	0.0074	32,091
2026-05-25	0.5904	+0.0154	0.0076	31,966
2026-05-23	0.5904	+0.0154	0.0076	31,966

“vs market” is the model's log-loss minus the closing line's: negative means sharper than the market on that slice, positive means the market was sharper.
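As a worked check of the latest row, the gap can be recomputed from the two log-loss figures on this page. The helper names are illustrative only.

```python
def vs_market(model_log_loss, market_log_loss):
    """Model log-loss minus the closing line's log-loss."""
    return model_log_loss - market_log_loss


def verdict(gap):
    """Read the sign of the gap: lower log-loss is sharper."""
    if gap < 0:
        return "sharper than the market"
    if gap > 0:
        return "behind the market"
    return "level with the market"


gap = vs_market(0.5907, 0.575)
print(f"{gap:+.4f}", verdict(gap))  # prints "+0.0157 behind the market"
```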