Methodology & track record

How the model thinks, and how it's doing — scored honestly against what actually happened and against the market's closing line.

── how it works

A causal model, rated against reality

At the core is a surface-aware Elo: one overall rating for current strength plus a separate rating for each of hard, clay and grass, updated after every match and adjusted for margin of victory and time off tour. Ratings are fed by lower-tier (challenger and qualifying) results too, so risers and thin-record players are placed honestly.
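As a rough illustration of the idea, here is a minimal sketch of one surface-aware Elo update. Everything named here is an assumption made for the example, not the production code: the `Rating` class, the 1500 starting value, the K-factor of 32, the 50/50 blend weight `w`, and the `mov` multiplier standing in for the margin-of-victory adjustment. The time-off-tour adjustment is left out.

```python
from dataclasses import dataclass, field

SURFACES = ("hard", "clay", "grass")


@dataclass
class Rating:
    # Hypothetical container: one overall rating plus one per surface.
    overall: float = 1500.0
    surface: dict = field(default_factory=lambda: {s: 1500.0 for s in SURFACES})


def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def blended(r: Rating, surface: str, w: float = 0.5) -> float:
    """Mix the overall and surface ratings (w is an assumed weight)."""
    return (1.0 - w) * r.overall + w * r.surface[surface]


def update(winner: Rating, loser: Rating, surface: str,
           k: float = 32.0, mov: float = 1.0, w: float = 0.5) -> float:
    """Apply one match result; returns the pre-match win probability."""
    p = expected(blended(winner, surface, w), blended(loser, surface, w))
    delta = k * mov * (1.0 - p)  # bigger surprise or margin, bigger move
    winner.overall += delta
    loser.overall -= delta
    winner.surface[surface] += delta  # only the surface played is touched
    loser.surface[surface] -= delta
    return p
```

Two unrated players meeting on clay each start at a 50% expectation, so the winner gains half of K on both the overall and the clay rating while the hard and grass ratings stay where they were.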

On top sits a serve/return decomposition (a serve rating and a return rating per player that drive a point → game → set → match simulator) and a stacked ensemble that blends the Elo anchor with a gradient-boosting model. Tournament title chances come from a Monte-Carlo simulation of the draw that samples each player's rating uncertainty, so the percentages carry honest credible bands (wide for unproven players, tight for established ones).
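To show how sampling rating uncertainty turns into a band around a title chance, here is a small sketch under stated assumptions. Each player is described by a hypothetical `(name, rating_mean, rating_sd)` tuple, ratings are drawn from a normal distribution, and match odds come from a plain Elo curve rather than the serve/return simulator. To keep it short, the bracket is evaluated exactly for each sampled set of ratings instead of being played out match by match; the 5th and 95th percentiles are an assumed choice of band.

```python
import random


def win_prob(r_a: float, r_b: float) -> float:
    """Elo-style probability that A beats B (stand-in for the real simulator)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def title_probs(ratings):
    """Exact title probability per slot for a bracket in draw order.

    len(ratings) must be a power of two.
    """
    n = len(ratings)
    reach = [1.0] * n  # probability of still being alive this round
    size = 1           # size of each sub-bracket already resolved
    while size < n:
        new = [0.0] * n
        for i in range(n):
            opp_block = (i // size) ^ 1  # sibling sub-bracket
            for j in range(opp_block * size, (opp_block + 1) * size):
                new[i] += reach[i] * reach[j] * win_prob(ratings[i], ratings[j])
        reach = new
        size *= 2
    return reach


def title_bands(draw, n_samples=500, seed=0):
    """Return {name: (mean, 5th percentile, 95th percentile)} of title chance."""
    rng = random.Random(seed)
    samples = {name: [] for name, _, _ in draw}
    for _ in range(n_samples):
        # One plausible world: every player's rating drawn from its uncertainty.
        ratings = [rng.gauss(mu, sd) for _, mu, sd in draw]
        for (name, _, _), p in zip(draw, title_probs(ratings)):
            samples[name].append(p)
    bands = {}
    for name, ps in samples.items():
        ps.sort()
        lo = ps[int(0.05 * (len(ps) - 1))]
        hi = ps[int(0.95 * (len(ps) - 1))]
        bands[name] = (sum(ps) / len(ps), lo, hi)
    return bands
```

A call such as `title_bands([("A", 1700, 30), ("B", 1500, 30), ("C", 1600, 150), ("D", 1500, 30)])` returns a mean and a band per player; a larger `rating_sd` feeds more spread into the sampled worlds.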

Every number is earned on held-out, walk-forward data — the model only ever trains on the past and is scored on the future, never a shuffled split — and a final slice of history stays sealed as a one-time exam. The model is file-based and reproducible; it never reads this warehouse.
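The walk-forward discipline can be sketched as an index splitter. This is an illustration only: the function name, the number of folds and the 10% sealed fraction are assumptions, and rows are assumed to be sorted oldest first.

```python
def walk_forward(n_rows: int, n_folds: int = 4, sealed_frac: float = 0.1):
    """Return (folds, sealed) as index ranges over time-sorted rows.

    Each fold trains on everything before its test block, so no fold ever
    sees the future. The newest `sealed_frac` of rows is kept out of every
    fold and reserved for a single final evaluation.
    """
    n_sealed = int(n_rows * sealed_frac)
    usable = n_rows - n_sealed
    sealed = range(usable, n_rows)

    block = usable // (n_folds + 1)  # first block is train-only
    folds = []
    for k in range(1, n_folds + 1):
        train = range(0, k * block)
        # The last fold absorbs any remainder rows before the sealed slice.
        end = (k + 1) * block if k < n_folds else usable
        test = range(k * block, end)
        folds.append((train, test))
    return folds, sealed
```

With 100 rows and the defaults, the folds test rows 18 to 35, 36 to 53, 54 to 71 and 72 to 89, each trained only on earlier rows, and rows 90 to 99 stay sealed.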

── held-out accuracy

How sharp, how calibrated

log-loss              0.5907   lower is sharper
calibration error     0.0074   ECE · |pred − actual|
market closing line   0.575    the benchmark
matches scored        32,091   held-out

Log-loss rewards being both right and appropriately confident; the market's closing line is the sharpest public benchmark there is. A calibration error near zero means a stated 70% really wins about 70% of the time — shown in the curve below.
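For readers who want the two metrics spelled out, here is a minimal sketch of binary log-loss and expected calibration error. The ten equal-width bins and the clipping constant are assumptions for the example; the page does not say how the production ECE is binned.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes (1 = win, 0 = loss)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |mean predicted - actual win rate| over bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        win_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_pred - win_rate)
    return ece
```

A coin-flip forecast scores ln 2 ≈ 0.693 on log-loss whatever happens, which is the baseline any useful model has to beat. Ten forecasts of 70% with seven wins give an ECE of zero.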

── backtest · 2012-2026

By model arm

arm                     log-loss   ECE
Elo anchor              0.5934     0.0164
Gradient boosting       0.5935     0.0130
Ensemble (production)   0.5907     0.0074

── calibration curve

Do the percentages mean it?

[Calibration curve: predicted % (x, 0 to 100) vs actual win-rate (y, 0 to 100) · dashed = perfect]

── the scoreboard over time

Track record

as of        log-loss   vs market   ECE      matches
2026-06-07   0.5907     +0.0157     0.0074   32,091
2026-05-25   0.5904     +0.0154     0.0076   31,966
2026-05-23   0.5904     +0.0154     0.0076   31,966

“vs market” is the model's log-loss minus the closing line's, so negative means sharper than the market on that slice and positive means the market was sharper. The latest row is 0.5907 − 0.575 = +0.0157.