Methodology & track record
How the model thinks, and how it's doing — scored honestly against what actually happened and against the market's closing line.
A causal model, rated against reality
At the core is a surface-aware Elo: one overall rating for current strength plus a separate rating for each of hard, clay and grass, all updated after every match and adjusted for margin of victory and time off tour. Ratings are fed by lower-tier (challenger and qualifying) results too, so risers and thin-record players are placed honestly.
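To make the idea concrete, here is a minimal sketch of a surface-aware Elo update. It is an illustration under stated assumptions, not the production code: the `Player` class, the 50/50 blend weight `w_surface`, the K-factor of 32 and the `margin_mult` multiplier are all hypothetical, and the time-off-tour adjustment is left out.

```python
from dataclasses import dataclass, field

SURFACES = ("hard", "clay", "grass")


@dataclass
class Player:
    # Hypothetical container: one overall rating plus one rating per surface.
    overall: float = 1500.0
    by_surface: dict = field(default_factory=lambda: {s: 1500.0 for s in SURFACES})


def blended(player, surface, w_surface=0.5):
    # Assumed blend: weighted mix of the overall and the surface rating.
    return (1 - w_surface) * player.overall + w_surface * player.by_surface[surface]


def expected_score(r_a, r_b):
    # Standard Elo logistic expectation on a 400-point scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(winner, loser, surface, k=32.0, margin_mult=1.0, w_surface=0.5):
    """Update both players in place; return the pre-match win probability."""
    exp_w = expected_score(
        blended(winner, surface, w_surface),
        blended(loser, surface, w_surface),
    )
    # margin_mult > 1 would reward a lopsided win (assumed form of the margin adjustment).
    delta = k * margin_mult * (1.0 - exp_w)
    winner.overall += delta
    winner.by_surface[surface] += delta
    loser.overall -= delta
    loser.by_surface[surface] -= delta
    return exp_w
```

With two fresh players, `update(a, b, "clay")` returns 0.5 and moves the overall and clay ratings by 16 points each way, while the hard and grass ratings stay put.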
On top sits a serve/return decomposition: a serve rating and a return rating per player that drive a point → game → set → match simulator. A stacked ensemble then blends the Elo anchor with a gradient-boosting model. Tournament title chances come from a Monte-Carlo simulation of the draw that samples each player's rating uncertainty, so the percentages carry honest credible bands (wide for unproven players, tight for established ones).
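As a rough sketch of how rating uncertainty can turn into bands on title chances: the code below samples each player's rating from a normal distribution, computes title probabilities for a single-elimination bracket for each sample, and reports the 5th percentile, mean and 95th percentile. It departs from a full match-by-match simulation in one respect: once the ratings are sampled, the bracket is propagated exactly rather than simulated. The function names, the normal-distribution assumption, the Elo-style win probability and the example ratings are all hypothetical.

```python
import random


def win_prob(r_a, r_b):
    # Elo-style win probability; a stand-in for the real match model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def title_probs(ratings):
    """Exact title probabilities for fixed ratings listed in bracket order.

    Assumes a single-elimination draw whose size is a power of two.
    """
    # Each slot maps player index -> probability of emerging from that sub-bracket.
    slots = [{i: 1.0} for i in range(len(ratings))]
    while len(slots) > 1:
        merged_round = []
        for k in range(0, len(slots), 2):
            left, right = slots[k], slots[k + 1]
            merged = {}
            for i, p_i in left.items():
                merged[i] = p_i * sum(
                    p_j * win_prob(ratings[i], ratings[j]) for j, p_j in right.items()
                )
            for j, p_j in right.items():
                merged[j] = p_j * sum(
                    p_i * win_prob(ratings[j], ratings[i]) for i, p_i in left.items()
                )
            merged_round.append(merged)
        slots = merged_round
    return [slots[0][i] for i in range(len(ratings))]


def title_bands(draw, n_samples=2000, seed=0):
    """draw: list of (name, rating_mean, rating_sd) in bracket order.

    Returns name -> (5th percentile, mean, 95th percentile) of the title chance.
    """
    rng = random.Random(seed)
    samples = [[] for _ in draw]
    for _ in range(n_samples):
        # One plausible rating per player, drawn from that player's uncertainty.
        ratings = [rng.gauss(mu, sd) for _, mu, sd in draw]
        for i, p in enumerate(title_probs(ratings)):
            samples[i].append(p)
    bands = {}
    for (name, _, _), s in zip(draw, samples):
        s.sort()
        lo = s[int(0.05 * (n_samples - 1))]
        hi = s[int(0.95 * (n_samples - 1))]
        bands[name] = (lo, sum(s) / n_samples, hi)
    return bands
```

Calling `title_bands([("A", 1700, 30), ("B", 1600, 150), ("C", 1600, 30), ("D", 1500, 30)])` returns one `(low, mean, high)` triple per player; a larger rating standard deviation feeds more spread into the sampled title chances.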
Every number is earned on held-out, walk-forward data — the model only ever trains on the past and is scored on the future, never a shuffled split — and a final slice of history stays sealed as a one-time exam. The model is file-based and reproducible; it never reads this warehouse.
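The walk-forward idea can be sketched as an expanding-window split with a sealed final slice. This is an illustration only: the function name, the fold count and the size of the sealed slice are assumptions, not the model's actual configuration.

```python
def walk_forward_folds(dates, n_folds=3, n_sealed=0):
    """Split match indices chronologically.

    dates: one sortable value per match (e.g. ISO date strings).
    Returns (folds, sealed): folds is a list of (train_idx, test_idx) pairs where
    every training match comes before every test match; sealed is the final slice
    of history, kept out of all folds for a one-time evaluation.
    """
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    cut = len(order) - n_sealed
    usable, sealed = order[:cut], order[cut:]

    # Expanding window: the first block only trains, each later block is tested once.
    fold_size = len(usable) // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train = usable[: k * fold_size]
        test = usable[k * fold_size : (k + 1) * fold_size]
        folds.append((train, test))
    return folds, sealed
```

Note that matches sharing a date can land on either side of a fold boundary in this simple version; splitting on whole days or whole tournaments would avoid that.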
How sharp, how calibrated
Log-loss rewards being both right and appropriately confident; the market's closing line is the sharpest public benchmark there is. A calibration error (ECE in the tables below) near zero means a stated 70% really wins about 70% of the time, as shown in the curve below.
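For readers who want to reproduce these two metrics, here is a minimal sketch. Log-loss follows its standard definition; the calibration error is computed as a binned expected calibration error with ten equal-width bins, which is an assumption, since the binning used for the published numbers is not stated here.

```python
import math


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (1 = predicted player won)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between stated probability and observed win rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        win_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_p - win_rate)
    return ece
```

As a reference point, a forecaster that always says 50% scores a log-loss of ln 2 ≈ 0.693, so lower values indicate real information in the forecasts.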
By model arm
| arm | log-loss | ECE |
|---|---|---|
| Elo anchor | 0.5934 | 0.0164 |
| Gradient boosting | 0.5935 | 0.0130 |
| Ensemble (production) | 0.5907 | 0.0074 |
Do the percentages mean it?
Calibration curve: predicted % (x) vs actual win-rate (y); dashed line = perfect calibration.
Track record
| as of | log-loss | vs market | ECE | matches |
|---|---|---|---|---|
| 2026-06-07 | 0.5907 | +0.0157 | 0.0074 | 32,091 |
| 2026-05-25 | 0.5904 | +0.0154 | 0.0076 | 31,966 |
| 2026-05-23 | 0.5904 | +0.0154 | 0.0076 | 31,966 |
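“vs market” is the model's log-loss minus the closing line's, so negative means sharper than the market on that slice. The rows above are all positive: the closing line scores about 0.5750 (for example, 0.5907 − 0.0157), so the market is currently the sharper of the two.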