Pythagorean Expectation — Winning from Scoring
Bill James, working on baseball statistics in the early 1980s, noticed that teams' winning percentages were predicted very well by a formula whose form resembles the Pythagorean theorem: $W\% \approx RS^2/(RS^2 + RA^2)$, where $RS$ is runs scored and $RA$ is runs allowed. The formula has since been generalized by replacing the exponent 2 with a parameter $\gamma$ that varies by sport.
The mathematical motivation is simple. If scoring in each direction is approximately independent with the same distribution shape, the odds that a team outscores its opponent are approximately a power of the scoring ratio, $(RS/RA)^\gamma$. The exponent $\gamma$ absorbs the variance structure of the sport. Typical values are $\gamma \approx 1.3$ for low-scoring sports such as soccer, $\gamma \approx 1.83$ for MLB (modern regressions), $\gamma \approx 2.37$ for the NFL, and $\gamma \approx 13$–14 for the NBA.
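One way to see where sport-specific exponents come from: taking log-odds of the generalized formula gives $\ln\big(W\%/(1-W\%)\big) = \gamma \ln(RS/RA)$, so $\gamma$ can be estimated as the slope of a least-squares line through the origin. The sketch below is illustrative only. The function name `fit_gamma` is hypothetical, and the scoring totals are synthetic, noise-free values generated from $\gamma = 1.83$, not real team data.

```python
import math

def fit_gamma(records):
    """Estimate gamma by least squares through the origin.

    records: iterable of (win_pct, scored, allowed) with 0 < win_pct < 1.
    Fits log-odds(win_pct) = gamma * log(scored / allowed).
    """
    num = 0.0
    den = 0.0
    for win_pct, scored, allowed in records:
        x = math.log(scored / allowed)            # log scoring ratio
        y = math.log(win_pct / (1.0 - win_pct))   # log-odds of winning
        num += x * y
        den += x * x
    return num / den

# Synthetic (scored, allowed) totals; win percentages are generated
# from the formula itself with gamma = 1.83, so the fit should recover it.
TRUE_GAMMA = 1.83
totals = [(850, 700), (720, 760), (690, 810), (800, 640)]
records = [
    (s**TRUE_GAMMA / (s**TRUE_GAMMA + a**TRUE_GAMMA), s, a)
    for s, a in totals
]
gamma_hat = fit_gamma(records)
print(round(gamma_hat, 4))
```

With real season data the points scatter around the line, and the fitted slope depends on the seasons and leagues included.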
Pythagorean expectation has three practical uses: (i) detecting teams whose current record is a regression candidate (lucky or unlucky), (ii) projecting second-half performance from first-half scoring, and (iii) building playoff odds by Monte-Carlo simulation when game-level data are unavailable.
$$\widehat{W\%} = \frac{RS^\gamma}{RS^\gamma + RA^\gamma}$$
Equivalently $\widehat{W\%} = 1/(1 + (RA/RS)^\gamma)$, a logistic function of the log-scoring-ratio with slope $\gamma$. Expected wins $= 162 \cdot \widehat{W\%}$ in MLB.
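The two forms of the formula can be checked against each other numerically. The following is a minimal sketch; the function names and the 800/700 run totals are hypothetical and chosen only for illustration.

```python
import math

def pythagorean_win_pct(scored: float, allowed: float, gamma: float) -> float:
    """Power-ratio form: RS^gamma / (RS^gamma + RA^gamma)."""
    return scored**gamma / (scored**gamma + allowed**gamma)

def pythagorean_win_pct_logistic(scored: float, allowed: float, gamma: float) -> float:
    """Logistic form: 1 / (1 + exp(-gamma * ln(RS / RA)))."""
    return 1.0 / (1.0 + math.exp(-gamma * math.log(scored / allowed)))

def expected_wins(scored: float, allowed: float, gamma: float, games: int) -> float:
    """Expected wins over a schedule of `games` games."""
    return games * pythagorean_win_pct(scored, allowed, gamma)

# Hypothetical MLB-style totals: 800 runs scored, 700 allowed, gamma = 1.83.
power_form = pythagorean_win_pct(800, 700, 1.83)
logistic_form = pythagorean_win_pct_logistic(800, 700, 1.83)
print(round(power_form, 3), round(logistic_form, 3))
print(round(expected_wins(800, 700, 1.83, 162), 1))
```

Both functions return the same value up to floating-point error, and either returns exactly 0.5 when scoring and allowing are equal, whatever the exponent.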
The residual $\Delta = W\%_{\text{actual}} - \widehat{W\%}$ has little persistent component: teams with large $\Delta$ in the first half of a season regress about 70% of the way to their Pythagorean estimate in the second half. This empirical fact is the basis for many mid-season bets.
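The regression claim translates into a simple projection rule: keep the Pythagorean estimate and carry forward only the part of the residual that is expected to survive. The sketch below assumes the 70% figure quoted above and also assumes the Pythagorean estimate itself stays the same in the second half. The function name and the example percentages are hypothetical.

```python
def project_second_half(actual_pct: float, pyth_pct: float,
                        regression: float = 0.70) -> float:
    """Project second-half win percentage.

    The first-half residual (actual minus Pythagorean) is shrunk by
    `regression`; only the remaining share is added back to the estimate.
    """
    residual = actual_pct - pyth_pct
    return pyth_pct + (1.0 - regression) * residual

# Hypothetical team: .600 actual record against a .550 Pythagorean estimate.
projected = project_second_half(0.600, 0.550)
print(round(projected, 3))  # 0.55 + 0.3 * 0.05 = 0.565
```

Setting `regression=1.0` gives the pure Pythagorean projection, and `regression=0.0` assumes the first-half record persists unchanged.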
The 2019 Yankees scored 943 runs and allowed 739. Compute expected $W\%$ and wins with $\gamma = 1.83$.
Ratio $= (739/943)^{1.83} = 0.784^{1.83} = e^{1.83 \ln 0.784} = e^{-0.445} = 0.641$.
$\widehat{W\%} = 1/(1 + 0.641) = 0.609$. Expected wins $= 162 \cdot 0.609 = 98.7$ (actual: 103 — a modestly lucky team).
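To judge how unusual a 103-win season is for a team with a 0.609 expected winning percentage, one can simulate many seasons. The sketch below makes a strong simplifying assumption that is not in the text: every game is an independent coin flip with the same win probability. Under that assumption the season win total is binomial, with standard deviation $\sqrt{162 \cdot 0.609 \cdot 0.391} \approx 6.2$ wins. The function name, seed, and season count are arbitrary choices.

```python
import random
import statistics

def simulate_season_wins(win_prob: float, games: int = 162,
                         seasons: int = 10_000, seed: int = 0) -> list[int]:
    """Simulate win totals for many seasons of independent games."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.random() < win_prob for _ in range(games))
        for _ in range(seasons)
    ]

wins = simulate_season_wins(0.609)
mean_wins = statistics.fmean(wins)
sd_wins = statistics.pstdev(wins)
share_103_plus = sum(w >= 103 for w in wins) / len(wins)

print(f"mean wins: {mean_wins:.1f}")
print(f"std dev:   {sd_wins:.1f}")
print(f"share of seasons with 103+ wins: {share_103_plus:.2f}")
```

Because 103 wins sits less than one standard deviation above the expected 98.7, a gap of this size is well within ordinary season-to-season noise under the independence assumption.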
An NBA team scores 112.0 points per game and allows 108.4. What is its projected win percentage, and how many wins does that imply over 82 games, with $\gamma = 13.9$?
$(108.4/112.0)^{13.9} = 0.9679^{13.9} = e^{-0.455} = 0.634$. $\widehat{W\%} = 1/1.634 = 0.612$. Expected wins $= 82(0.612) \approx 50.2$.
A soccer team has 12 wins, 6 draws, and 7 losses from 25 matches (42 points, 1.68 pts/game). It scored 38 goals and allowed 32. Using $\gamma = 1.3$, estimate its Pythagorean expected points per game. Is the team lucky?
Expected $W\% = 38^{1.3}/(38^{1.3}+32^{1.3}) = 113.2/(113.2+90.5) = 0.556$. Treating $\widehat{W\%}$ as the share of matches won outright and assuming 25% draws, expected points/game $\approx 3\cdot 0.556 + 1\cdot 0.25 \approx 1.92$. Actual is 1.68, so the team is unlucky by this measure; expect some upside in the back half.
Practice Problems
Answer Key
1. $(20/27)^{2.37} = 0.491$; W% $= 1/1.491 = 0.671$, wins $= 11.4$.
2. Variance scale — how decisively scoring margin translates to wins. Higher γ means outcomes track scoring more tightly.
3. Exactly 0.500 — by symmetry, any γ gives the same answer at RS = RA.
4. Low-scoring/high-variance sports produce upsets despite margin; high-scoring/low-variance sports track margin tightly, so the log-odds slope is larger.
5. About 70% regression: expect second-half deviation of roughly +3 games (from +10).
6. Use game-level margin-of-victory models (pyth on totals including OT scoring) or weight OT wins as 0.5 above 0.5.