
Pythagorean Expectation — Winning from Scoring

Bill James, working on baseball statistics in the early 1980s, noticed that teams' winning percentages were very well predicted by a formula resembling the Pythagorean theorem: $W\% \approx RS^2/(RS^2 + RA^2)$, where $RS$ is runs scored and $RA$ is runs allowed. The formula has since been generalized with an exponent $\gamma$ that varies by sport.

The mathematical motivation is simple. If per-game scoring in each direction is approximately independent with the same distribution shape, the probability one team outscores the other over a season follows a power of the scoring ratio. The exponent $\gamma$ absorbs variance structure: low-scoring sports (soccer) have $\gamma \approx 1.3$, NBA $\gamma \approx 13$–14, NFL $\gamma \approx 2.37$, MLB $\gamma \approx 1.83$ (modern regressions).

Pythagorean expectation has three practical uses: (i) detecting teams whose current record is a regression candidate (lucky or unlucky), (ii) projecting second-half performance from first-half scoring, and (iii) building playoff odds by Monte-Carlo simulation when game-level data are unavailable.
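Use (iii) can be sketched in a few lines: treat each remaining game as an independent coin flip with the Pythagorean win probability. This is a minimal sketch; the function name, the example 95-win cutoff, and the 60-win/64-games-left scenario below are illustrative assumptions, not data from the text.

```python
import random

# Sketch of use (iii): Monte-Carlo playoff odds from season-level scoring only.
# The function name and the 95-win cutoff are illustrative assumptions.

def playoff_odds(rs, ra, gamma, wins_so_far, games_left, cutoff=95, trials=20_000):
    """Estimate P(final wins >= cutoff), treating each remaining game as an
    independent Bernoulli(p) trial with p the Pythagorean win probability."""
    p = rs**gamma / (rs**gamma + ra**gamma)
    made_it = 0
    for _ in range(trials):
        wins = wins_so_far + sum(random.random() < p for _ in range(games_left))
        if wins >= cutoff:
            made_it += 1
    return made_it / trials

# A hypothetical 60-win team at 943 scored / 739 allowed with 64 games left:
random.seed(0)
print(round(playoff_odds(943, 739, 1.83, wins_so_far=60, games_left=64), 2))
```

Richer game-level data would let you vary p by opponent, but the season-total version above is exactly the situation use (iii) describes.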

Generalized Pythagorean Expectation

$$\widehat{W\%} = \frac{RS^\gamma}{RS^\gamma + RA^\gamma}$$

Equivalently $\widehat{W\%} = 1/(1 + (RA/RS)^\gamma)$, a logistic function of the log-scoring-ratio with slope $\gamma$. Expected wins $= 162 \cdot \widehat{W\%}$ in MLB.
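A minimal implementation of the generalized formula, with sanity checks of the symmetry at $RS = RA$ and of the equivalent logistic form (the function name is ours, not from any library):

```python
# Minimal implementation of the generalized Pythagorean expectation.

def pythag_winpct(scored: float, allowed: float, gamma: float) -> float:
    """Expected winning percentage: scored^gamma / (scored^gamma + allowed^gamma)."""
    return scored**gamma / (scored**gamma + allowed**gamma)

# Sanity checks: equal scoring gives .500 for any exponent, and the logistic
# form 1 / (1 + (allowed/scored)**gamma) agrees with the ratio form.
assert pythag_winpct(810, 810, 1.83) == 0.5
assert abs(pythag_winpct(943, 739, 1.83) - 1 / (1 + (739 / 943) ** 1.83)) < 1e-12
```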

Pythagorean Residuals and Regression

The residual $\Delta = W\%_{\text{actual}} - \widehat{W\%}$ is mostly luck rather than a persistent skill: teams with a large $\Delta$ in the first half of a season regress about 70% of the way back to their Pythagorean estimate in the second half. This empirical regularity is the basis for many mid-season bets.
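The 70% rule of thumb turns into a one-line projection: keep only 30% of the first-half residual. A sketch with illustrative names and a hypothetical .600-ball team whose Pythagorean estimate is .550:

```python
# Second-half projection from the ~70% regression rule of thumb;
# the 0.70 fraction is the empirical figure quoted above.

def project_second_half(actual_wpct: float, pythag_wpct: float,
                        regression: float = 0.70) -> float:
    """Regress the first-half residual `regression` of the way back to Pythagorean."""
    return pythag_wpct + (1 - regression) * (actual_wpct - pythag_wpct)

# A hypothetical team playing .600 ball against a .550 Pythagorean estimate
# projects to about .565 the rest of the way:
print(round(project_second_half(0.600, 0.550), 3))  # → 0.565
```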

Example 1 — MLB winning percentage

The 2019 Yankees scored 943 runs and allowed 739. Compute expected $W\%$ and wins with $\gamma = 1.83$.

Ratio $= (739/943)^{1.83} = 0.784^{1.83} = e^{1.83 \ln 0.784} = e^{-0.445} = 0.641$.

$\widehat{W\%} = 1/(1 + 0.641) = 0.609$. Expected wins $= 162 \cdot 0.609 = 98.7$ (actual: 103 — a modestly lucky team).
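The same computation can be checked with the log form used above. Note that carrying full precision gives 98.8 expected wins rather than the 98.7 obtained by rounding the ratio to 0.784 first:

```python
import math

# Reproducing Example 1 via the log form: (RA/RS)^gamma = exp(gamma * ln(RA/RS)).
gamma, rs, ra = 1.83, 943, 739
ratio = math.exp(gamma * math.log(ra / rs))  # (RA/RS)^gamma, about 0.64
wpct = 1 / (1 + ratio)                       # about 0.61
print(round(wpct, 3), round(162 * wpct, 1))  # expected W% and wins
```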

Example 2 — NBA using γ = 13.9

A team scores 112.0 per game and allows 108.4. Projected win percentage over 82 games?

$(108.4/112.0)^{13.9} = 0.9679^{13.9} = e^{-0.455} = 0.634$. $\widehat{W\%} = 1/1.634 = 0.612$. Expected wins $= 82(0.612) \approx 50.2$.
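Checked in code below; per-game averages give the same answer as season totals here, because the formula depends only on the ratio of scoring rates:

```python
# Example 2: Pythagorean expectation on per-game scoring averages.
gamma, ppg_for, ppg_against = 13.9, 112.0, 108.4
wpct = ppg_for**gamma / (ppg_for**gamma + ppg_against**gamma)
print(round(wpct, 3), round(82 * wpct, 1))  # roughly 0.612 and 50 wins
```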

Example 3 — Luck correction

A soccer team has 12 wins, 6 draws, 7 losses from 25 matches (42 points, 1.68 pts/game). It scored 38 goals and allowed 32. Using the Pythagorean formula with γ = 1.3, is the team lucky?

Expected win share of decided matches $= 38^{1.3}/(38^{1.3}+32^{1.3}) = 113.2/(113.2+90.5) = 0.556$. Assuming a 25% draw rate, expected points/game $\approx 3(0.75)(0.556) + 1(0.25) \approx 1.50$. The actual 1.68 is ahead of expectation, so the team has been slightly lucky; expect modest regression in the back half.
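Example 3 as code, under the same assumed 25% draw rate (an assumption, as in the text, since Pythagorean expectation does not itself predict draws):

```python
# Example 3: convert a Pythagorean win share among decided matches into
# expected points per game; the 25% draw rate is an assumed input.
gamma, goals_for, goals_against, draw_rate = 1.3, 38, 32, 0.25
win_share = goals_for**gamma / (goals_for**gamma + goals_against**gamma)
pts_per_game = 3 * (1 - draw_rate) * win_share + 1 * draw_rate
print(round(win_share, 3), round(pts_per_game, 2))  # 0.556 and 1.5
```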

Interactive Demo: Pythagorean Win-% Calculator

[Interactive calculator: enter per-game scoring for and against plus γ to see expected W%, expected wins, and point differential per game.]

Practice Problems

1. Use γ = 2.37: an NFL team scores 27.0 and allows 20.0 per game — expected W% and wins over 17?
2. What does γ represent physically?
3. MLB team scored 810 and allowed 810 — expected W%?
4. Why does low-scoring soccer use γ ≈ 1.3 while NBA uses 13+?
5. A team is 10 games above Pythagorean at the All-Star break. Expected second-half deviation?
6. How would you modify the formula for overtime games?
Answer Key

1. $(20/27)^{2.37} = 0.491$; W% $= 1/1.491 = 0.671$, wins $= 17(0.671) \approx 11.4$.

2. Variance scale — how decisively scoring margin translates to wins. Higher γ means outcomes track scoring more tightly.

3. Exactly 0.500 — by symmetry, any γ gives the same answer at RS = RA.

4. Low-scoring/high-variance sports produce upsets despite margin; high-scoring/low-variance sports track margin tightly, so the log-odds slope is larger.

5. About 70% regression: expect second-half deviation of roughly +3 games (from +10).

6. Either apply the formula to totals that include overtime scoring, or count overtime wins as roughly half-wins, since overtime games are close to coin flips.