30 min Sports Statistics

Regression to the Mean & The Hot Hand

Galton discovered it in heights; every sports analyst rediscovers it monthly. Regression to the mean says that extreme performances in a noisy measurement are followed, on average, by less extreme ones — not because anything changed, but because the extreme contained luck that does not persist.

The key decomposition: observed score $Y = \mu + \tau + \varepsilon$, where $\tau$ is true talent (persistent, mean zero) and $\varepsilon$ is luck (non-persistent, mean zero, uncorrelated with $\tau$). If $\mathrm{Var}(\tau) = \sigma_\tau^2$ and $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$, the best linear forecast of the next observation $Y'$ (same talent, fresh luck) given $Y$ is $\hat Y' = \mu + r(Y - \mu)$ with $r = \sigma_\tau^2/(\sigma_\tau^2 + \sigma_\varepsilon^2)$, the fraction of variance that is talent. This linear shrinkage rule is known in psychometrics as Kelley's formula and is closely related to the James–Stein estimator.
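The decomposition can be checked numerically. The sketch below is a minimal simulation, assuming normally distributed talent and luck (an assumption of this sketch, not something the lesson requires) and borrowing $\sigma_\tau^2 = 7$, $\sigma_\varepsilon = 3$ from Example 1. The function name `simulate_slope` and its parameters are illustrative. It draws two periods per player with the same talent and fresh luck, then estimates the least-squares slope of $Y'$ on $Y$, which should land near $r$.

```python
import random

def simulate_slope(sigma_tau, sigma_eps, n_players=100_000, mu=46.0, seed=0):
    """Estimate the least-squares slope of Y' on Y under Y = mu + tau + eps."""
    rng = random.Random(seed)
    ys, y_nexts = [], []
    for _ in range(n_players):
        tau = rng.gauss(0.0, sigma_tau)                       # persistent talent
        ys.append(mu + tau + rng.gauss(0.0, sigma_eps))       # period 1
        y_nexts.append(mu + tau + rng.gauss(0.0, sigma_eps))  # period 2, fresh luck
    mean_y = sum(ys) / n_players
    mean_next = sum(y_nexts) / n_players
    cov = sum((a - mean_y) * (b - mean_next) for a, b in zip(ys, y_nexts))
    var = sum((a - mean_y) ** 2 for a in ys)
    return cov / var

slope = simulate_slope(sigma_tau=7 ** 0.5, sigma_eps=3.0)
theory = 7 / (7 + 9)  # r = talent variance / total variance
print(round(slope, 3), round(theory, 3))
```

The two printed numbers should agree to within simulation noise: the regression slope is the reliability.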

The hot hand, long dismissed as a myth after Gilovich, Vallone & Tversky (1985), was partially rehabilitated by Miller & Sanjurjo (2018), who showed that the original test had a subtle selection bias in finite sequences. Modern consensus: small hot-hand effects exist but are much weaker than players, fans, and coaches perceive, which makes the debate another regression-to-the-mean cautionary tale.

Reliability Coefficient

$$r = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_\varepsilon^2} = \frac{\text{True variance}}{\text{Total variance}}$$

Optimal forecast: $\hat Y' - \mu = r\,(Y - \mu)$. Low reliability $\Rightarrow$ heavy shrinkage toward the mean.
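As a small sketch of the formula above, the two helpers below compute the reliability and the shrunken forecast. The names `reliability` and `shrink_forecast` are hypothetical, chosen for this illustration; the inputs are the Example 1 values ($\mu = 46$, $Y = 55$, $\sigma_\tau^2 = 7$, $\sigma_\varepsilon = 3$).

```python
def reliability(sigma_tau, sigma_eps):
    """Fraction of total variance that is talent: r = s_tau^2 / (s_tau^2 + s_eps^2)."""
    return sigma_tau ** 2 / (sigma_tau ** 2 + sigma_eps ** 2)

def shrink_forecast(mu, y, sigma_tau, sigma_eps):
    """Forecast of the next observation: mu + r * (y - mu)."""
    r = reliability(sigma_tau, sigma_eps)
    return mu + r * (y - mu)

r = reliability(7 ** 0.5, 3.0)
forecast = shrink_forecast(46.0, 55.0, 7 ** 0.5, 3.0)
shrinkage = 55.0 - forecast  # how far the forecast sits below the observed score
print(f"r = {r:.4f}, forecast = {forecast:.2f}, shrinkage = {shrinkage:.2f}")
```

Lowering `sigma_tau` or raising `sigma_eps` pushes `r` toward 0 and the forecast toward `mu`.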

Miller–Sanjurjo Correction

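In a finite sequence of $n$ coin flips, the true conditional probability $P(\text{H} \mid \text{previous flip was H})$ is exactly $1/2$ for a fair coin, but the expected sample proportion of heads among flips that follow a head is below $1/2$. Intuitively, selecting flips that follow an H uses up heads from the finite sequence, so the selected flips are tilted toward tails. The bias is of order $1/n$ and increases with streak length $k$.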
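For short sequences the bias can be computed exactly by listing every sequence. The sketch below is an illustration, not the procedure from the original paper: it averages, over all equally likely fair-coin sequences of length $n$ in which at least one flip follows $k$ heads, the within-sequence proportion of heads on those flips. The function name `expected_prop_after_streak` is hypothetical, and it assumes $n > k \ge 1$.

```python
from fractions import Fraction
from itertools import product

def expected_prop_after_streak(n, k=1):
    """Exact expected proportion of heads on flips that follow k heads (fair coin).

    Sequences with no flip following k heads are excluded, as in the naive test.
    """
    total = Fraction(0)
    counted = 0
    for seq in product((0, 1), repeat=n):  # 1 = heads, 0 = tails
        follow = [seq[i] for i in range(k, n) if all(seq[i - k:i])]
        if follow:
            total += Fraction(sum(follow), len(follow))
            counted += 1
    return total / counted

for n in (3, 4, 10):
    print(n, float(expected_prop_after_streak(n, k=1)))
```

For $n = 3$ and $k = 1$ the six usable sequences give proportions $1, \tfrac12, 0, 0, 1, 0$, so the average is $5/12$, well below $1/2$. The enumeration grows as $2^n$, so it is only practical for small $n$.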

Example 1 — NBA FG% regression

League mean FG% is 46%. Season standard deviation across players is $\sigma_Y = 4\%$. Known noise from game-to-game sampling is $\sigma_\varepsilon = 3\%$. Forecast a player who shot 55% in a small sample.

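$\sigma_\tau^2 = \sigma_Y^2 - \sigma_\varepsilon^2 = 16 - 9 = 7$
so $r = 7/16 = 0.4375 \approx 0.44$.
Forecast: $46 + 0.4375(55 - 46) = 46 + 3.94 \approx 49.9\%$.
Slightly more than half of the overperformance (about 5.1 of the 9 points) regresses away.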

Example 2 — Hot-hand 'test'

In 100 shots with constant true 50% make-rate, you sample 20 shots following a make and count makes. What expected fraction does the naive Gilovich test return?

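The bias is governed by the length of the sequence, $n = 100$, not by the 20 sampled shots. With streak length $k = 1$, the approximation $-k/(2n)$ gives an expected fraction $\approx 0.5 - 1/(2 \cdot 100) = 0.495$.
On average the naive test leans toward a 'cold hand' even though shots are i.i.d., and the lean grows for longer streaks and shorter sequences.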
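The direction and rough size of this bias can be checked by simulation. The sketch below is a minimal Monte Carlo under the example's assumptions (100 i.i.d. shots at a 50% make rate); the function name `mean_prop_after_make`, the trial count, and the seed are illustrative choices. It averages the per-sequence proportion of makes among shots that follow a make.

```python
import random

def mean_prop_after_make(n_shots=100, p=0.5, n_trials=20_000, seed=1):
    """Average, across simulated sequences, of the share of makes after a make."""
    rng = random.Random(seed)
    props = []
    for _ in range(n_trials):
        shots = [rng.random() < p for _ in range(n_shots)]
        after = [shots[i] for i in range(1, n_shots) if shots[i - 1]]
        if after:  # skip the rare sequence with no shot following a make
            props.append(sum(after) / len(after))
    return sum(props) / len(props)

estimate = mean_prop_after_make()
print(round(estimate, 4))
```

The average should land slightly below 0.5 even though every shot is an independent coin flip. Shortening `n_shots` makes the gap larger.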

Example 3 — Sophomore slump

Rookie of the Year hits .315 in 500 at-bats. League average .265. If known reliability for batting average at 500 AB is $r = 0.30$, predict year 2.

$\hat Y' = 0.265 + 0.30(0.315 - 0.265) = 0.265 + 0.015 = 0.280$.
Expect a 35-point drop — not a slump, just regression.

Practice Problems

1. Is a player who went 0-for-20 from three a 'cold shooter'?

2. Compute shrinkage for $\sigma_\tau=2$, $\sigma_\varepsilon=4$, observed deviation $+10$.

3. Why do rookies typically have larger year-over-year shifts?

4. Give the Miller–Sanjurjo approximate bias for streak length k in n flips.

5. A pitcher has ERA 2.10 in 30 innings. Why is ERA more stable over a full season?

6. What is the 'Sports Illustrated curse' statistically?

Answer Key

1. Probably not — with 20 attempts, the noise is enormous. Forecast using small-sample shrinkage toward his career mean.

2. $r = 4/(4+16) = 0.2$; forecast deviation $= 0.2 \cdot 10 = +2$; shrinkage $= 8$.

3. Fewer at-bats → lower reliability → more shrinkage; also selection bias (extreme rookies are a tail sample).

4. Bias of order $-k/(2n)$ — conditioning on a streak of k H removes k hits from the pool.

5. Full-season ERA has smaller luck variance than 30-inning samples; larger sample increases reliability.

6. Cover athletes are picked after extreme performance; on average they regress — the 'curse' is the regression, not mystical punishment.
