Hierarchical Models & Partial Pooling
Hierarchical (multilevel) models pool information across related groups by placing a prior on group-level parameters that is itself governed by hyperparameters. This partial pooling shrinks group estimates toward the population mean — an automatic, data-driven regularization that reduces overfitting without discarding group-specific signal. The degree of pooling is learned from the data rather than specified by the analyst.
Hierarchical Model Structure
For $J$ groups with observations $y_{ij}$: $$y_{ij}\mid\theta_j\overset{ind}{\sim} p(y\mid\theta_j)\quad\text{(likelihood)}$$ $$\theta_j\mid\phi\overset{iid}{\sim} p(\theta\mid\phi)\quad\text{(group-level prior)}$$ $$\phi\sim p(\phi)\quad\text{(hyperprior)}$$ The joint posterior is $p(\theta_1,\ldots,\theta_J,\phi\mid\mathbf{y})\propto p(\phi)\prod_j p(\theta_j\mid\phi)\prod_i p(y_{ij}\mid\theta_j)$.
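The three-level factorization above can be evaluated directly. A minimal sketch with numpy/scipy, assuming a normal likelihood with known $\sigma$, a $\mathcal{N}(0,10)$ hyperprior on $\mu$, and a Half-Cauchy$(0,5)$ hyperprior on $\tau$ (all simulated values here are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate from the hierarchy: phi = (mu, tau) -> theta_j -> y_ij.
J, n_per_group, sigma = 6, 20, 2.0
mu_true, tau_true = 5.0, 1.5
theta = rng.normal(mu_true, tau_true, size=J)           # group-level draws
y = rng.normal(theta[:, None], sigma, size=(J, n_per_group))

def log_joint(theta, mu, tau, y, sigma=2.0):
    """log p(phi) + sum_j log p(theta_j | phi) + sum_ij log p(y_ij | theta_j)."""
    if tau <= 0:
        return -np.inf
    lp = stats.norm.logpdf(mu, 0, 10) + stats.halfcauchy.logpdf(tau, 0, 5)  # hyperprior
    lp += stats.norm.logpdf(theta, mu, tau).sum()                           # group-level prior
    lp += stats.norm.logpdf(y, theta[:, None], sigma).sum()                 # likelihood
    return lp

print(log_joint(theta, mu_true, tau_true, y))
```

Any MCMC sampler only needs this unnormalized log joint; the proportionality constant in the posterior never has to be computed.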
James-Stein Shrinkage & Partial Pooling
For normal means $\theta_j\mid\mu,\tau^2\sim\mathcal{N}(\mu,\tau^2)$ and $\bar y_j\mid\theta_j\sim\mathcal{N}(\theta_j,\sigma^2/n_j)$, the posterior mean exhibits explicit shrinkage: $$E[\theta_j\mid\mathbf{y}]=(1-B_j)\bar y_j+B_j\hat\mu,\quad B_j=\frac{\sigma^2/n_j}{\sigma^2/n_j+\hat\tau^2}$$ Shrinkage factor $B_j\in[0,1]$: small $\hat\tau^2$ (homogeneous groups) $\Rightarrow$ complete pooling; large $\hat\tau^2$ $\Rightarrow$ no pooling.
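The shrinkage formula is easy to check numerically. A short sketch (group sizes and variances below are hypothetical) showing that, for fixed $\hat\tau^2$, smaller groups are pulled harder toward the grand mean:

```python
import numpy as np

def shrinkage(sigma2, n, tau2_hat):
    """B_j = (sigma^2/n_j) / (sigma^2/n_j + tau_hat^2): weight placed on the grand mean."""
    se2 = sigma2 / n
    return se2 / (se2 + tau2_hat)

def partial_pool(ybar, mu_hat, sigma2, n, tau2_hat):
    """E[theta_j | y] = (1 - B_j) * ybar_j + B_j * mu_hat."""
    B = shrinkage(sigma2, n, tau2_hat)
    return (1 - B) * ybar + B * mu_hat

# Same within-group variance, very different sample sizes.
n = np.array([5, 50, 500])
B = shrinkage(sigma2=4.0, n=n, tau2_hat=1.0)
print(B)  # decreasing in n: small groups borrow more strength
```

As the formula predicts, $B_j \to 0$ as $n_j$ grows: with enough data, a group's estimate is essentially its own sample mean.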
Example 1: Eight Schools
The classic Eight Schools example (Rubin 1981; Gelman et al. 2004): SAT coaching effects in $J=8$ schools, $y_j\sim\mathcal{N}(\theta_j,\sigma_j^2)$ with known $\sigma_j$. Group-level prior: $\theta_j\sim\mathcal{N}(\mu,\tau^2)$; hyperpriors: $\mu\sim\mathcal{N}(0,100)$, $\tau\sim\text{Half-Cauchy}(0,5)$. The posterior $\hat\tau\approx4$ implies moderate pooling; school-level estimates shrink $30$-$50\%$ toward the grand mean, reducing school A's point estimate from $28$ to $\approx 10$.
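The shrinkage can be reproduced with a plug-in calculation. A sketch, not the full posterior: it conditions on a fixed $\hat\tau=4$ (the point estimate quoted above) and uses the standard Eight Schools data, estimating $\hat\mu$ by precision-weighted averaging:

```python
import numpy as np

# Eight Schools data (Rubin 1981): estimated coaching effects and standard errors.
y     = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

tau_hat = 4.0                              # plug in the posterior point estimate
w = 1.0 / (sigma**2 + tau_hat**2)          # precision weights for the grand mean
mu_hat = np.sum(w * y) / np.sum(w)

B = sigma**2 / (sigma**2 + tau_hat**2)     # per-school shrinkage factors (n_j = 1 here)
theta_hat = (1 - B) * y + B * mu_hat       # partially pooled point estimates

print(f"mu_hat = {mu_hat:.1f}")
print(np.round(theta_hat, 1))              # school A: 28 pulled down toward ~9-10
```

Because each school contributes a single summary estimate with large $\sigma_j$, the shrinkage factors are high and the noisiest schools move the most.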
Example 2: Reparametrization
Non-centered parametrization avoids funnel geometry in the posterior. Instead of $\theta_j\sim\mathcal{N}(\mu,\tau^2)$, write $\theta_j=\mu+\tau\eta_j$, $\eta_j\sim\mathcal{N}(0,1)$. This decouples $\tau$ from $\eta_j$, eliminating the characteristic funnel that causes HMC divergences when $\tau\approx0$.
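The key point is that the non-centered form is a pure reparametrization: it changes the coordinates the sampler works in, not the model. A quick sketch (with hypothetical $\mu$, $\tau$) confirming both constructions give the same distribution for $\theta_j$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, tau, J = 0.0, 0.5, 100_000

# Centered: draw theta_j directly from its group-level prior.
theta_centered = rng.normal(mu, tau, size=J)

# Non-centered: draw standard-normal eta_j, then transform deterministically.
eta = rng.normal(0.0, 1.0, size=J)
theta_noncentered = mu + tau * eta

# Same marginal distribution either way; only the sampler's geometry differs.
print(theta_centered.std(), theta_noncentered.std())  # both ~ tau
```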
Practice
- Derive the complete-pooling ($\tau^2=0$) and no-pooling ($\tau^2\to\infty$) limits of the partial-pooling estimate $E[\theta_j\mid\mathbf{y}]$.
- Explain why centered parametrization creates a posterior funnel and how non-centered parametrization resolves it geometrically.
Answer Key
1. The partial-pooling estimate is $E[\theta_j|y]\approx\frac{n_j/\sigma^2}{n_j/\sigma^2+1/\tau^2}\bar{y}_j+\frac{1/\tau^2}{n_j/\sigma^2+1/\tau^2}\mu$. As $\tau^2\to0$: all $\theta_j\to\mu$ (complete pooling). As $\tau^2\to\infty$: $\theta_j\to\bar{y}_j$ (no pooling).
2. Centered: $\theta_j\sim\mathcal{N}(\mu,\tau^2)$ sampled directly. When $\tau\approx0$, the posterior has a funnel shape: the scale of $\theta_j-\mu$ is tied to $\tau$, so $\tau$ and the $\theta_j$ are strongly correlated and the sampler must shrink its step size to fit the funnel's neck, causing divergences and slow mixing. Non-centered: sample $\eta_j\sim\mathcal{N}(0,1)$ and $\tau$ as a priori independent parameters, then compute $\theta_j=\mu+\tau\eta_j$ deterministically. This decouples the parameters, removing the funnel geometry.
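The funnel in answer 2 can be visualized numerically. A sketch (prior draws only, $\mu=0$, Half-Cauchy$(0,5)$ on $\tau$ as in the Eight Schools setup): in the centered coordinate $\theta$ the spread collapses with $\tau$, while in the non-centered coordinate $\eta$ it does not.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
tau = np.abs(rng.standard_cauchy(N)) * 5      # Half-Cauchy(0, 5) draws
tau = tau[tau < 50]                           # trim the extreme tail for this sketch
eta = rng.normal(size=tau.size)               # independent of tau by construction
theta = tau * eta                             # centered coordinate (mu = 0)

small, large = tau < 1, tau > 10

# Funnel: theta's spread depends strongly on tau in the centered space...
print(theta[small].std(), theta[large].std())  # very different scales

# ...while eta's spread is the same everywhere (non-centered space).
print(eta[small].std(), eta[large].std())      # both ~ 1
```

A gradient-based sampler sees the first pair of scales in the centered parametrization (no single step size works everywhere) and the second pair in the non-centered one.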