
Bayesian Model Comparison & Decision Theory

Bayesian model comparison uses the marginal likelihood (evidence) to weigh competing models, automatically penalizing complexity via Occam's razor. Bayesian decision theory then provides a coherent rule for choosing actions that minimize posterior expected loss, unifying estimation, hypothesis testing, and prediction within a single framework grounded in probability calculus.

Bayes Factor & Model Evidence

For models $M_1,M_2$ with priors $P(M_1),P(M_2)$: $$\text{BF}_{12}=\frac{p(\mathcal{D}\mid M_1)}{p(\mathcal{D}\mid M_2)}=\frac{\int p(\mathcal{D}\mid\theta_1,M_1)p(\theta_1\mid M_1)d\theta_1}{\int p(\mathcal{D}\mid\theta_2,M_2)p(\theta_2\mid M_2)d\theta_2}$$ Posterior odds $=$ BF $\times$ prior odds. Jeffreys' scale: $\log_{10}\text{BF}>2$ is decisive evidence for $M_1$. Unlike $p$-values, BFs can provide evidence for the null.
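For a conjugate Beta-Binomial model the evidence integral has a closed form, so the Bayes factor can be computed exactly. A minimal sketch; the data (62 successes in 100 trials) and the two priors are hypothetical choices for illustration:

```python
from math import lgamma, log

def log_beta(a, b):
    # log Beta function: log Γ(a) + log Γ(b) - log Γ(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_binomial(k, n, a, b):
    # log p(D | M) for k successes in n trials under a Beta(a, b) prior:
    # ∫ θ^k (1-θ)^(n-k) Beta(θ; a, b) dθ = B(a + k, b + n - k) / B(a, b)
    # (the binomial coefficient is omitted; it cancels in the Bayes factor)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

# Hypothetical data: 62 successes in 100 trials
k, n = 62, 100
# M1: informative Beta(30, 20) prior, concentrated near θ ≈ 0.6
# M2: vague Beta(1, 1) (uniform) prior, spread over all of [0, 1]
log_bf = log_marginal_binomial(k, n, 30, 20) - log_marginal_binomial(k, n, 1, 1)
print(f"log10 BF_12 = {log_bf / log(10):.2f}")  # positive: data favor M1
```

Because M2 spreads its prior mass over the whole interval, its marginal likelihood is diluted relative to M1, whose prior concentrates where the data sit: the Occam effect in miniature.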

Bayesian Decision Theory

Given posterior $p(\theta\mid\mathcal{D})$ and loss function $L(\theta,a)$ for action $a\in\mathcal{A}$, the Bayes-optimal action minimizes posterior expected loss: $$a^*=\arg\min_{a\in\mathcal{A}}\mathbb{E}_{p(\theta\mid\mathcal{D})}[L(\theta,a)]$$ Under squared-error loss $L(\theta,a)=(\theta-a)^2$, $a^*=E[\theta\mid\mathcal{D}]$ (posterior mean). Under absolute-error loss, $a^*=\text{median}(p(\theta\mid\mathcal{D}))$. Under 0-1 loss, $a^*=\text{mode}(p(\theta\mid\mathcal{D}))$ (MAP).
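The loss/estimator correspondences can be verified by brute force on posterior draws. A sketch using a hypothetical right-skewed Gamma posterior standing in for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws for θ (right-skewed Gamma, as if from MCMC)
posterior = rng.gamma(shape=2.0, scale=1.5, size=50_000)

# Brute-force the Bayes action: evaluate posterior expected loss on a grid
actions = np.linspace(0.0, 10.0, 1001)
sq_loss = np.array([np.mean((posterior - a) ** 2) for a in actions])
abs_loss = np.array([np.mean(np.abs(posterior - a)) for a in actions])

a_sq = actions[np.argmin(sq_loss)]    # minimizer under squared-error loss
a_abs = actions[np.argmin(abs_loss)]  # minimizer under absolute-error loss
# Theory predicts: a_sq ≈ posterior mean, a_abs ≈ posterior median
print(a_sq, posterior.mean(), a_abs, np.median(posterior))
```

Since the posterior is skewed, the two minimizers differ visibly, which is exactly why the choice of loss function matters.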

Example 1: WAIC & LOO-CV

When the model evidence is intractable, use predictive information criteria. WAIC: $\widehat{\text{elpd}}_{\text{WAIC}}=\sum_i\log\mathbb{E}_{\text{post}}[p(y_i\mid\theta)]-\sum_i\text{Var}_{\text{post}}(\log p(y_i\mid\theta))$, where the first sum is the log pointwise predictive density and the second is the effective number of parameters $p_{\text{WAIC}}$. PSIS-LOO-CV estimates the leave-one-out predictive density using Pareto-smoothed importance-sampling weights, flagging influential observations via the Pareto shape diagnostic $\hat k>0.7$.
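The WAIC estimator translates directly into a few lines given an $S\times N$ matrix of pointwise log-likelihoods. The toy normal-mean model and simulated posterior draws below are assumptions for illustration:

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) matrix of log p(y_i | theta_s)."""
    # lppd_i = log( mean_s p(y_i | theta_s) ), computed stably via log-sum-exp
    m = log_lik.max(axis=0)
    lppd = m + np.log(np.exp(log_lik - m).mean(axis=0))
    # p_waic_i: posterior variance of the pointwise log-likelihood
    p_waic = log_lik.var(axis=0, ddof=1)
    return (lppd - p_waic).sum(), p_waic.sum()

# Toy normal-mean model with known sigma = 1 (all values assumed for illustration)
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50)
# Approximate posterior draws for mu: N(ybar, 1/n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
elpd_waic, p_waic = waic(log_lik)  # p_waic should land near 1 (one free parameter)
```

The variance term recovers roughly one effective parameter here, matching the single unknown mean; in practice libraries such as ArviZ compute the same quantities from fitted-model objects.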

Example 2: Posterior Predictive Checks

Generate replicated data $y^{\text{rep}}\sim p(y^{\text{rep}}\mid\mathcal{D})=\int p(y^{\text{rep}}\mid\theta)p(\theta\mid\mathcal{D})d\theta$. Compare a test statistic $T(y^{\text{rep}})$ to the observed $T(y)$ via the Bayesian $p$-value: $p_B=P(T(y^{\text{rep}})\geq T(y)\mid\mathcal{D})$. Values near 0 or 1 indicate model misspecification. This is a self-consistency check, not a classical hypothesis test.
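The check can be sketched as follows; the heavy-tailed data, the simplified posterior approximation, and the max-$|y|$ test statistic are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical observed data: heavy-tailed draws a plain normal model fits poorly
y = rng.standard_t(df=3, size=200)

# Simplified posterior approximation for a normal model (illustrative assumption):
# draw mu ~ N(ybar, sigma_hat^2 / n) with sigma fixed at the sample sd
n = y.size
sigma_hat = y.std(ddof=1)
mu_draws = rng.normal(y.mean(), sigma_hat / np.sqrt(n), size=2000)

# Tail-sensitive test statistic: maximum absolute observation
T_obs = np.abs(y).max()
T_rep = np.array([np.abs(rng.normal(mu, sigma_hat, size=n)).max()
                  for mu in mu_draws])
p_B = (T_rep >= T_obs).mean()  # values near 0 or 1 flag misspecification
```

A tail-sensitive statistic is deliberate here: statistics the model fits by construction (e.g. the sample mean for a normal-mean model) have little power to reveal misspecification.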

Practice

  1. Explain why the Bayes factor automatically penalizes model complexity (Occam's razor effect) even without an explicit complexity penalty term.
  2. Under asymmetric linear loss $L(\theta,a)=c_1(\theta-a)^+\ +\ c_2(a-\theta)^+$, derive the Bayes-optimal action in terms of a posterior quantile.
Answer Key

1. The Bayes factor is $\text{BF}_{12}=\int p(\mathcal{D}\mid\theta_1,M_1)\,p(\theta_1\mid M_1)\,d\theta_1\big/\int p(\mathcal{D}\mid\theta_2,M_2)\,p(\theta_2\mid M_2)\,d\theta_2$. A complex model spreads its prior over a large parameter space, so its marginal likelihood averages the likelihood over many regions where it is low. A simpler model concentrates prior mass on the datasets it can explain, so when the data are simple it attains the higher marginal likelihood: an automatic Occam's razor, with no explicit penalty term.

2. The Bayes-optimal action minimizes posterior expected loss: $a^*=\arg\min_a\mathbb{E}[L(\theta,a)\mid\mathcal{D}]$. Differentiating $\mathbb{E}[c_1(\theta-a)^+ + c_2(a-\theta)^+]$ with respect to $a$ gives $-c_1P(\theta>a)+c_2P(\theta<a)=0$, i.e. $F(a)=c_1/(c_1+c_2)$. Hence $a^*$ is the $c_1/(c_1+c_2)$-quantile of the posterior distribution of $\theta$: the costlier underestimation is ($c_1$ large), the higher the chosen quantile.
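The quantile characterization can be checked numerically: minimizing the empirical expected loss over a grid of actions recovers the $c_1/(c_1+c_2)$ posterior quantile. A sketch with hypothetical exponential posterior draws:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical posterior draws for θ (exponential with mean 2)
theta = rng.exponential(scale=2.0, size=50_000)
c1, c2 = 3.0, 1.0  # underestimating θ is three times as costly as overestimating

def expected_loss(a):
    # posterior expected asymmetric linear loss at action a
    return np.mean(c1 * np.maximum(theta - a, 0.0)
                   + c2 * np.maximum(a - theta, 0.0))

grid = np.linspace(0.0, 10.0, 1001)
a_star = grid[np.argmin([expected_loss(a) for a in grid])]
q = np.quantile(theta, c1 / (c1 + c2))  # the 0.75-quantile of the draws
# a_star and q agree up to grid resolution
print(a_star, q)
```

For the exponential(mean 2) posterior the 0.75-quantile is $2\ln 4\approx 2.77$, so the asymmetric costs push the action well above both the mean (2.0) and the median ($\approx 1.39$).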