Prior Distributions: Conjugate & Non-informative
The prior $p(\theta)$ encodes all knowledge about the parameter before data are observed. Prior specification is simultaneously the greatest strength and most criticized aspect of Bayesian analysis. Choosing a prior well requires understanding conjugacy, invariance principles, and the practical implications for posterior inference.
Conjugate priors yield posteriors in the same family as the prior, enabling closed-form inference. Non-informative priors attempt to minimize the influence of prior knowledge, often via invariance arguments.
Conjugate Prior
A prior family $\mathcal{F}$ is conjugate to likelihood $p(x\mid\theta)$ if the posterior $p(\theta\mid x)\in\mathcal{F}$ for all $x$. Key conjugate pairs:
| Likelihood | Prior | Posterior |
|---|---|---|
| Binomial$(n,\theta)$ | Beta$(\alpha,\beta)$ | Beta$(\alpha+k,\beta+n-k)$ |
| Poisson$(\lambda)$ | Gamma$(a,b)$ | Gamma$(a+\sum x_i,b+n)$ |
| Normal$(\mu,\sigma^2)$, $\sigma^2$ known | Normal$(\mu_0,\tau^2)$ | Normal$(\mu_n,\tau_n^2)$ |
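The updates in the first two table rows amount to simple hyperparameter arithmetic. A minimal sketch in plain Python (the function names are illustrative, not from any library):

```python
# Conjugate updates: each returns posterior hyperparameters in the
# same family as the prior, as the table above states.

def beta_binomial_update(alpha, beta, k, n):
    """Beta(alpha, beta) prior + k successes in n Binomial trials."""
    return alpha + k, beta + n - k

def gamma_poisson_update(a, b, xs):
    """Gamma(a, b) prior (rate parametrization) + Poisson observations xs."""
    return a + sum(xs), b + len(xs)

print(beta_binomial_update(2, 2, 7, 10))    # (9, 5)
print(gamma_poisson_update(1, 1, [3, 5, 2]))  # (11, 4)
```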
Jeffreys Prior
$p_J(\theta)\propto\sqrt{\det\mathcal{I}(\theta)}$ is invariant under reparametrization: if $\phi=g(\theta)$, then $p_J(\phi)\propto\sqrt{\det\mathcal{I}(\phi)}$. For Bernoulli, $\mathcal{I}(\theta)=1/(\theta(1-\theta))$, giving $p_J(\theta)\propto\theta^{-1/2}(1-\theta)^{-1/2}=$Beta$(1/2,1/2)$.
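The Bernoulli claim can be checked numerically: $\sqrt{\mathcal{I}(\theta)}$ and the Beta$(1/2,1/2)$ kernel agree pointwise (here with proportionality constant 1). A quick sanity check:

```python
import math

# sqrt of the Bernoulli Fisher information I(theta) = 1/(theta(1-theta))
def sqrt_fisher(theta):
    return math.sqrt(1.0 / (theta * (1.0 - theta)))

# Unnormalized Beta(1/2, 1/2) density: theta^{-1/2} (1-theta)^{-1/2}
def beta_half_kernel(theta):
    return theta ** -0.5 * (1.0 - theta) ** -0.5

# The ratio is constant (in fact 1) across theta, confirming proportionality.
ratios = [sqrt_fisher(t) / beta_half_kernel(t) for t in (0.1, 0.3, 0.5, 0.9)]
print(ratios)
```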
Example 1: Beta-Binomial Conjugacy
Let $\theta\sim$Beta$(2,2)$ (prior mean $0.5$) and observe $k=7$ successes in $n=10$ trials. Posterior: $\theta\mid k\sim$Beta$(2+7,\,2+3)=$Beta$(9,5)$. Posterior mean $=9/14\approx0.643$, shrunk from the MLE $k/n=0.7$ toward the prior mean $0.5$. The effective sample size of the Beta$(2,2)$ prior is $\alpha+\beta=4$ pseudo-observations.
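The arithmetic in Example 1 can be reproduced in a few lines; the posterior mean is exactly a weight of the prior mean and the MLE by their respective (pseudo-)sample sizes:

```python
# Beta-Binomial posterior summaries for Example 1.
a0, b0 = 2, 2          # Beta(2, 2) prior
k, n = 7, 10           # observed successes / trials

a, b = a0 + k, b0 + (n - k)          # posterior Beta(9, 5)
post_mean = a / (a + b)              # 9/14

# Equivalent view: weighted average of prior mean and MLE.
ess = a0 + b0                        # prior effective sample size = 4
weighted = (ess * 0.5 + n * (k / n)) / (ess + n)

print(round(post_mean, 3), round(weighted, 3))  # both 0.643
```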
Example 2: Normal-Normal Conjugacy
Prior $\mu\sim\mathcal{N}(\mu_0,\tau^2)$, likelihood $x_i\overset{iid}{\sim}\mathcal{N}(\mu,\sigma^2)$. Posterior precision $=\tau^{-2}+n\sigma^{-2}$. Posterior mean is a precision-weighted average: $$\mu_n=\frac{\tau^{-2}\mu_0+n\sigma^{-2}\bar{x}}{\tau^{-2}+n\sigma^{-2}}$$ As $n\to\infty$, $\mu_n\to\bar{x}$; as $\tau^2\to\infty$ (diffuse prior), $\mu_n\to\bar{x}$ for any $n$.
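A short sketch of the precision-weighted update, including the diffuse-prior limit noted above (function name is illustrative):

```python
# Normal-Normal conjugate update with known observation variance sigma2.
def normal_posterior(mu0, tau2, xbar, sigma2, n):
    """Return posterior mean and variance of mu given n obs with mean xbar."""
    prec = 1.0 / tau2 + n / sigma2                   # posterior precision
    mu_n = (mu0 / tau2 + n * xbar / sigma2) / prec   # precision-weighted mean
    return mu_n, 1.0 / prec

# Diffuse prior (tau^2 huge): posterior mean approaches the sample mean.
mu_n, var_n = normal_posterior(mu0=0.0, tau2=1e9, xbar=4.2, sigma2=1.0, n=5)
print(round(mu_n, 6))
```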
Practice
- Derive the Jeffreys prior for the Poisson rate $\lambda$. Show it equals Gamma$(1/2,0)$ (improper).
- A Beta$(1,1)$ prior is uniform on $[0,1]$. After observing $k=3$ heads in $n=5$ flips, state the posterior and compute the posterior mean and variance.
Show Answer Key
1. Fisher information $I(\lambda)=E[(\partial\log P(x|\lambda)/\partial\lambda)^2]=1/\lambda$. Jeffreys prior $\pi(\lambda)\propto\sqrt{I(\lambda)}=\lambda^{-1/2}$, which is $\text{Gamma}(1/2,0)$ (improper).
2. Posterior: $\text{Beta}(1+3,1+2)=\text{Beta}(4,3)$. Mean $=4/7\approx0.571$. Variance $=4\cdot3/(7^2\cdot8)=12/392=3/98\approx0.0306$.