
Prior Distributions: Conjugate & Non-informative

Bayesian Inference & Statistical Modeling · 35 min

The prior $p(\theta)$ encodes all knowledge about the parameter before data are observed. Prior specification is simultaneously the greatest strength and most criticized aspect of Bayesian analysis. Choosing a prior well requires understanding conjugacy, invariance principles, and the practical implications for posterior inference.

Conjugate priors yield posteriors in the same family as the prior, enabling closed-form inference. Non-informative priors attempt to minimize the influence of prior knowledge, often via invariance arguments.

Conjugate Prior

A prior family $\mathcal{F}$ is conjugate to likelihood $p(x\mid\theta)$ if the posterior $p(\theta\mid x)\in\mathcal{F}$ for all $x$. Key conjugate pairs:

| Likelihood | Prior | Posterior |
|---|---|---|
| Binomial$(n,\theta)$, $k$ successes | Beta$(\alpha,\beta)$ | Beta$(\alpha+k,\,\beta+n-k)$ |
| Poisson$(\lambda)$ | Gamma$(a,b)$ | Gamma$(a+\sum x_i,\,b+n)$ |
| Normal$(\mu,\sigma^2)$, $\sigma^2$ known | Normal$(\mu_0,\tau^2)$ | Normal$(\mu_n,\tau_n^2)$ |
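The three updates in the table can be written as one-line functions. This is a minimal sketch; the function names are illustrative, not a standard library API:

```python
# Sketch: the three conjugate updates from the table as plain functions.
# Names here are illustrative, not a standard API.

def beta_binomial_update(alpha, beta, n, k):
    """Beta(alpha, beta) prior; k successes in n Binomial trials."""
    return alpha + k, beta + (n - k)

def gamma_poisson_update(a, b, xs):
    """Gamma(a, b) prior (rate parameterization); Poisson counts xs."""
    return a + sum(xs), b + len(xs)

def normal_normal_update(mu0, tau2, sigma2, xs):
    """Normal(mu0, tau2) prior on the mean; known variance sigma2."""
    n = len(xs)
    prec = 1 / tau2 + n / sigma2          # posterior precision
    mu_n = (mu0 / tau2 + sum(xs) / sigma2) / prec
    return mu_n, 1 / prec                  # posterior mean and variance

print(beta_binomial_update(1, 1, 5, 3))    # -> (4, 3)
print(gamma_poisson_update(2, 1, [2, 3, 1]))  # -> (8, 4)
```

In each case the update is pure bookkeeping on the prior's hyperparameters, which is exactly what makes conjugate inference closed-form.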

Jeffreys Prior

$p_J(\theta)\propto\sqrt{\det\mathcal{I}(\theta)}$ is invariant under reparametrization: if $\phi=g(\theta)$, then $p_J(\phi)\propto\sqrt{\det\mathcal{I}(\phi)}$. For Bernoulli, $\mathcal{I}(\theta)=1/(\theta(1-\theta))$, giving $p_J(\theta)\propto\theta^{-1/2}(1-\theta)^{-1/2}=$Beta$(1/2,1/2)$.
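For the Bernoulli case the claim $\sqrt{\mathcal{I}(\theta)} \propto \theta^{-1/2}(1-\theta)^{-1/2}$ can be checked pointwise. A small sketch (the helper names are ad hoc):

```python
import math

# Sketch: check that the Jeffreys kernel for a Bernoulli trial equals
# the unnormalized Beta(1/2, 1/2) density. Helper names are ad hoc.

def fisher_info(theta):
    # I(theta) = 1 / (theta * (1 - theta)) for a single Bernoulli trial
    return 1.0 / (theta * (1.0 - theta))

def jeffreys_kernel(theta):
    return math.sqrt(fisher_info(theta))

def beta_half_kernel(theta):
    # Unnormalized Beta(1/2, 1/2) density
    return theta**-0.5 * (1.0 - theta)**-0.5

# The two kernels coincide exactly at every interior point.
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert abs(jeffreys_kernel(t) - beta_half_kernel(t)) < 1e-12
```

Note the mass piles up at the endpoints $0$ and $1$, where the Fisher information diverges.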

Example 1: Beta-Binomial Conjugacy

Let $\theta\sim$Beta$(2,2)$ (prior mean $0.5$) and observe $k=7$ successes in $n=10$ trials. Posterior: $\theta\mid k\sim$Beta$(2+7,\,2+3)=$Beta$(9,5)$. Posterior mean $=9/14\approx0.643$, shrunk from the sample proportion $k/n=0.7$ toward the prior mean $0.5$. The effective sample size of the Beta$(2,2)$ prior is $\alpha+\beta=4$ pseudo-observations.
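The example's arithmetic can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

# Verify Example 1: Beta(2,2) prior, 7 successes in 10 trials.
alpha, beta, n, k = 2, 2, 10, 7
post = (alpha + k, beta + (n - k))               # Beta(9, 5)
post_mean = Fraction(post[0], post[0] + post[1]) # 9/14
prior_ess = alpha + beta                         # 4 pseudo-observations

print(post, float(post_mean), prior_ess)  # -> (9, 5) 0.6428571428571429 4
```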

Example 2: Normal-Normal Conjugacy

Prior $\mu\sim\mathcal{N}(\mu_0,\tau^2)$, likelihood $x_i\overset{iid}{\sim}\mathcal{N}(\mu,\sigma^2)$. Posterior precision $=\tau^{-2}+n\sigma^{-2}$. Posterior mean is a precision-weighted average: $$\mu_n=\frac{\tau^{-2}\mu_0+n\sigma^{-2}\bar{x}}{\tau^{-2}+n\sigma^{-2}}$$ As $n\to\infty$, $\mu_n\to\bar{x}$; as $\tau^2\to\infty$ (diffuse prior), $\mu_n\to\bar{x}$ for any $n$.
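A minimal numerical sketch of the precision-weighted update, with illustrative numbers (the data and hyperparameters below are assumptions, not from the text):

```python
# Sketch of the Normal-Normal update; numbers are illustrative.
mu0, tau2 = 0.0, 4.0            # prior mean and variance
sigma2 = 1.0                    # known observation variance
data = [1.2, 0.8, 1.5, 0.9, 1.1]
n = len(data)
xbar = sum(data) / n            # sample mean = 1.1

prior_prec = 1.0 / tau2         # 0.25
data_prec = n / sigma2          # 5.0
post_prec = prior_prec + data_prec
mu_n = (prior_prec * mu0 + data_prec * xbar) / post_prec
tau2_n = 1.0 / post_prec

print(round(mu_n, 4), round(tau2_n, 4))  # -> 1.0476 0.1905
```

With five observations the data precision ($5.0$) dominates the prior precision ($0.25$), so $\mu_n$ sits close to $\bar{x}$, matching the limiting behavior stated above.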

Practice

  1. Derive the Jeffreys prior for the Poisson rate $\lambda$. Show it equals Gamma$(1/2,0)$ (improper).
  2. A Beta$(1,1)$ prior is uniform on $[0,1]$. After observing $k=3$ heads in $n=5$ flips, state the posterior and compute the posterior mean and variance.
Answer Key

1. Fisher information $I(\lambda)=E[(\partial\log P(x|\lambda)/\partial\lambda)^2]=1/\lambda$. Jeffreys prior $\pi(\lambda)\propto\sqrt{I(\lambda)}=\lambda^{-1/2}$, which is $\text{Gamma}(1/2,0)$ (improper).

2. Posterior: $\text{Beta}(1+3,1+2)=\text{Beta}(4,3)$. Mean $=4/7\approx0.571$. Variance $=4\cdot3/(7^2\cdot8)=12/392=3/98\approx0.0306$.
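The moments in answer 2 can be checked exactly with the standard Beta formulas $\mathbb{E}[\theta]=\alpha/(\alpha+\beta)$ and $\mathrm{Var}(\theta)=\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$:

```python
from fractions import Fraction

# Check the Beta(4, 3) posterior moments from answer 2.
a, b = 4, 3
mean = Fraction(a, a + b)                         # 4/7
var = Fraction(a * b, (a + b)**2 * (a + b + 1))   # 12/(49*8) = 3/98

print(mean, float(mean))  # -> 4/7 0.5714285714285714
print(var, float(var))    # -> 3/98 0.030612244897959183
```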