
Differential Entropy & Gaussian Channels

For continuous random variables, differential entropy $h(X)=-\int p(x)\log p(x)\,dx$ extends Shannon entropy. Unlike discrete entropy, differential entropy can be negative and is not invariant under bijections. However, mutual information $I(X;Y)=h(X)-h(X|Y)$ remains well-defined and channel capacity theory extends seamlessly. The AWGN channel achieves capacity with Gaussian inputs — a consequence of the maximum entropy property of Gaussians.

Differential Entropy

For a continuous random variable $X$ with density $p(x)$: $$h(X)=-\int_{\mathcal{X}}p(x)\log p(x)\,dx=\mathbb{E}[-\log p(X)].$$ Unlike discrete entropy: $h$ can be negative (e.g., $X\sim\text{Uniform}(0,\epsilon)$ has $h=\log\epsilon$, which is negative for $\epsilon<1$ and tends to $-\infty$ as $\epsilon\to0$); $h$ is not invariant under bijections, e.g. $h(aX)=h(X)+\log|a|$, and for a smooth bijection $f$, $h(f(X))=h(X)+\mathbb{E}[\log|f'(X)|]$. Maximum entropy: among all distributions with variance $\sigma^2$, the Gaussian $\mathcal{N}(0,\sigma^2)$ maximizes differential entropy, attaining $h(X)=\frac{1}{2}\log(2\pi e\sigma^2)$.
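As a sanity check, the identity $h(X)=\mathbb{E}[-\log p(X)]$ can be estimated by Monte Carlo and compared against the Gaussian closed form; a minimal NumPy sketch (the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
n = 200_000

# h(X) = E[-log p(X)]: average the negative log-density over samples of X.
x = rng.normal(0.0, sigma, size=n)
log_p = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -log_p.mean()

# Closed form: h = (1/2) log(2*pi*e*sigma^2) nats.
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(f"Monte Carlo: {h_mc:.4f} nats, exact: {h_exact:.4f} nats")

# Scaling law: h(aX) = h(X) + log|a|.
a = 3.0
h_scaled = 0.5 * np.log(2 * np.pi * np.e * (a * sigma)**2)
assert np.isclose(h_scaled, h_exact + np.log(abs(a)))
```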

Maximum Entropy Theorem (Gaussian)

Among all continuous distributions on $\mathbb{R}$ with zero mean and variance $\sigma^2$ (since $h$ is shift-invariant, zero mean is without loss of generality), the Gaussian uniquely maximizes differential entropy: $$h(X)\leq\frac{1}{2}\log(2\pi e\sigma^2)$$ with equality iff $X\sim\mathcal{N}(0,\sigma^2)$. Proof: let $\phi$ be the $\mathcal{N}(0,\sigma^2)$ density. Then $0\leq D_{KL}(p\|\phi)=\int p\log(p/\phi)\,dx=-h(p)-\int p\log\phi\,dx=-h(p)+\frac{1}{2}\log(2\pi\sigma^2)+\frac{\mathbb{E}[X^2]}{2\sigma^2}=-h(p)+\frac{1}{2}\log(2\pi e\sigma^2)$, where the last step uses $\mathbb{E}[X^2]=\sigma^2$. This gives $h(p)\leq\frac{1}{2}\log(2\pi e\sigma^2)=h(\phi)$, with equality iff $D_{KL}(p\|\phi)=0$, i.e. $p=\phi$.
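The theorem can be illustrated numerically by comparing closed-form entropies at a common variance: a Laplace with scale $b$ has variance $2b^2$ and entropy $1+\log(2b)$ nats, and a Uniform$(-a,a)$ has variance $a^2/3$ and entropy $\log(2a)$ nats, both of which fall below the Gaussian value. A short sketch (the specific variance is an arbitrary choice):

```python
import numpy as np

sigma2 = 1.0  # common variance for all three distributions

# Gaussian: h = (1/2) log(2*pi*e*sigma^2)
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma2)

# Laplace with Var = 2b^2 = sigma2: h = 1 + log(2b)
b = np.sqrt(sigma2 / 2)
h_laplace = 1 + np.log(2 * b)

# Uniform(-a, a) with Var = a^2/3 = sigma2: h = log(2a)
a = np.sqrt(3 * sigma2)
h_uniform = np.log(2 * a)

print(f"Gaussian: {h_gauss:.4f}, Laplace: {h_laplace:.4f}, Uniform: {h_uniform:.4f}")
assert h_gauss > h_laplace > h_uniform  # Gaussian wins at fixed variance
```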

Example 1

Compute the differential entropy of $X\sim\mathcal{N}(\mu,\sigma^2)$ and $Y\sim\text{Exp}(\lambda)$.

Solution: For Gaussian: $h(X)=\mathbb{E}[-\log p(X)]=-\mathbb{E}\left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{(X-\mu)^2}{2\sigma^2}\right]=\frac{1}{2}\log(2\pi\sigma^2)+\frac{\mathbb{E}[(X-\mu)^2]}{2\sigma^2}=\frac{1}{2}\log(2\pi\sigma^2)+\frac{1}{2}=\frac{1}{2}\log(2\pi e\sigma^2)$ nats. For Exponential: $h(Y)=-\int_0^\infty\lambda e^{-\lambda y}(\log\lambda-\lambda y)\,dy=-\log\lambda+\lambda\cdot\frac{1}{\lambda}=1-\log\lambda$ nats.
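The exponential result $h(Y)=1-\log\lambda$ can likewise be checked by Monte Carlo, averaging $-\log p(Y)$ over exponential samples; a minimal sketch (seed, rate, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5
n = 200_000

# Sample Y ~ Exp(lam) and average -log p(Y) = -(log(lam) - lam*Y).
y = rng.exponential(scale=1 / lam, size=n)
h_mc = -(np.log(lam) - lam * y).mean()

h_exact = 1 - np.log(lam)  # closed form in nats
print(f"Monte Carlo: {h_mc:.4f} nats, exact: {h_exact:.4f} nats")
```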

Example 2

Derive the capacity of the AWGN channel $Y=X+Z$, $Z\sim\mathcal{N}(0,N)$, $\mathbb{E}[X^2]\leq P$.

Solution: $I(X;Y)=h(Y)-h(Y|X)=h(Y)-h(Z)$. Since $Z$ is Gaussian, $h(Z)=\frac{1}{2}\log(2\pi eN)$. By the maximum entropy theorem, $h(Y)$ is maximized by making $Y$ Gaussian with variance $P+N$: choose $X\sim\mathcal{N}(0,P)$. Then $h(Y)=\frac{1}{2}\log(2\pi e(P+N))$. So $C=\frac{1}{2}\log\frac{P+N}{N}=\frac{1}{2}\log\left(1+\frac{P}{N}\right)$ bits. This is Shannon's capacity formula; at high SNR, doubling the power adds $1/2$ bit per channel use, while doubling the bandwidth doubles the number of channel uses per second (and hence the rate in bits/s, at fixed SNR per sample).
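A small sketch of the formula and the high-SNR power-doubling behavior (the helper name `awgn_capacity` is my own, not from the text):

```python
import numpy as np

def awgn_capacity(P, N):
    """AWGN capacity C = (1/2) log2(1 + P/N), in bits per channel use."""
    return 0.5 * np.log2(1 + P / N)

N = 1.0
for P in [1.0, 2.0, 4.0, 100.0, 200.0]:
    print(f"P={P:6.1f}: C = {awgn_capacity(P, N):.4f} bits/use")

# At high SNR, doubling the power adds roughly 1/2 bit per use.
gain = awgn_capacity(200.0, N) - awgn_capacity(100.0, N)
assert abs(gain - 0.5) < 0.01
```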

Practice

  1. Compute $I(X;Y)$ for $X\sim\mathcal{N}(0,1)$ and $Y=X+Z$, $Z\sim\mathcal{N}(0,1)$ independent. What is the channel capacity at SNR=1?
  2. Show that for the vector AWGN channel $Y=Hx+Z$ with noise covariance $K_Z$, capacity with power constraint $\text{tr}(K_X)\leq P$ is achieved by water-filling over the eigenmodes of $H^\top K_Z^{-1}H$.
  3. Prove $h(X+Y)\geq\max(h(X),h(Y))$ when $X\perp Y$. When does equality hold?
  4. Prove the entropy power inequality $2^{2h(X+Y)}\geq 2^{2h(X)}+2^{2h(Y)}$ for independent $X,Y$ using the Fisher information version (de Bruijn identity).
Show Answer Key

1. $I(X;Y)=h(Y)-h(Y|X)=h(Y)-h(Z)$. $Y\sim\mathcal{N}(0,2)$, so $h(Y)=\frac{1}{2}\log(2\pi e\cdot2)$ and $h(Z)=\frac{1}{2}\log(2\pi e)$. Thus $I=\frac{1}{2}\log_2 2=0.5$ bits. Channel capacity at SNR $=1$: $C=\frac{1}{2}\log_2(1+1)=0.5$ bits/use, so the input $X\sim\mathcal{N}(0,1)$ already achieves capacity.

2. Whiten noise: $K_Z^{-1/2}Y=K_Z^{-1/2}Hx+\tilde{Z}$ with $\tilde{Z}\sim\mathcal{N}(0,I)$. SVD of $K_Z^{-1/2}H=U\Sigma V^\dagger$. Capacity $=\sum_{i}\frac{1}{2}\log(1+P_i\sigma_i^2)$ with water-filling $P_i=(\mu-1/\sigma_i^2)^+$, $\sum P_i=P$.
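The allocation $P_i=(\mu-1/\sigma_i^2)^+$ with $\sum_i P_i=P$ can be computed by bisecting on the water level $\mu$; a minimal sketch under assumed gains (the function name and the example values $\sigma_i^2$ are my own, not from the text):

```python
import numpy as np

def water_filling(gains, P, tol=1e-10):
    """Allocate total power P over subchannels with gains sigma_i^2.

    Solves P_i = (mu - 1/g_i)^+ with sum_i P_i = P by bisection on mu,
    using that total allocated power is increasing in the water level mu.
    """
    g = np.asarray(gains, dtype=float)
    lo, hi = 0.0, P + (1.0 / g).max()   # hi guarantees over-allocation
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > P:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

gains = np.array([2.0, 1.0, 0.1])       # example sigma_i^2 from the SVD
P = 1.0
p = water_filling(gains, P)
C = 0.5 * np.sum(np.log2(1 + p * gains))
print("powers:", p, "capacity (bits/use):", C)

assert abs(p.sum() - P) < 1e-6          # power constraint met
assert p[0] >= p[1] >= p[2]             # weaker modes get less (or zero) power
```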

3. Since $X\perp Y$, conditioning gives $h(X+Y)\ge h(X+Y\mid Y)=h(X\mid Y)=h(X)$, using translation invariance of differential entropy; by symmetry $h(X+Y)\ge h(Y)$, hence $h(X+Y)\ge\max(h(X),h(Y))$. Equality $h(X+Y)=h(X)$ holds iff $Y$ is almost surely constant (a deterministic shift leaves $h$ unchanged).

4. De Bruijn identity: for $Z\sim\mathcal{N}(0,1)$ independent of $X$, $\frac{d}{dt}h(X+\sqrt{t}Z)=\frac{1}{2}J(X+\sqrt{t}Z)$, where $J$ is the Fisher information. Combine with the Fisher information inequality $1/J(X+Y)\ge1/J(X)+1/J(Y)$ and integrate de Bruijn along the heat flow; exponentiating yields the entropy power inequality $N(X+Y)\ge N(X)+N(Y)$, where $N(X)=\frac{1}{2\pi e}e^{2h(X)}$.
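A quick numeric check of the entropy power $N(X)=\frac{1}{2\pi e}e^{2h(X)}$: for a Gaussian, $N(X)$ equals the variance, so independent Gaussians meet the EPI with equality. A minimal sketch (variances are arbitrary choices):

```python
import numpy as np

def entropy_power(h):
    """N(X) = exp(2h) / (2*pi*e), for h in nats."""
    return np.exp(2 * h) / (2 * np.pi * np.e)

def h_gauss(var):
    """Differential entropy of N(0, var) in nats."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

# For a Gaussian, entropy power equals variance.
assert np.isclose(entropy_power(h_gauss(3.0)), 3.0)

# Independent Gaussians meet the EPI with equality: X+Y ~ N(0, vx+vy).
vx, vy = 2.0, 5.0
lhs = entropy_power(h_gauss(vx + vy))
rhs = entropy_power(h_gauss(vx)) + entropy_power(h_gauss(vy))
print("N(X+Y) =", lhs, " N(X)+N(Y) =", rhs)
assert np.isclose(lhs, rhs)
```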