
Channel Capacity & Shannon's Channel Coding Theorem

35 min Information Theory


Shannon's channel coding theorem is the central result of information theory: every noisy channel has a capacity $C$ (bits/use) such that reliable communication is possible at any rate $R<C$ and impossible at any rate above $C$. Capacity is the maximum mutual information between input and output, taken over all input distributions. This remarkable theorem shows that noise does not fundamentally limit communication, only the rate at which it can occur, and it sets the ultimate benchmark for every communication system.

Channel Capacity

A discrete memoryless channel (DMC) is specified by input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and transition probabilities $P(y|x)$. The capacity is $$C=\max_{P_X} I(X;Y)=\max_{P_X}[H(Y)-H(Y|X)]$$ where $H(Y|X)=\sum_x P_X(x)H(P_{Y|X=x})$ is determined by the channel. For the binary symmetric channel (BSC) with crossover probability $p$: $H(Y|X)=h(p)$ always, so $C=\max H(Y)-h(p)=1-h(p)$ (maximized by uniform input).
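As a numerical sanity check (a sketch, not part of the development above; the helper names are ours), one can brute-force the maximization $\max_{P_X} I(X;Y)$ for the BSC over a grid of input distributions and compare against the closed form $1-h(p)$:

```python
import numpy as np

def h2(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def bsc_mutual_information(q, p):
    """I(X;Y) = H(Y) - h(p) for BSC(p) with input P(X=1) = q."""
    return h2(q*(1-p) + (1-q)*p) - h2(p)

p = 0.1
qs = np.linspace(0.0, 1.0, 10001)          # grid over input distributions
capacity_numeric = max(bsc_mutual_information(q, p) for q in qs)
capacity_formula = 1 - h2(p)
print(capacity_numeric, capacity_formula)  # both ≈ 0.531 bits/use
```

The maximum is attained at $q=1/2$, matching the claim that uniform input achieves BSC capacity.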

Shannon's Channel Coding Theorem

For a DMC with capacity $C$: (Achievability) For any $R<C$, there exists a sequence of $(2^{nR},n)$ codes with block error probability $P_e^{(n)}\to 0$. (Converse) For any $R>C$ and any code sequence, $P_e^{(n)}\geq 1-C/R-o(1)$, which stays bounded away from 0 (by Fano's inequality). The proof of achievability uses random coding: generate $2^{nR}$ codewords i.i.d. from the capacity-achieving input distribution and decode by joint typicality. The error probability $\to 0$ iff $R<C$.
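The random-coding argument can be illustrated empirically. Below is a Monte Carlo sketch (parameters are our own choices; minimum-Hamming-distance decoding stands in for joint-typicality decoding, which it approximates on the BSC):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_random_code(n, R, p, trials=200):
    """Estimate the block error probability of a random code on BSC(p),
    decoding to the nearest codeword in Hamming distance."""
    M = int(2 ** (n * R))                        # number of codewords
    errors = 0
    for _ in range(trials):
        code = rng.integers(0, 2, size=(M, n))   # fresh random codebook
        y = (code[0] + (rng.random(n) < p)) % 2  # send message 0 through BSC
        if (code != y).sum(axis=1).argmin() != 0:
            errors += 1
    return errors / trials

p = 0.05                                          # C = 1 - h(0.05) ≈ 0.71
results = {n: simulate_random_code(n, R=0.3, p=p) for n in (10, 20, 40)}
print(results)   # since R < C, the error should fall as n grows
```

This is not a proof, but it shows the mechanism: at a fixed rate below capacity, longer random codes push competing codewords far from the received word.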

Example 1

Compute the capacity of the binary erasure channel (BEC) with erasure probability $\epsilon$.

Solution: The BEC has input $X\in\{0,1\}$ and output $Y\in\{0,1,?\}$ with $P(Y=x|X=x)=1-\epsilon$ and $P(Y=?|X=x)=\epsilon$. Given $X=x$, $Y$ equals $x$ with probability $1-\epsilon$ and $?$ with probability $\epsilon$, so $H(Y|X=x)=h(\epsilon)$ for each $x$, giving $H(Y|X)=h(\epsilon)$. For uniform input $P(X=0)=P(X=1)=1/2$: $P(Y=0)=P(Y=1)=(1-\epsilon)/2$ and $P(Y=?)=\epsilon$, so $H(Y)=-2\cdot\frac{1-\epsilon}{2}\log\frac{1-\epsilon}{2}-\epsilon\log\epsilon=(1-\epsilon)+h(\epsilon)$. Thus $C=H(Y)-H(Y|X)=1-\epsilon$ bits per use (uniform input is optimal by symmetry). Erasure simply removes a fraction $\epsilon$ of the transmitted bits.
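A quick numerical check of this computation (sketch code; `bec_mutual_information` is our own helper, evaluating $H(Y)-H(Y|X)$ directly from the output distribution):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def bec_mutual_information(eps, q=0.5):
    """I(X;Y) = H(Y) - H(Y|X) for BEC(eps) with input P(X=1) = q."""
    py = np.array([(1-q)*(1-eps), q*(1-eps), eps])  # P(Y=0), P(Y=1), P(Y=?)
    H_Y = -(py[py > 0] * np.log2(py[py > 0])).sum()
    return H_Y - h2(eps)                             # H(Y|X) = h(eps)

for eps in (0.0, 0.25, 0.5, 0.9):
    print(eps, bec_mutual_information(eps))          # equals 1 - eps
```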

Example 2

Show using Fano's inequality that no code can achieve reliable communication at rate $R>C$ on a BSC($p$).

Solution: Let $X^n$ be the transmitted codeword for a message uniform on $2^{nR}$ codewords, and let $\hat{X}^n$ be the decoder's estimate, a function of $Y^n$. Fano's inequality gives $H(X^n|\hat{X}^n)\leq 1+P_e^{(n)}nR$, and since $\hat{X}^n$ is a function of $Y^n$, $H(X^n|Y^n)\leq H(X^n|\hat{X}^n)$. Hence $I(X^n;Y^n)=H(X^n)-H(X^n|Y^n)\geq nR-1-P_e^{(n)}nR$. But $I(X^n;Y^n)\leq nC=n(1-h(p))$ for a DMC. Combining, $nR(1-P_e^{(n)})\leq nC+1$, giving $P_e^{(n)}\geq 1-C/R-1/(nR)$. For $R>C$, $P_e^{(n)}$ stays bounded away from 0 as $n\to\infty$.
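Plugging in numbers makes the converse concrete. A short sketch for BSC(0.1) at an attempted rate $R=0.75>C$ (parameter choices are ours):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

p = 0.1
C = 1 - h2(p)                       # capacity of BSC(0.1), ≈ 0.531 bits/use
R = 0.75                            # attempted rate above capacity
for n in (10, 100, 1000, 10**6):
    print(n, 1 - C/R - 1/(n*R))     # Fano lower bound on P_e
```

The bound converges to $1-C/R\approx 0.292$: no matter how large the blocklength, at least about 29% of blocks are decoded incorrectly.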

Practice

  1. Compute the capacity of the additive white Gaussian noise (AWGN) channel with power constraint $P$ and noise variance $N_0/2$: $Y=X+Z$, $Z\sim\mathcal{N}(0,N_0/2)$, $\mathbb{E}[X^2]\leq P$. This is Shannon's formula $C=\frac{1}{2}\log_2(1+\text{SNR})$.
  2. Prove the channel coding theorem achievability direction using the method of types.
  3. Show that feedback does not increase capacity of a memoryless channel.
  4. Define MIMO channel capacity and sketch how spatial multiplexing achieves the $\min(n_T,n_R)$-fold multiplexing gain.
Answer Key

1. AWGN: $Y=X+Z$, $Z\sim\mathcal{N}(0,N_0/2)$. $I(X;Y)=h(Y)-h(Y|X)=h(Y)-h(Z)$. Maximize $h(Y)$: Gaussian $Y$ maximizes differential entropy for given variance. With $E[X^2]\le P$: $\text{Var}(Y)\le P+N_0/2$. So $C=\frac{1}{2}\log_2(1+2P/N_0)=\frac{1}{2}\log_2(1+\text{SNR})$ bits/use.
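A one-line check of Shannon's formula (sketch code; the parameterization by $P$ and $N_0$ follows the problem statement, with $\text{SNR}=P/(N_0/2)$):

```python
import numpy as np

def awgn_capacity(P, N0):
    """C = (1/2) log2(1 + P / (N0/2)) bits per real channel use."""
    snr = P / (N0 / 2)
    return 0.5 * np.log2(1 + snr)

print(awgn_capacity(P=1.0, N0=2.0))  # SNR = 1 -> 0.5 bits/use
print(awgn_capacity(P=3.0, N0=2.0))  # SNR = 3 -> 1.0 bits/use
```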

2. Generate $2^{nR}$ codewords i.i.d. from the capacity-achieving input distribution. Decoder: find the codeword jointly typical with the received sequence. By the packing lemma, if $R<I(X;Y)$, the probability that any incorrect codeword is jointly typical with $Y^n$ vanishes as $n\to\infty$; averaging over random codebooks, $\bar{P}_e^{(n)}\to 0$, so at least one deterministic codebook achieves $P_e^{(n)}\to 0$. (A full method-of-types proof replaces joint typicality with counting type classes, but follows the same outline.)

3. For memoryless channels, $C=\max_{p(x)}I(X;Y)$. Feedback allows the encoder to know $Y^{i-1}$, so $X_i$ may depend on past outputs. Still, for a uniform message $W$, $I(W;Y^n)=\sum_i I(W;Y_i|Y^{i-1})\le\sum_i[H(Y_i)-H(Y_i|X_i)]=\sum_i I(X_i;Y_i)\le nC$, since $Y_i$ depends on the past only through $X_i$ (memoryless). Hence feedback doesn't increase capacity.

4. MIMO: $C=\log_2\det(I+\frac{P}{n_T}HH^\dagger/N_0)$ bits/use. SVD $H=U\Sigma V^\dagger$ decomposes into $\min(n_T,n_R)$ parallel channels. Water-filling allocates power across eigenmodes. At high SNR, $C\approx\min(n_T,n_R)\log_2(\text{SNR})$, achieving the multiplexing gain.
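The log-det formula and its parallel-channel (SVD) form can be verified numerically. A sketch with equal power per transmit antenna rather than water-filling, for a random Rayleigh-fading channel (`mimo_capacity_equal_power` and the parameter choices are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def mimo_capacity_equal_power(H, P, N0=1.0):
    """C = log2 det(I + (P/n_T) H H† / N0), equal power per antenna."""
    n_R, n_T = H.shape
    G = np.eye(n_R) + (P / (n_T * N0)) * (H @ H.conj().T)
    return float(np.log2(np.real(np.linalg.det(G))))

n_T = n_R = 4
# Rayleigh fading: i.i.d. unit-variance complex Gaussian entries.
H = (rng.standard_normal((n_R, n_T))
     + 1j * rng.standard_normal((n_R, n_T))) / np.sqrt(2)
P, N0 = 10.0, 1.0

C_logdet = mimo_capacity_equal_power(H, P, N0)
# SVD form: H = U Σ V† turns the channel into min(n_T, n_R) parallel pipes.
s = np.linalg.svd(H, compute_uv=False)
C_svd = sum(np.log2(1 + (P / (n_T * N0)) * si**2) for si in s)
print(C_logdet, C_svd)   # the two forms agree
```

The agreement reflects exactly the decomposition in the answer: $\log_2\det(I+aHH^\dagger)=\sum_i\log_2(1+a\sigma_i^2)$, one term per eigenmode.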