Data Structures & Amortized Analysis
Data structures organize data to enable efficient operations. Amortized analysis determines the average cost per operation over a sequence, even when individual operations vary. A good data structure choice can change an algorithm's complexity: binary heaps enable $O(n\log n)$ sorting; union-find with path compression achieves nearly $O(1)$ per operation; balanced BSTs give $O(\log n)$ per update.
Amortized Analysis Methods
Potential method: define a potential function $\Phi$ on the structure's state, typically with $\Phi_0=0$ and $\Phi_i\geq 0$. Amortized cost $\hat{c}_i=c_i+\Phi_i-\Phi_{i-1}$, so the total actual cost satisfies $\sum c_i=\sum\hat{c}_i+\Phi_0-\Phi_n\leq\sum\hat{c}_i$. Aggregate method: bound the total cost of any sequence of $n$ operations, then divide by $n$. Accounting method: overcharge cheap operations and bank the credit to pay for later expensive ones. Example: dynamic array doubling has actual costs $1,1,\ldots,1,k+1,1,\ldots$ — a resize at capacity $k$ copies all $k$ elements — yet amortized $O(1)$ per push.
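As a concrete check of the aggregate method, the following sketch counts the total write cost of $n$ appends to a doubling array. The function name and the initial capacity of 1 are illustrative assumptions, not part of the text above.

```python
def total_push_cost(n):
    """Total element writes for n appends to a doubling array (aggregate method)."""
    cost, capacity, size = 0, 1, 0   # assumed initial capacity: 1
    for _ in range(n):
        if size == capacity:         # full: copy everything into a 2x-sized array
            cost += size
            capacity *= 2
        cost += 1                    # write the new element
        size += 1
    return cost

# Copies form a geometric series 1 + 2 + 4 + ... < 2n, so the total is < 3n.
assert total_push_cost(1024) < 3 * 1024
```

Dividing the total by $n$ gives the amortized $O(1)$ cost per push.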
Union-Find (Disjoint Sets)
Operations: $\text{Make}(x)$, $\text{Find}(x)$ (returns the root representative), $\text{Union}(x,y)$. With union by rank and path compression: amortized $O(\alpha(n))$ per operation, where $\alpha$ is the inverse Ackermann function (effectively constant). Path compression: during $\text{Find}$, point every traversed node directly at the root. Union by rank: attach the root of smaller rank under the root of larger rank (rank is a proxy for tree height). Inverse Ackermann: $\alpha(n)\leq 4$ for all $n\leq 2^{2^{2^{2^{65536}}}}$.
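A minimal sketch of these operations in Python; the class name `DSU` and the array-based parent/rank representation are assumptions of this sketch, not from the text.

```python
class DSU:
    """Union-find with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # Make(x) for x in 0..n-1
        self.rank = [0] * n

    def find(self, x):
        # Locate the root, then point every traversed node directly at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False              # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx           # attach smaller rank under larger
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

Usage: `d = DSU(6); d.union(0, 1); d.union(1, 2)` makes `d.find(0) == d.find(2)` while `d.find(3)` stays separate.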
Example 1
Prove dynamic array has $O(1)$ amortized push.
Solution: Use potential $\Phi=2k-n$, where $k=$ number of elements and $n=$ current capacity (so $\Phi\geq 0$ whenever the array is at least half full). Push without resize: actual cost $1$, $\Delta\Phi=2$, amortized $=3$. Push with resize (at capacity $n=k$): actual cost $k+1$ (copy $k$ elements, write one), and the capacity becomes $2k$; the new potential is $2(k+1)-2k=2$ while the old was $2k-k=k$, so $\Delta\Phi=2-k$ and amortized $=(k+1)+(2-k)=3$. Hence every push has amortized cost $3=O(1)$.
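The bookkeeping above can be replayed numerically: this sketch runs 1000 pushes while tracking $\Phi=2k-n$ directly and checks that each push's amortized cost is exactly 3. The initial capacity of 1 is an assumption of the sketch.

```python
capacity, k = 1, 0                # assumed: array starts with capacity 1
phi = 2 * k - capacity            # Phi = 2k - n
for _ in range(1000):
    if k == capacity:             # resize: copy k elements, then write one
        actual = k + 1
        capacity *= 2
    else:
        actual = 1
    k += 1
    new_phi = 2 * k - capacity
    assert actual + (new_phi - phi) == 3   # amortized cost is exactly 3
    phi = new_phi
```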
Example 2
Describe a Fibonacci heap and its $O(1)$ amortized decrease-key.
Solution: A Fibonacci heap is a collection of heap-ordered trees whose roots form a root list, with a pointer to the minimum root. $\text{Insert}$: $O(1)$ (add a singleton root). $\text{Find-Min}$: $O(1)$ (follow the minimum pointer). $\text{Decrease-Key}$: $O(1)$ amortized — if the new key violates heap order, cut the node from its parent and add it to the root list; mark the parent, and if it was already marked, cut it too and cascade upward (the marks pay for these cascading cuts in the potential argument). $\text{Extract-Min}$: $O(\log n)$ amortized (remove the minimum, promote its children, then consolidate roots of equal degree). Key benefit: Dijkstra with Fibonacci heaps runs in $O(E+V\log V)$ instead of $O(E\log V)$.
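A compact sketch of these operations. It uses plain Python lists for the root and child lists, so `list.remove` is linear here; a production Fibonacci heap uses circular doubly linked lists so cuts really are $O(1)$. The class and method names are assumptions of this sketch.

```python
class _Node:
    __slots__ = ("key", "parent", "children", "mark")

    def __init__(self, key):
        self.key, self.parent, self.children, self.mark = key, None, [], False


class FibHeap:
    def __init__(self):
        self.roots, self.min = [], None

    def insert(self, key):                  # O(1): add a singleton tree
        node = _Node(key)
        self.roots.append(node)
        if self.min is None or key < self.min.key:
            self.min = node
        return node

    def find_min(self):                     # O(1): cached pointer
        return self.min.key

    def decrease_key(self, node, new_key):  # O(1) amortized
        node.key = new_key
        p = node.parent
        if p is not None and node.key < p.key:
            self._cut(node, p)
            self._cascading_cut(p)
        if node.key < self.min.key:
            self.min = node

    def _cut(self, node, parent):           # move node to the root list
        parent.children.remove(node)
        node.parent, node.mark = None, False
        self.roots.append(node)

    def _cascading_cut(self, node):         # marks pay for the cascade
        p = node.parent
        if p is not None:
            if not node.mark:
                node.mark = True
            else:
                self._cut(node, p)
                self._cascading_cut(p)

    def extract_min(self):                  # O(log n) amortized
        m = self.min
        self.roots.remove(m)
        for c in m.children:                # promote children to roots
            c.parent = None
            self.roots.append(c)
        self._consolidate()
        return m.key

    def _consolidate(self):                 # link trees of equal degree
        table, work = {}, self.roots
        self.roots = []
        while work:
            x = work.pop()
            d = len(x.children)
            if d in table:
                y = table.pop(d)
                if y.key < x.key:
                    x, y = y, x             # smaller key becomes the root
                x.children.append(y)
                y.parent, y.mark = x, False
                work.append(x)              # degree grew; retry
            else:
                table[d] = x
        self.roots = list(table.values())
        self.min = min(self.roots, key=lambda r: r.key, default=None)
```

Usage: after inserting `10, 3, 7, 15` and decreasing the `15` node's key to `1`, `find_min()` returns `1` and repeated `extract_min()` yields the keys in sorted order.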
Practice
- Prove that a sequence of $n$ push/pop/multipop operations on a stack costs $O(n)$ total (amortized $O(1)$ each).
- Analyze the splay tree: show that $m$ operations on a splay tree cost $O(m\log n)$ total.
- Describe skip lists and show that search takes $O(\log n)$ expected time.
- Prove the Ackermann function grows faster than any primitive recursive function.
Answer Key
1. Aggregate analysis: each element is pushed at most once and popped at most once. Over $n$ operations, total pushes $\leq n$, total pops (from pop or multipop) $\leq n$ (can't pop more than pushed). Total cost $\leq 2n = O(n)$. Alternatively, potential method: $\Phi = $ stack size. Push: amortized $= 1+1 = 2$. Pop/multipop of $k$: actual $k$, $\Delta\Phi = -k$, amortized $= k + (-k) = 0$. Total amortized $\leq 2n$.
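The aggregate bound can be sanity-checked by simulation; the operation mix, the multipop cap of 10, and the function name below are illustrative assumptions.

```python
import random


def simulate(n_ops, seed=0):
    """Run n_ops random push/pop/multipop operations; return total actual cost."""
    rng = random.Random(seed)
    stack, total = [], 0
    for _ in range(n_ops):
        op = rng.choice(("push", "pop", "multipop"))
        if op == "push":
            stack.append(0)
            total += 1
        elif op == "pop" and stack:
            stack.pop()
            total += 1
        elif op == "multipop" and stack:
            k = min(len(stack), rng.randint(1, 10))  # pop up to 10 elements
            del stack[-k:]
            total += k                               # cost of multipop(k) is k
    return total


# Each element is popped at most once, so total cost <= 2 * n_ops.
assert simulate(10_000) <= 2 * 10_000
```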
2. Splay tree: access cost = depth of node + rotations. Potential $\Phi = \sum \log s(x)$ where $s(x) = $ size of subtree at $x$. Amortized cost of splaying node $x$: $\leq 3(\log n - \log s(x)) + 1 = O(\log n)$. Over $m$ operations: total $\leq m \cdot O(\log n) + \Phi_0 - \Phi_m = O(m\log n)$ (since $\Phi \geq 0$). No single operation is guaranteed $O(\log n)$, but the amortized bound holds.
3. Skip list: a linked list with $O(\log n)$ levels of 'express lanes.' Each element is promoted to the next level with probability $1/2$. Search: start at top level, move right until overshoot, drop down. Expected nodes examined per level: $O(1)$ (geometric backward analysis). Expected levels: $O(\log n)$. Total expected search time: $O(\log n)$. Expected space: $O(n)$.
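A sketch of the structure described above, supporting insert and search; the cap of 16 levels and the class names are assumptions of this sketch.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # one 'express lane' pointer per level


class SkipList:
    MAX_LEVEL = 16

    def __init__(self):
        self.head = _Node(float("-inf"), self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Promote to the next level with probability 1/2.
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x               # last node before key on level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        node = _Node(key, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def search(self, key):
        # Start at the top level, move right until overshoot, then drop down.
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].key < key:
                x = x.forward[i]
        x = x.forward[0]
        return x is not None and x.key == key
```

Usage: insert the even numbers below 200; searches for evens succeed and searches for odds fail, each in $O(\log n)$ expected time.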
4. Ackermann $A(m,n)$: $A(0,n)=n+1$, $A(m,0)=A(m-1,1)$, $A(m,n)=A(m-1,A(m,n-1))$. $A(1,n)=n+2$, $A(2,n)=2n+3$, $A(3,n)=2^{n+3}-3$, $A(4,n)$ is a tower of 2s of height $n+3$. For any primitive recursive $f$, there exists $m$ such that $f(n) < A(m,n)$ for large $n$. Proof sketch: by induction on the construction of $f$, each base function is bounded by $A(m,\cdot)$ for small $m$, and composition and primitive recursion each raise the required $m$ by only a bounded amount; since increasing the first argument of $A$ strictly increases its growth rate, the diagonal $n\mapsto A(n,n)$ eventually dominates every primitive recursive function.
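The small closed forms cited above can be checked directly with a memoized implementation. Only tiny arguments are feasible, and the recursion-limit bump is a practical assumption for CPython's default stack depth.

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(20000)   # A's recursion is deep even for tiny inputs


@lru_cache(maxsize=None)
def A(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return A(m - 1, 1)
    return A(m - 1, A(m, n - 1))


# Closed forms from the answer above:
assert all(A(1, n) == n + 2 for n in range(10))
assert all(A(2, n) == 2 * n + 3 for n in range(10))
assert all(A(3, n) == 2 ** (n + 3) - 3 for n in range(6))
```

Even $A(4,2)$ is already a number with nearly 20,000 digits, which is why $\alpha(n)$, the inverse, is effectively constant.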