Sorting Algorithms & Lower Bounds
Sorting is the canonical algorithmic problem. Comparison-based sorts have an $\Omega(n\log n)$ lower bound from a decision-tree argument; mergesort and heapsort achieve this. Non-comparison sorts (counting, radix) can achieve $O(n)$ for bounded integers. The lower bound proof is one of the most elegant in computer science.
Sorting Algorithms
- Mergesort: divide and conquer; $O(n\log n)$ time, $O(n)$ extra space, stable.
- Heapsort: build a max-heap in $O(n)$, then extract the max $n$ times for $O(n\log n)$; in-place but not stable.
- Quicksort: pivot partitioning; average $O(n\log n)$, worst case $O(n^2)$; with a random pivot the expected time is $O(n\log n)$.
- Counting sort: for integers in $[0,k]$, $O(n+k)$.
- Radix sort: $d$ passes of counting sort, $O(d(n+k))$.
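Mergesort is the simplest of these to write down. A minimal Python sketch (function name and structure are illustrative, not a reference implementation):

```python
def mergesort(a):
    """Stable O(n log n) mergesort using O(n) auxiliary space."""
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    left, right = mergesort(a[:mid]), mergesort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= (not <) keeps equal keys in order: stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

The `<=` in the merge step is what makes the sort stable: on ties, the element from the left half (which came earlier in the input) is emitted first.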
Comparison Sort Lower Bound
Any comparison-based sort requires $\Omega(n\log n)$ comparisons in the worst case. Proof sketch: model the algorithm's behavior as a binary decision tree in which each internal node is a comparison and each leaf is a permutation of the input. The tree must have at least $n!$ leaves, so its height satisfies $h\geq\log_2(n!)\geq n\log_2 n-n\log_2 e=\Omega(n\log n)$ by Stirling's approximation. Hence no comparison sort beats $O(n\log n)$.
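The Stirling-based inequality can be checked numerically. A small sketch (using `math.lgamma` to get $\log_2(n!)$ without huge integers):

```python
import math

def log2_factorial(n):
    """log2(n!) computed via lgamma(n+1) = ln(n!), avoiding big integers."""
    return math.lgamma(n + 1) / math.log(2)

# Decision-tree height bound: log2(n!) >= n*log2(n) - n*log2(e)
for n in (10, 100, 10_000):
    lower = n * math.log2(n) - n * math.log2(math.e)
    assert log2_factorial(n) >= lower
```

The gap between the two sides stays at $O(\log n)$, which is why $\log_2(n!)$ and $n\log_2 n$ are interchangeable inside $\Theta(\cdot)$.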
Example 1
Show heapsort runs in $O(n\log n)$.
Solution: Build-heap: a naive bound charges $O(\log n)$ to each of the $n$ max-heapify calls, giving $O(n\log n)$, but the tighter analysis $\sum_{h=0}^{\lfloor\log n\rfloor}\lceil n/2^{h+1}\rceil O(h)=O(n)$ shows build-heap is linear (most nodes sit near the bottom, where sifting is cheap). Extract-max: $n$ extractions, each $O(\log n)$, for $O(n\log n)$ total. Overall: $O(n+n\log n)=O(n\log n)$.
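The two phases of the analysis map directly onto the two loops of an in-place heapsort. A minimal sketch (helper name `sift_down` is illustrative):

```python
def heapsort(a):
    """In-place heapsort: O(n) bottom-up build-heap, then n O(log n) extractions."""
    def sift_down(a, start, end):
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1                      # pick the larger child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):     # build max-heap bottom-up: O(n)
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):             # n extractions, O(log n) each
        a[0], a[end] = a[end], a[0]             # move current max to its final slot
        sift_down(a, 0, end - 1)
    return a
```

Note the sort is in-place ($O(1)$ extra space) but not stable: the root-to-end swap can reorder equal keys.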
Example 2
Why does quicksort have $\Theta(n^2)$ worst case? What's the randomized expected complexity?
Solution: Worst case: always picking the minimum (or maximum) as pivot yields $T(n)=T(n-1)+O(n)=O(n^2)$; this happens, for example, on already-sorted input with a first-element pivot. With a uniformly random pivot of rank $R\sim\text{Uniform}\{1,\dots,n\}$, the recurrence $E[T(n)]=E[T(R-1)+T(n-R)]+O(n)$ solves to $E[T(n)]=O(n\log n)$; working it out via harmonic numbers gives an expected comparison count of $2n\ln n+O(n)$.
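The gap between the two cases is easy to observe experimentally. A sketch that charges each partition of $m$ elements $m$ comparisons (a standard simplification; `pivot_fn` is an illustrative hook for the pivot rule, and inputs are assumed distinct):

```python
import random

def partition_comparisons(a, pivot_fn):
    """Count quicksort comparisons under a pivot rule: a partition of m
    distinct elements is charged m comparisons."""
    if len(a) <= 1:
        return 0
    p = a[pivot_fn(len(a))]
    less = [x for x in a if x < p]
    greater = [x for x in a if x > p]
    return len(a) + partition_comparisons(less, pivot_fn) \
                  + partition_comparisons(greater, pivot_fn)

random.seed(0)
n = 300
worst = partition_comparisons(list(range(n)), lambda m: 0)        # first-element pivot on sorted input
expected = partition_comparisons(list(range(n)),
                                 lambda m: random.randrange(m))   # uniformly random pivot
```

On sorted input the first-element pivot gives exactly $n+(n-1)+\cdots+2=n(n+1)/2-1$ comparisons, while the random pivot lands near $2n\ln n$.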
Practice
- Prove that any stable sort can be used to implement a stable multi-key sort (generalized counting sort argument).
- Describe the introsort algorithm and explain why it is used in practice (std::sort in C++).
- Show that radix sort with digit base $k$ requires $d=\lceil\log_k n\rceil$ passes to sort $n$ distinct integers in $[0,n)$.
- Prove the lower bound $\Omega(n\log n)$ still holds even with randomization (information-theoretic argument).
Answer Key
1. Given a stable sort on each key: sort by the least significant key first, then progressively by more significant keys. Stability ensures that within each group sorted by a higher-priority key, the relative order from lower-priority sorts is preserved. Counting sort is stable and runs in $O(n+k)$, so multi-key sort with $d$ keys costs $O(d(n+k))$.
2. Introsort begins as quicksort. If the recursion depth exceeds $2\lfloor\log_2 n\rfloor$, it switches to heapsort (guaranteed $O(n\log n)$ worst case). For small partitions ($\leq 16$), it uses insertion sort (cache-efficient). This combines quicksort's average-case speed, heapsort's worst-case guarantee, and insertion sort's small-$n$ efficiency. Used in C++ STL `std::sort`.
3. Each number has $\lceil\log_2 n\rceil$ bits (to distinguish $n$ numbers). Using base $k$, the number of digits is $d = \lceil\log_k n\rceil = \lceil\log_2 n / \log_2 k\rceil$. Each radix-sort pass (stable counting sort on one digit) costs $O(n+k)$. Total: $O(d(n+k)) = O(\frac{\log n}{\log k}(n+k))$.
4. Any comparison-based sort must determine one of $n!$ possible permutations, and each comparison yields at most 1 bit of information, so the decision tree has $\geq n!$ leaves and height $\geq \log_2(n!) = \Omega(n\log n)$ (Stirling). Randomization does not help: a randomized comparison sort is a probability distribution over deterministic decision trees, each of which still needs $\geq n!$ leaves, and by Yao's minimax principle its expected number of comparisons on a worst-case input remains $\Omega(n\log n)$.
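Answers 1 and 3 come together in LSD radix sort: each pass is a stable bucket/counting sort on one digit, and stability is exactly what lets later (more significant) passes preserve earlier orderings. A minimal sketch (base and structure are illustrative):

```python
def radix_sort(a, base=10):
    """LSD radix sort: d stable counting-sort passes, O(d(n + base)) total."""
    if not a:
        return a
    # Number of base-`base` digits in the largest element.
    digits, m = 1, max(a)
    while m >= base:
        m //= base
        digits += 1
    for d in range(digits):                       # least significant digit first
        buckets = [[] for _ in range(base)]
        for x in a:                               # left-to-right scan => each pass is stable
            buckets[(x // base**d) % base].append(x)
        a = [x for b in buckets for x in b]
    return a
```

If any pass were unstable, ties on the current digit could scramble the order established by previous passes, and the multi-key argument in answer 1 would fail.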