Histograms & Distributions

Histogram

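Divide the range of the data into equal-width bins and draw one bar per bin. The height of each bar represents the frequency or relative frequency of values in that bin. In a density histogram, the area of each bar represents the relative frequency instead.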

Key Bin-Count Formulas
  • Sturges' rule: $k = \lceil 1 + \log_2 n \rceil$
  • Square-root rule: $k = \lceil \sqrt{n} \rceil$

$k$ = number of bins, $n$ = number of data points. Each rule gives a bin count; the bin width is then roughly the data range divided by $k$.
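
Both rules are easy to compute directly. The following is a minimal sketch in Python; the function names `sturges_bins` and `sqrt_bins` are made up for this illustration and do not come from any library.

```python
import math


def sturges_bins(n: int) -> int:
    """Number of bins by Sturges' rule: ceil(1 + log2(n))."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(1 + math.log2(n))


def sqrt_bins(n: int) -> int:
    """Number of bins by the square-root rule: ceil(sqrt(n))."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # isqrt returns floor(sqrt(n)) exactly, so round up unless n is a perfect square.
    root = math.isqrt(n)
    return root if root * root == n else root + 1


if __name__ == "__main__":
    # Print n, the Sturges bin count, and the square-root bin count.
    for n in (30, 200, 1000):
        print(n, sturges_bins(n), sqrt_bins(n))
```

For large $n$ the square-root rule suggests noticeably more bins than Sturges' rule, so the two are best treated as starting points rather than fixed answers.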

Shape Descriptors
  • Symmetric: mean ≈ median, mirror image about center
  • Right-skewed: long tail to the right, mean > median
  • Left-skewed: long tail to the left, mean < median
  • Bimodal: two peaks

Density Curve

A smooth curve where the total area under it equals 1. The area over an interval gives the proportion of data in that range.
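
A density histogram is the binned counterpart of a density curve: each bar's height is the bin's relative frequency divided by the bin width, so the bar areas sum to 1. The sketch below illustrates this in plain Python. The helper name `density_heights`, the sample values, and the bin edges are all assumptions chosen for illustration.

```python
def density_heights(data, edges):
    """Return density-scale bar heights for bins given by sorted `edges`.

    Each bin is half-open [lo, hi), except the last, which also includes
    its right edge. Height = count / (n * bin width), so area = proportion.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the bin range")
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    n = len(data)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


if __name__ == "__main__":
    sample = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6]  # hypothetical measurements
    edges = [1, 2, 3, 4, 5]                             # four bins of width 1
    heights = density_heights(sample, edges)
    areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
    print(heights)
    print(sum(areas))  # total area; 1 up to floating-point rounding
```

Because the height is divided by the bin width, the same function also works for unequal-width bins, where plain frequency heights would be misleading.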

Example 1

Data: 12, 15, 17, 19, 21, 22, 25, 28, 30, 35. Use 5 bins of width 5 starting at 10.

Bins: [10,15): 1, [15,20): 3, [20,25): 2, [25,30): 2, [30,35]: 2. The value 15 falls in [15,20), not [10,15). The counts are close to even, so the distribution is roughly uniform.
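
A tally like this is easy to get wrong at the bin boundaries, so it can help to check it mechanically. The sketch below assumes the same convention as the bin labels: every bin is half-open $[a, b)$ except the last, which is closed on the right. The function name `bin_counts` is hypothetical.

```python
def bin_counts(data, start, width, k):
    """Count values in k equal-width bins beginning at `start`.

    Bin i covers [start + i*width, start + (i+1)*width); the last bin
    also includes its right endpoint, matching the label [30,35].
    """
    stop = start + k * width
    counts = [0] * k
    for x in data:
        if not start <= x <= stop:
            raise ValueError(f"{x} is outside [{start}, {stop}]")
        # Floor division finds the bin; the right endpoint goes in the last bin.
        i = min(int((x - start) // width), k - 1)
        counts[i] += 1
    return counts


if __name__ == "__main__":
    data = [12, 15, 17, 19, 21, 22, 25, 28, 30, 35]
    counts = bin_counts(data, start=10, width=5, k=5)
    # Print a simple text histogram, one row per bin.
    for i, c in enumerate(counts):
        lo = 10 + 5 * i
        print(f"{lo}-{lo + 5}: {'#' * c}")
```

A value that sits exactly on a boundary, such as 15 or 25, is counted in the bin that starts there.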

Example 2

A histogram has most bars on the left with a long tail to the right. Describe the skew.

Right-skewed (positively skewed). Mean > median.
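
The link between skew and the gap between mean and median can be seen on a small sample. The sketch below uses Python's standard `statistics` module; the income-like values are invented for illustration, and the helper name `describe_skew` and its `tolerance` parameter are hypothetical. Comparing mean and median is a rule of thumb, not a formal skewness test.

```python
import statistics


def describe_skew(data, tolerance=0.05):
    """Classify a sample by comparing its mean with its median.

    `tolerance` is an assumed cutoff: if the mean-median gap is at most
    this fraction of the sample range, the sample is called symmetric.
    """
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "roughly symmetric"
    return "right-skewed" if mean > median else "left-skewed"


if __name__ == "__main__":
    # Made-up incomes in thousands; one large value creates a long right tail.
    incomes = [28, 31, 33, 35, 36, 38, 41, 45, 60, 150]
    print(statistics.mean(incomes), statistics.median(incomes))
    print(describe_skew(incomes))
```

The single large value pulls the mean well above the median, which is the pattern described for right-skewed data.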

Example 3

$n = 100$ data points. How many bins by Sturges' rule?

$k = \lceil 1 + \log_2 100 \rceil = \lceil 1 + 6.64 \rceil = 8$.

Practice Problems

1. What axis shows frequency in a histogram?
2. Sturges' bins for $n = 50$?
3. Square-root bins for $n = 64$?
4. Describe: histogram with two peaks.
5. Income data is typically what skew?
6. Relative frequency histogram: what do bar heights sum to?
7. What happens if bins are too wide?
8. What happens if bins are too narrow?
9. Can a histogram have gaps between bars?
10. Density histogram: area of a bar = ?
11. Stem-and-leaf vs. histogram: advantage?
12. Normal distribution: shape?

Answer Key

1. The vertical axis (y-axis)

2. $\lceil 1+\log_2 50 \rceil = \lceil 6.64 \rceil = 7$

3. $\lceil \sqrt{64} \rceil = 8$

4. Bimodal

5. Right-skewed (few very high earners)

6. 1 (100%)

7. Detail is lost; features such as peaks, gaps, and skew are smoothed away

8. Too noisy; bars are ragged

9. Normally no, because adjacent bins share a boundary; a gap appears only where a bin has zero frequency (an empty bin)

10. Proportion (relative frequency) in that bin

11. Preserves individual data values

12. Symmetric, bell-shaped