Scatter Plots, Correlation & Trend Lines
A scatter plot is a graph of paired observations $(x_i, y_i)$ in which each point represents one case. It is used to detect the direction, form, and strength of a relationship between two variables.
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$$
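The correlation coefficient $r$ satisfies $-1 \leq r \leq 1$ and measures linear association. $|r|$ near 1 indicates a strong linear relationship, $|r|$ near 0 a weak one, and the sign of $r$ gives the direction.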
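The definition of $r$ can be sketched directly in code. The function name `pearson_r` and the toy data below are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from the definition of r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Perfectly linear toy data (made up for illustration).
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # increasing line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # decreasing line
print(r_up, r_down)
```

Points lying exactly on a line give the extreme values of $r$, with the sign matching the direction of the line.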
$$\hat{y} = a + bx, \quad b = r\frac{s_y}{s_x}, \quad a = \bar{y} - b\bar{x}$$
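This least-squares regression line minimizes the sum of squared residuals $\sum (y_i - \hat{y}_i)^2$. Here $s_x$ and $s_y$ are the sample standard deviations of $x$ and $y$.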
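As a rough check of the least-squares property, the sketch below fits a line and confirms that nudging either coefficient increases the sum of squared residuals. The helper names `fit_line` and `sse` and the data are hypothetical. The slope is computed as $\sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2$, which is algebraically equal to $r\,s_y/s_x$.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # same value as r * s_y / s_x
    a = y_bar - b * x_bar    # forces the line through (x_bar, y_bar)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Toy data, made up for illustration.
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 5]

a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Any small change to the intercept or slope should make the fit worse.
worse = [sse(xs, ys, a + da, b + db) for da in (-0.1, 0.1) for db in (-0.1, 0.1)]
print(a, b, best, min(worse))
```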
$$r^2 = \text{proportion of variance in } y \text{ explained by } x$$
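One way to see why $r^2$ is read as "variance explained": for a least-squares line with an intercept, $r^2$ equals $1 - \text{SSE}/\text{SST}$, where SSE is the sum of squared residuals and SST is the total variation $\sum (y_i-\bar{y})^2$. The sketch below computes both forms on made-up data. The function name `r_squared_two_ways` is an assumption for illustration.

```python
def r_squared_two_ways(xs, ys):
    """Return r^2 computed (1) by squaring r and (2) as 1 - SSE/SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)

    # (1) Square of the correlation coefficient.
    r2_from_r = s_xy ** 2 / (s_xx * s_yy)

    # (2) Fit the least-squares line, then compare leftover to total variation.
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2_from_fit = 1 - sse / s_yy
    return r2_from_r, r2_from_fit

# Toy data, made up for illustration.
r2_a, r2_b = r_squared_two_ways([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(r2_a, r2_b)
```

The two values agree up to floating-point rounding, so a larger $r^2$ means the line leaves a smaller share of the variation in $y$ unexplained.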
Example 1. Study-hours data: $(1,50), (2,55), (3,65), (4,70), (5,80)$. Find $r$.
$\bar{x}=3$, $\bar{y}=64$. Numerator $= (-2)(-14)+(-1)(-9)+0(1)+(1)(6)+(2)(16) = 28+9+0+6+32 = 75$.
$\sum(x_i-\bar{x})^2 = 4+1+0+1+4 = 10$, $\sum(y_i-\bar{y})^2 = 196+81+1+36+256 = 570$.
$r = 75/\sqrt{10 \cdot 570} = 75/75.50 \approx 0.993$. Strong positive linear relationship.
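Hand arithmetic of this kind is easy to slip on, so it can help to recompute the deviation table and the three sums. A minimal sketch using the five data points from this example (the variable names are arbitrary):

```python
import math

# Data from the worked example: (study hours, y) pairs.
data = [(1, 50), (2, 55), (3, 65), (4, 70), (5, 80)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# One row per point: x deviation, y deviation, and their product.
rows = [(x - x_bar, y - y_bar, (x - x_bar) * (y - y_bar)) for x, y in data]
for dx, dy, prod in rows:
    print(f"{dx:5.1f} {dy:6.1f} {prod:6.1f}")

s_xy = sum(prod for _, _, prod in rows)    # numerator of r
s_xx = sum(dx ** 2 for dx, _, _ in rows)   # sum of squared x deviations
s_yy = sum(dy ** 2 for _, dy, _ in rows)   # sum of squared y deviations
r = s_xy / math.sqrt(s_xx * s_yy)
print(s_xy, s_xx, s_yy, round(r, 3))
```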
Example 2. Using the data from Example 1, find the regression line.
$s_x = \sqrt{10/4} = 1.581$, $s_y = \sqrt{570/4} = 11.937$.
$b = 0.993 \cdot 11.937/1.581 \approx 7.5$ (equivalently, $b = 75/10 = 7.5$). $a = 64 - 7.5(3) = 41.5$.
$\hat{y} = 41.5 + 7.5x$.
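The fitted line can be checked with a short script that also computes the residuals (observed minus predicted). This is a sketch on the example's data, and the helper name `predict` is an assumption. It computes the slope as $\sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2$, which equals $r\,s_y/s_x$.

```python
# Data from the worked example: study hours and the paired y values.
xs = [1, 2, 3, 4, 5]
ys = [50, 55, 65, 70, 80]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope and intercept of the least-squares line.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

def predict(x):
    """Predicted y on the fitted line; only trustworthy for x within the data range."""
    return a + b * x

# Residual = observed minus predicted.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

print(f"y_hat = {a} + {b}x")
print("residuals:", residuals)
print("sum of residuals:", sum(residuals))
print("passes through the means:", abs(predict(x_bar) - y_bar) < 1e-9)
```

Two properties of any least-squares line show up here: the residuals sum to zero, and the line passes through $(\bar{x}, \bar{y})$. Calling `predict` for an $x$ outside 1 to 5 would be extrapolation and should be treated with caution.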
Example 3. $r^2 = 0.81$. Interpret.
81% of the variability in $y$ is explained by the linear relationship with $x$.
Practice Problems
Answer Key
1. Strong negative linear relationship
2. No — just no linear relationship; could be nonlinear
3. No
4. $y_i - \hat{y}_i$ (observed minus predicted)
5. $(\bar{x}, \bar{y})$
6. $0.36$ (36% of variability explained)
7. Can inflate or deflate $r$ substantially
8. Whether a linear model is appropriate (look for patterns)
9. Outliers and non-linearity
10. $\hat{y} = 10+18 = 28$
11. Predicting beyond the range of observed $x$ — risky
12. No, $r$ only measures linear association
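Several answers above turn on two limitations of $r$: it only captures linear association, and it is sensitive to outliers. The sketch below illustrates both with made-up data. The function name `pearson_r` and all data values are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from the definition (assumes neither variable is constant)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# 1. A perfect but nonlinear relationship (y = x^2) can still have r = 0.
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# 2. One far-away point in line with the trend can inflate r.
r_base = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 2, 3])
r_inflated = pearson_r([1, 2, 3, 4, 5, 20], [2, 1, 3, 2, 3, 20])

# 3. One point far off an otherwise perfect line can deflate r.
r_line = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
r_deflated = pearson_r([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 0])

print("parabola:", r_parabola)
print("outlier inflates:", round(r_base, 3), "->", round(r_inflated, 3))
print("outlier deflates:", round(r_line, 3), "->", round(r_deflated, 3))
```

In each case a scatter plot or residual plot would reveal what the single number $r$ hides, which is why plotting the data first is the usual advice.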