R — Statistical Computing & Graphics
R is a free, open-source programming language and environment designed for statistical computing and graphics. It is widely used in academia, biostatistics, data science, and machine learning. CRAN (Comprehensive R Archive Network) hosts over 20 000 add-on packages.
- Numeric: `x <- 3.14`, a decimal number. `<-` is the assignment operator.
- Integer: `n <- 5L`; the `L` suffix forces integer type.
- Character: `name <- "voltage"`, a text string.
- Logical: `TRUE` and `FALSE`, boolean values.
- Vector: `v <- c(1, 4, 9, 16)`, the fundamental data structure; `c()` combines elements.
- Data frame: `df <- data.frame(x=1:5, y=c(2,4,6,8,10))`, a table of columns.
- Factor: a categorical variable, e.g. `factor(c("A","B","A"))`.
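A quick way to confirm how R stores each of these types is to ask it with `typeof()`, `class()`, `dim()`, and `levels()`. The sketch below reuses the examples from the list; the names `flag` and `f` are illustrative additions. It also shows that a function applied to a vector works element by element.

```r
# Basic types from the list above, inspected with base R functions
x    <- 3.14                                     # numeric (stored as double)
n    <- 5L                                       # the L suffix makes an integer
name <- "voltage"                                # character string
flag <- TRUE                                     # logical (hypothetical name)
v    <- c(1, 4, 9, 16)                           # numeric vector of length 4
df   <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
f    <- factor(c("A", "B", "A"))                 # factor (hypothetical name)

typeof(x)     # "double"
typeof(n)     # "integer"
class(name)   # "character"
class(flag)   # "logical"
sqrt(v)       # 1 2 3 4: the function is applied to every element
dim(df)       # 5 2: five rows, two columns
levels(f)     # "A" "B": the distinct categories
```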
| Function | Purpose | Example |
|---|---|---|
| `mean(x)` | Arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | `mean(c(2,4,6))` → 4 |
| `sd(x)` | Sample standard deviation $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$ | `sd(c(2,4,6))` → 2 |
| `cor(x, y)` | Pearson correlation coefficient $r$ | `cor(x, y)` |
| `lm(y ~ x)` | Linear model (least-squares regression) | `fit <- lm(y ~ x, data=df)` |
| `t.test(x)` | One-sample $t$-test | `t.test(x, mu=0)` |
| `summary()` | Summary statistics of data, or a summary of a fitted model | `summary(fit)` |
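The formulas in the table can be checked directly against the built-in functions. This sketch computes the mean and sample standard deviation of the table's example vector by hand, and Pearson's $r$ from its standard definition $r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$. The second vector `y` is made-up data for illustration only.

```r
x <- c(2, 4, 6)
n <- length(x)

x_bar <- sum(x) / n                          # arithmetic mean
s     <- sqrt(sum((x - x_bar)^2) / (n - 1))  # sample sd uses n - 1
x_bar                                        # 4, matches mean(x)
s                                            # 2, matches sd(x)

y <- c(1, 3, 2)                              # hypothetical second variable
r <- sum((x - x_bar) * (y - mean(y))) /
  sqrt(sum((x - x_bar)^2) * sum((y - mean(y))^2))
r                                            # 0.5
all.equal(r, cor(x, y))                      # TRUE
```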
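- `plot(x, y)`: scatter or line plot; `x` and `y` are numeric vectors.
- `hist(x, breaks=n)`: histogram; `breaks` suggests the number of bins (R treats a single number as a suggestion and chooses "pretty" break points).
- `boxplot(x ~ group)`: box plots of `x` for each level of `group`.
- `barplot(heights)`: bar chart with one bar per element of `heights`.
- `abline(fit)`: adds the regression line from a fitted model to the current plot.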
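The base functions above can be combined in a short script. This is a sketch on simulated data (`x`, `y`, and `group` are hypothetical), drawing to a temporary PDF with `pdf()` so that it also runs without a screen; in an interactive session the `pdf()` and `dev.off()` lines can be dropped.

```r
set.seed(1)                              # reproducible simulated data
x     <- 1:20
y     <- 2 * x + rnorm(20, sd = 3)       # hypothetical noisy linear response
group <- factor(rep(c("A", "B"), each = 10))

pdf(tempfile(fileext = ".pdf"))          # send plots to a temporary file
plot(x, y)                               # scatter plot
fit <- lm(y ~ x)
abline(fit)                              # fitted line on top of the scatter plot
h <- hist(y, breaks = 5)                 # 5 is a suggestion; R picks pretty breaks
boxplot(y ~ group)                       # one box per level of group
barplot(c(A = 3, B = 7, C = 5))          # one bar per named height
invisible(dev.off())                     # close the file device

h$breaks                                 # the break points R actually used
```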
The ggplot2 package builds plots in layers:
```r
ggplot(data, aes(x=var1, y=var2)) + geom_point() + geom_smooth(method="lm")
```
- `aes()`: maps variables to aesthetics (x, y, color, size).
- `geom_point()`: scatter points.
- `geom_line()`: line chart.
- `geom_histogram()`: histogram.
- `geom_boxplot()`: box plot.
- `geom_smooth()`: fitted trend line, with a confidence band shown by default (`se=TRUE`).
- `facet_wrap(~var)`: small multiples, one panel per level of a grouping variable.
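As a sketch of how the layers combine, the plot below uses the built-in `mtcars` data, adds a `color` aesthetic, and uses `facet_wrap()` to draw one panel per number of cylinders (`cyl`). It assumes ggplot2 is installed; passing `formula = y ~ x` to `geom_smooth()` only silences the message ggplot2 prints about the formula it used.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +                                            # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: a line per group
  facet_wrap(~cyl) +                                        # one panel per cyl value
  labs(color = "Cylinders")

print(p)   # in a script, the plot is drawn only when printed
```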
Fit a regression line to study hours vs. exam score.
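```r
hours <- c(1, 2, 3, 4, 5)        # predictor
score <- c(50, 55, 65, 70, 80)   # response
fit <- lm(score ~ hours)         # least-squares fit of score on hours
summary(fit)                     # coefficients, R-squared, p-values
```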
Key output: fitted line $\hat{y} = 41.5 + 7.5x$ (about 7.5 more points per extra hour of study), $R^2 \approx 0.987$, and a slope $p$-value below 0.001.
Variables: hours — predictor vector; score — response vector; fit — fitted model object.
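The coefficients `lm()` reports for this example can be reproduced with the closed-form least-squares estimates $b_1 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}$ and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below does that and then uses `predict()` for a hypothetical student who studies 6 hours.

```r
hours <- c(1, 2, 3, 4, 5)
score <- c(50, 55, 65, 70, 80)
fit   <- lm(score ~ hours)

# Closed-form estimates for simple linear regression
b1 <- sum((hours - mean(hours)) * (score - mean(score))) /
  sum((hours - mean(hours))^2)
b0 <- mean(score) - b1 * mean(hours)
c(b0, b1)                                      # 41.5 7.5
coef(fit)                                      # same values from lm()
summary(fit)$r.squared                         # about 0.987
predict(fit, newdata = data.frame(hours = 6))  # 41.5 + 7.5 * 6 = 86.5
```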
Create a scatter plot of mpg vs. weight from the mtcars dataset with a regression line.
```r
library(ggplot2)

ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(color="steelblue") +          # one point per car
  geom_smooth(method="lm", se=TRUE) +      # least-squares line with confidence band
  labs(title="MPG vs Weight", x="Weight (1000 lb)", y="MPG")
```
Variables: wt — car weight; mpg — miles per gallon; se=TRUE — show confidence band.
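The line that `geom_smooth(method="lm")` draws is an ordinary least-squares fit, so the same numbers can be obtained with `lm()`. A minimal check (the name `fit_cars` is illustrative):

```r
fit_cars <- lm(mpg ~ wt, data = mtcars)
coef(fit_cars)                # intercept about 37.29, slope about -5.34
summary(fit_cars)$r.squared   # about 0.75
```

The slope says that, in this dataset, each additional 1000 lb of weight is associated with about 5.3 fewer miles per gallon.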
Test whether the mean reaction time differs significantly from 250 ms.
```r
times <- c(243, 251, 260, 238, 255, 247)   # reaction times in ms
t.test(times, mu=250)                      # H0: population mean = 250 ms
```
Output: $t \approx -0.31$ with 5 degrees of freedom, $p \approx 0.77$. Since $p > 0.05$, we fail to reject $H_0: \mu = 250$; the sample mean of 249 ms is not significantly different from 250 ms.
Variables: times — sample data; mu — hypothesized population mean; p — p-value.
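To see where these numbers come from: the statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $n - 1$ degrees of freedom, and the two-sided $p$-value is $2\,P(T_{n-1} \le -|t|)$. A sketch computing both by hand and comparing them with `t.test()` (the names `mu0`, `t_stat`, `p_val`, and `res` are illustrative):

```r
times <- c(243, 251, 260, 238, 255, 247)
mu0   <- 250
n     <- length(times)

t_stat <- (mean(times) - mu0) / (sd(times) / sqrt(n))  # (249 - 250) / SE
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)              # two-sided p-value

res <- t.test(times, mu = mu0)
c(t = t_stat, p = p_val)                 # about -0.31 and 0.77
c(res$statistic, p = res$p.value)        # the same values from t.test()
```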
Practice Problems
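- Question 3: What does `c()` stand for?
- Question 6: What does `aes()` do?
- Question 8: What does `facet_wrap` do?
- Question 11: What does `t.test(x, mu=5)` perform?

Answer Key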
1. <- (also = works, but <- is conventional)
2. v <- 1:10 or v <- c(1,2,3,4,5,6,7,8,9,10)
3. Combine (concatenate elements into a vector)
4. sd(x)
5. lm() (linear model)
6. Maps data variables to visual aesthetics (x, y, color, size, etc.)
7. abline(lm(y ~ x))
8. Creates small multiples — the same plot repeated for each level of a variable
9. Comprehensive R Archive Network — the central repository for R packages
10. hist(x) or hist(x, breaks=10)
11. One-sample $t$-test: tests whether the population mean equals 5
12. A tabular data structure with named columns (like a spreadsheet or SQL table)