Python Scientific Stack — NumPy, Matplotlib & Pandas
Python's open-source libraries form a powerful toolkit for scientific computing and visualization:
- NumPy — N-dimensional arrays and vectorized math; the foundation for all scientific Python.
- Matplotlib — comprehensive 2-D and 3-D plotting library (modeled after MATLAB's plotting API).
- Pandas — tabular data manipulation with `DataFrame` and `Series` objects.
- SciPy — integration, optimization, interpolation, signal processing, linear algebra.
- Seaborn — statistical visualization built on Matplotlib with intelligent defaults.
- Plotly — interactive, web-based charts.

Core NumPy concepts:
- ndarray — the N-dimensional array. Created with `np.array([1,2,3])`.
- Shape — `a.shape` returns the dimensions, e.g. `(3,)` for a vector, `(2,3)` for a 2×3 matrix.
- arange — `np.arange(start, stop, step)` — evenly spaced values (excludes stop).
- linspace — `np.linspace(a, b, n)` — $n$ points from $a$ to $b$ inclusive.
- Broadcasting — NumPy automatically aligns compatible shapes for element-wise operations, e.g. adding a `(3,)` vector to every row of a `(2,3)` matrix.
- Vectorized ops — `np.sin(x)`, `x**2`, `x * y` all operate element-wise without explicit Python loops.
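
The NumPy concepts above can be tried out in a few lines. This is a minimal sketch; the array values and the variable names (`v`, `m`, `shifted`, and so on) are chosen only for illustration.

```python
import numpy as np

# ndarray and shape
v = np.array([1, 2, 3])            # 1-D vector
m = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2x3 matrix
print(v.shape, m.shape)            # (3,) (2, 3)

# arange excludes the stop value; linspace includes it
a = np.arange(0, 1, 0.25)          # 0.0, 0.25, 0.5, 0.75
b = np.linspace(0, 1, 5)           # 0.0, 0.25, 0.5, 0.75, 1.0

# Broadcasting: the (3,) vector is added to each row of the (2, 3) matrix
shifted = m + v                    # [[2, 4, 6], [5, 7, 9]]

# Vectorized operations act on every element without a Python loop
squares = v**2                     # [1, 4, 9]
products = v * v[::-1]             # [3, 4, 3]
print(a, b, shifted, squares, products, sep="\n")
```

Note that `a` and `b` cover the same interval, but only `b` contains the endpoint 1.0.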

Common Matplotlib (`pyplot`) functions:

| Function | Purpose |
|---|---|
| `plt.plot(x, y)` | Line plot |
| `plt.scatter(x, y)` | Scatter plot |
| `plt.bar(x, heights)` | Bar chart |
| `plt.hist(data, bins)` | Histogram |
| `plt.xlabel()`, `plt.ylabel()`, `plt.title()` | Labels |
| `plt.subplot(r, c, i)` | Subplots in a grid |
| `plt.savefig('file.png', dpi=300)` | Save to file |
| `plt.show()` | Display figure |
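
Several of the functions in the table can be combined in one short script. The sketch below is illustrative only: the sample data, the category labels, and the output file name `demo_plots.png` are made up, and the `Agg` backend is selected as an assumption so the script also runs on machines without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")                    # assumption: non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)           # seeded so the figure is reproducible
data = rng.normal(size=500)              # made-up sample for the histogram
labels = ["A", "B", "C"]                 # made-up categories for the bar chart
heights = [3, 7, 5]

plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)                     # 1 row, 2 columns, first panel
plt.hist(data, bins=20)
plt.xlabel("value")
plt.ylabel("count")
plt.title("Histogram")

plt.subplot(1, 2, 2)                     # second panel
plt.bar(labels, heights)
plt.title("Bar chart")

plt.tight_layout()
out_path = os.path.join(tempfile.gettempdir(), "demo_plots.png")
plt.savefig(out_path, dpi=300)           # write the figure to disk
plt.close()
```

To view the figure interactively instead, drop the `matplotlib.use("Agg")` line and call `plt.show()` after `plt.savefig(...)`.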

Core Pandas concepts:
- Series — a labeled 1-D array. `s = pd.Series([10, 20, 30], index=['a','b','c'])`
- DataFrame — a labeled 2-D table. `df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})`
- Read data — `pd.read_csv('file.csv')`
- Descriptive stats — `df.describe()` returns count, mean, std, min, quartiles, max.
- Grouping — `df.groupby('category')['value'].mean()`
- Plotting shortcut — `df.plot(x='col1', y='col2', kind='scatter')`
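
The Pandas operations listed above can be sketched on a tiny in-memory table. The `category` and `value` columns and their contents are made up for illustration.

```python
import pandas as pd

# Series: values addressed by label
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])                       # label-based lookup: 20

# DataFrame with made-up data: two categories, two values each
df = pd.DataFrame({
    "category": ["x", "x", "y", "y"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Grouping: mean of 'value' within each category
means = df.groupby("category")["value"].mean()
print(means)                        # x -> 2.0, y -> 15.0

# Descriptive statistics for the numeric column
stats = df.describe()
print(stats.loc["mean", "value"])   # 8.5
```

Because `describe()` returns a DataFrame, individual statistics can be picked out with `.loc`, as in the last line.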
Plot $y = x^2 e^{-x}$ for $0 \le x \le 8$ with labeled axes.
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 8, 300)            # 300 sample points on [0, 8]
y = x**2 * np.exp(-x)                 # element-wise x^2 * e^(-x)

plt.plot(x, y, 'r-', linewidth=2)     # red solid line
plt.xlabel('x')
plt.ylabel('y')
plt.title(r'$y = x^2 e^{-x}$')
plt.grid(True)
plt.show()
```
Variables: `x` — 300-point array on $[0, 8]$; `y` — element-wise $x^2 e^{-x}$; `'r-'` — red solid line.
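
As a quick sanity check on the curve, the peak can be found both by hand and numerically. Differentiating gives $y' = x(2 - x)e^{-x}$, which vanishes at $x = 2$ on $(0, 8]$, so the maximum is $y(2) = 4e^{-2} \approx 0.541$. The sketch below compares that with the largest value on the same 300-point grid; the names `x_peak`, `y_peak`, and `y_exact` are illustrative.

```python
import numpy as np

x = np.linspace(0, 8, 300)
y = x**2 * np.exp(-x)

i = np.argmax(y)                    # index of the largest sampled value
x_peak, y_peak = x[i], y[i]
y_exact = 4 * np.exp(-2)            # value at the analytic maximum x = 2

print(f"grid peak near x = {x_peak:.3f}, y = {y_peak:.4f}")
print(f"analytic peak at x = 2, y = {y_exact:.4f}")
```

The grid estimate can only be as close to $x = 2$ as the nearest sample point, so a finer `linspace` gives a tighter match.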
Load a CSV of exam scores and compute summary statistics.
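```python
import pandas as pd

df = pd.read_csv('scores.csv')   # expects scores.csv in the working directory
print(df.describe())             # summary statistics for each numeric column
```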
Output: count, mean ($\bar{x}$), std ($s$), min, 25%, 50% (median), 75%, max for each numeric column.
Variables: `df` — DataFrame; `describe()` — computes $n$, $\bar{x}$, $s$, min, $Q_1$, $\tilde{x}$ (median), $Q_3$, max.
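
Since the contents of `scores.csv` are not given, the following sketch uses a small in-memory stand-in so the example can run on its own. The column names (`student`, `midterm`, `final`) and all scores are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for scores.csv; the real file's columns are not known
csv_text = """student,midterm,final
Ana,78,85
Ben,92,88
Cy,65,70
Dee,85,91
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects
summary = df.describe()                  # only numeric columns are summarized

print(summary.loc["mean"])               # mean of each numeric column
print(summary.loc["50%", "final"])       # median of the final-exam column
```

The text column `student` does not appear in `summary`, because `describe()` summarizes only numeric columns by default.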
Show $\sin(x)$ and $\cos(x)$ side by side.
```python
import numpy as np
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))  # 1 row, 2 columns
x = np.linspace(0, 2*np.pi, 200)
ax1.plot(x, np.sin(x), 'b')
ax1.set_title('sin(x)')
ax2.plot(x, np.cos(x), 'r')
ax2.set_title('cos(x)')
plt.tight_layout()
plt.show()
```
Variables: `fig` — figure container; `ax1`, `ax2` — subplot axes; `figsize` — (width, height) in inches.
Practice Problems
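
1. What does `np.linspace(0, 1, 50)` return?
2. How is `np.arange` different from `np.linspace`?
3. What are vectorized operations?
4. How do you create a 2×3 NumPy array?
5. How do you save a Matplotlib figure to a PNG file at 300 dpi?
6. What is a DataFrame?
7. How do you read a CSV file into a DataFrame?
8. What does `df.describe()` return?
9. How do you create two side-by-side subplots?
10. Which library offers `geom_point`-style plotting in Python?
11. What is broadcasting?
12. Which libraries produce interactive plots?

Answer Key
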
1. A 1-D array of 50 evenly spaced values from 0 to 1 (inclusive).
2. `arange` uses a step size and excludes the endpoint; `linspace` uses a count of points and includes the endpoint.
3. Operations applied element-wise to entire arrays without explicit Python loops.
4. `np.array([[1,2,3],[4,5,6]])` or `np.zeros((2,3))`
5. `plt.savefig('file.png', dpi=300)`
6. A 2-D labeled tabular data structure with named columns and an index.
7. `pd.read_csv('filename.csv')`
8. Count, mean, std, min, 25th/50th/75th percentiles, max for each numeric column.
9. `fig, (ax1, ax2) = plt.subplots(1, 2)`
10. Seaborn (or Plotly).
11. NumPy automatically aligns arrays of different shapes for element-wise operations.
12. Plotly and Bokeh (also: Altair, Dash).