Python Scientific Stack — NumPy, Matplotlib & Pandas

The Python Scientific Ecosystem

Python's open-source libraries form a powerful toolkit for scientific computing and visualization:

  • NumPy — N-dimensional arrays and vectorized math; the array foundation that Pandas, SciPy and Matplotlib build on.
  • Matplotlib — comprehensive 2-D and 3-D plotting library (its pyplot interface is modeled after MATLAB's plotting commands).
  • Pandas — tabular data manipulation with DataFrame and Series objects.
  • SciPy — integration, optimization, interpolation, signal processing, linear algebra.
  • Seaborn — statistical visualization built on Matplotlib, with DataFrame-aware functions and polished default styles.
  • Plotly — interactive, web-based charts.
NumPy — Key Concepts
  • ndarray — the N-dimensional array. Created with np.array([1,2,3]).
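  • Shape — a.shape returns the dimensions, e.g. (3,) for a vector, (2,3) for a 2×3 matrix.
  • arange — np.arange(start, stop, step) gives values spaced by step, excluding stop.
  • linspace — np.linspace(a, b, n) gives $n$ evenly spaced points from $a$ to $b$ inclusive.
  • Broadcasting — when shapes differ, NumPy compares them from the trailing dimension and stretches any dimension of size 1 (or a missing leading dimension) to match, so element-wise operations work without copying data.
  • Vectorized ops — np.sin(x), x**2, x * y all operate element-wise without explicit Python loops.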
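A minimal sketch exercising each concept above. The arrays v, M and col are made-up examples chosen so the results are easy to check by hand.

```python
import numpy as np

# ndarray and shape
v = np.array([1, 2, 3])              # 1-D vector
M = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 x 3 matrix
print(v.shape, M.shape)              # (3,) (2, 3)

# arange: step-based, stop is excluded -> 0.0, 0.25, 0.5, 0.75
a = np.arange(0, 1, 0.25)
# linspace: count-based, both ends included -> 0.0, 0.25, 0.5, 0.75, 1.0
b = np.linspace(0, 1, 5)

# Vectorized: squares every element, no Python loop
sq = v ** 2                          # 1, 4, 9

# Broadcasting: (2, 3) + (3,) -> v is reused for each row
row_shifted = M + v                  # [[2, 4, 6], [5, 7, 9]]

# Broadcasting: (2, 1) column is stretched across the 3 columns
col = np.array([[10], [20]])
col_shifted = M + col                # [[11, 12, 13], [24, 25, 26]]
```

Note that arange and linspace produce the same first four values here; the difference is only whether the endpoint 1.0 is included.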
Matplotlib — Key Functions
Function                                    Purpose
plt.plot(x, y)                              Line plot
plt.scatter(x, y)                           Scatter plot
plt.bar(x, heights)                         Bar chart
plt.hist(data, bins)                        Histogram
plt.xlabel(), plt.ylabel(), plt.title()     Axis labels and title
plt.subplot(r, c, i)                        Select panel i (1-based) of an r × c grid
plt.savefig('file.png', dpi=300)            Save the figure to a file
plt.show()                                  Display the figure
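A short sketch combining several functions from the table in one figure. Assumptions: the data are random samples, the off-screen "Agg" backend is used so the script runs without a display, and demo_panels.png is a hypothetical filename. In an interactive session, plt.show() would take the place of savefig/close.

```python
import matplotlib
matplotlib.use("Agg")                     # off-screen backend: no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)            # seeded so the figure is reproducible
data = rng.normal(loc=0.0, scale=1.0, size=500)   # made-up sample data

fig = plt.figure(figsize=(9, 3))

plt.subplot(1, 3, 1)                      # grid of 1 row x 3 columns, panel 1
plt.hist(data, bins=20)
plt.title('Histogram')

plt.subplot(1, 3, 2)                      # panel 2
plt.bar(['A', 'B', 'C'], [3, 7, 5])
plt.title('Bar chart')

plt.subplot(1, 3, 3)                      # panel 3
plt.scatter(data[:-1], data[1:], s=5)     # each value against the next one
plt.xlabel('x[i]'); plt.ylabel('x[i+1]')
plt.title('Scatter')

plt.tight_layout()
n_panels = len(fig.axes)                  # one Axes per subplot call
plt.savefig('demo_panels.png', dpi=150)   # hypothetical output filename
plt.close(fig)
```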
Pandas — Key Concepts
  • Series — a labeled 1-D array. s = pd.Series([10, 20, 30], index=['a','b','c'])
  • DataFrame — a labeled 2-D table. df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
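  • Read data — pd.read_csv('file.csv') loads a CSV file into a DataFrame.
  • Descriptive stats — df.describe() returns count, mean, std, min, quartiles (25%, 50%, 75%) and max for each numeric column.
  • Grouping — df.groupby('category')['value'].mean() gives the mean of value within each category.
  • Plotting shortcut — df.plot(x='col1', y='col2', kind='scatter') draws a Matplotlib scatter plot directly from two columns.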
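A minimal sketch of Series, DataFrame, describe() and groupby() on a tiny in-memory table. The column names category and value and the numbers are hypothetical sample data.

```python
import pandas as pd

# Series: 1-D values with labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])                        # 20

# DataFrame built from a dict of columns (hypothetical sample data)
df = pd.DataFrame({
    'category': ['x', 'y', 'x', 'y', 'x'],
    'value':    [1.0, 4.0, 3.0, 6.0, 5.0],
})

# describe() skips the text column and summarizes 'value'
stats = df.describe()
print(stats)

# Mean of 'value' within each category: x -> (1+3+5)/3, y -> (4+6)/2
means = df.groupby('category')['value'].mean()
print(means)
```

Because describe() only summarizes numeric columns by default, the category column does not appear in stats.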
Example 1 — Plot a Function

Plot $y = x^2 e^{-x}$ for $0 \le x \le 8$ with labeled axes.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 8, 300)           # 300 evenly spaced points on [0, 8]
y = x**2 * np.exp(-x)                # evaluated element-wise, no loop

plt.plot(x, y, 'r-', linewidth=2)    # red solid line, 2 pt wide
plt.xlabel('x'); plt.ylabel('y'); plt.title(r'$y = x^2 e^{-x}$')
plt.grid(True); plt.show()

Variables: x — 300 evenly spaced points on $[0, 8]$; y — element-wise $x^2 e^{-x}$; 'r-' — red solid line.
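The plotted curve has a single interior maximum. Differentiating, $y' = (2x - x^2)e^{-x} = x(2 - x)e^{-x}$, which is zero at $x = 0$ (where $y = 0$) and at $x = 2$, so the peak is at $x = 2$ with $y = 4e^{-2} \approx 0.541$. A quick numerical check on the same 300-point grid, as a sketch:

```python
import numpy as np

# Same sampling as Example 1
x = np.linspace(0, 8, 300)
y = x**2 * np.exp(-x)

i = np.argmax(y)                     # index of the largest sampled y
x_peak, y_peak = x[i], y[i]
y_exact = 4 * np.exp(-2)             # analytic maximum at x = 2

print(f"sampled peak: x = {x_peak:.3f}, y = {y_peak:.4f}")
print(f"analytic peak: x = 2, y = {y_exact:.4f}")
```

The sampled peak can only be as accurate as the grid spacing, $8/299 \approx 0.027$.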

Example 2 — Pandas Descriptive Stats

Load a CSV of exam scores and compute summary statistics.

import pandas as pd

df = pd.read_csv('scores.csv')       # load the exam scores into a DataFrame
print(df.describe())                 # summary statistics per numeric column

Output: count, mean ($\bar{x}$), std ($s$), min, 25%, 50% (median), 75%, max for each numeric column.

Variables: df — DataFrame; describe() — computes $n, \bar{x}, s, \min, Q_1, \tilde{x}, Q_3, \max$.
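Since scores.csv itself is not provided, here is a sketch with a hypothetical stand-in file read from a string (the student, midterm and final columns are invented). It also shows that the std row is the sample standard deviation $s$ (ddof=1), not the population one.

```python
import io
import pandas as pd

# Hypothetical stand-in for scores.csv
csv_text = """student,midterm,final
Ana,72,80
Ben,85,88
Caz,90,94
Dee,64,70
"""
df = pd.read_csv(io.StringIO(csv_text))

summary = df.describe()              # numeric columns only: midterm, final
print(summary)

# describe()'s std row matches Series.std(), which uses ddof=1 (sample s)
s_sample = df['midterm'].std(ddof=1)
s_population = df['midterm'].std(ddof=0)   # divides by n instead of n - 1
```

With four scores, the median (50% row) for midterm is the average of the two middle values, $(72 + 85)/2 = 78.5$.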

Example 3 — Subplots

Show $\sin(x)$ and $\cos(x)$ side by side.

import numpy as np
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))   # 1 row, 2 columns
x = np.linspace(0, 2*np.pi, 200)

ax1.plot(x, np.sin(x), 'b'); ax1.set_title('sin(x)')    # left panel, blue
ax2.plot(x, np.cos(x), 'r'); ax2.set_title('cos(x)')    # right panel, red
plt.tight_layout(); plt.show()

Variables: fig — figure container; ax1, ax2 — subplot axes; figsize — width, height in inches.
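With one row, plt.subplots(1, 2) returns a 1-D array of Axes that unpacks into ax1, ax2. When both the row and column counts exceed 1, it returns a 2-D array instead. A sketch of a 2 × 2 grid, assuming the off-screen "Agg" backend and a hypothetical output filename trig_grid.png:

```python
import matplotlib
matplotlib.use("Agg")                     # render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2*np.pi, 200)
curves = {
    'sin(x)': np.sin(x),
    'cos(x)': np.cos(x),
    'sin(2x)': np.sin(2 * x),
    'cos(2x)': np.cos(2 * x),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True)
print(axes.shape)                         # (2, 2): a 2-D array of Axes

# axes.flat walks the grid row by row: top-left, top-right, bottom-left, ...
for ax, (title, y) in zip(axes.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(title)

titles = [ax.get_title() for ax in axes.flat]
fig.tight_layout()
fig.savefig('trig_grid.png', dpi=150)     # hypothetical output filename
plt.close(fig)
```

sharex=True links the x-axes so all four panels pan and zoom together.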

Practice Problems

1. What does np.linspace(0, 1, 50) return?
2. How is np.arange different from np.linspace?
3. What does vectorized operation mean?
4. How do you create a 2×3 matrix in NumPy?
5. What command saves a Matplotlib figure to a PNG file?
6. What is a Pandas DataFrame?
7. How do you read a CSV file in Pandas?
8. What does df.describe() return?
9. How do you create two subplots side by side?
10. What library provides geom_point-style plotting in Python?
11. What is broadcasting in NumPy?
12. Name two Python libraries for interactive charts.
Answer Key

1. A 1-D array of 50 evenly spaced values from 0 to 1 (inclusive)

2. arange uses a step size and excludes the endpoint; linspace uses a count of points and includes the endpoint

3. Operations applied element-wise to entire arrays without explicit Python loops

4. np.array([[1,2,3],[4,5,6]]) or np.zeros((2,3))

5. plt.savefig('file.png', dpi=300)

6. A 2-D labeled tabular data structure with named columns and an index

7. pd.read_csv('filename.csv')

8. Count, mean, std, min, 25th/50th/75th percentiles, max for each numeric column

9. fig, (ax1, ax2) = plt.subplots(1, 2)

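10. plotnine, a Python port of R's ggplot2 grammar, provides geom_point directly; among the libraries covered here, Seaborn (e.g. sns.scatterplot) is the closest equivalent.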

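11. NumPy's rule for combining arrays of different shapes: dimensions are compared from the right, and any dimension of size 1 (or missing) is stretched to match, so the operation runs element-wise without copying data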

12. Plotly and Bokeh (also: Altair, Dash)