Day 10: Matplotlib & Seaborn — Visualization Basics
Goal: Learn how to make clear, informative plots using matplotlib and seaborn. By the end you will know which plot to choose, where to use it, and how to produce it with clean code. This guide is written in simple steps so that beginners can follow along.

Why visualization matters
Visualization converts numbers into pictures that humans can understand quickly. For Data Science & ML:
- Spot trends, patterns, and outliers quickly.
- Communicate results to non-technical stakeholders.
- Diagnose data problems (missing values, wrong scales).
- Support feature selection and model explanation.
What are matplotlib and seaborn?
- Matplotlib: The foundational plotting library in Python. Very flexible — draws lines, bars, histograms, scatter plots, etc. (low-level API)
- Seaborn: Built on top of Matplotlib. High-level interface for attractive statistical plots (heatmaps, violin, boxplots, pairplots). Easier to use for common tasks.
Use Matplotlib when you need full control. Use Seaborn for quick, polished statistical visualizations.
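To make the difference concrete, here is a minimal sketch that draws the same grouped scatter plot twice: once with Matplotlib's lower-level calls and once with a single Seaborn call. The DataFrame, its column names (hours, score, group), and the random data are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data (hypothetical columns, for illustration only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "hours": rng.uniform(0, 10, 60),
    "group": np.repeat(["A", "B"], 30),
})
demo["score"] = 40 + 5 * demo["hours"] + rng.normal(0, 5, 60)

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(11, 4))

# Matplotlib: loop over the groups and label everything yourself
for name, part in demo.groupby("group"):
    ax_mpl.scatter(part["hours"], part["score"], label=name)
ax_mpl.set_xlabel("hours")
ax_mpl.set_ylabel("score")
ax_mpl.set_title("Matplotlib")
ax_mpl.legend(title="group")

# Seaborn: one call handles grouping, colors, axis labels, and legend
sns.scatterplot(data=demo, x="hours", y="score", hue="group", ax=ax_sns)
ax_sns.set_title("Seaborn")

plt.tight_layout()
plt.show()
```

Both panels show the same data. The Matplotlib version takes more lines but exposes every step; the Seaborn version is shorter because it reads the column names directly from the DataFrame.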
Setup — Installation & imports
Install with pip (run in terminal):
pip install matplotlib seaborn pandas numpy
Typical imports in a Jupyter notebook or script:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Optional: set seaborn style for nicer default look
sns.set(style="whitegrid") # or "darkgrid", "ticks", "white"
Basic workflow — 4 simple steps to create any plot
- Prepare data: Clean, select columns, handle NA.
- Choose plot: Decide which plot suits your question (see plots section).
- Create figure & axes: Set figure size and call the plotting function.
- Customize & save: Add title, axis labels, legend, then save or show.
# Example skeleton (x and y are placeholders for your own data)
fig, ax = plt.subplots(figsize=(8,5))
ax.plot(x, y, label="value")
ax.set_title("Title")
ax.set_xlabel("X label")
ax.set_ylabel("Y label")
ax.legend()
plt.tight_layout()
plt.show()
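The four steps can be followed end to end in one short, runnable sketch. The data here is synthetic, and the column names (day, reading) are hypothetical; dropping rows with missing values is only one of several ways to handle NA.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 1. Prepare data: synthetic readings with a few missing values
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "day": np.arange(1, 31),
    "reading": 10 + 0.5 * np.arange(30) + rng.normal(0, 1, 30),
})
raw.loc[[4, 12, 20], "reading"] = np.nan   # simulate gaps in the data
clean = raw.dropna(subset=["reading"])     # handle NA by dropping those rows

# Step 2. Choose plot: we want a trend over time, so a line plot fits

# Step 3. Create figure & axes, then call the plotting function
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(clean["day"], clean["reading"], marker="o", label="reading")

# Step 4. Customize, then show
ax.set_title("Daily reading")
ax.set_xlabel("Day")
ax.set_ylabel("Reading")
ax.legend()
fig.tight_layout()
plt.show()
```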
Common plots — what, where to use, how, why
1. Line plot — trend over continuous variable
Where to use: Time series, sequences, trend analysis.
How to use (Matplotlib):
# Sample line plot
x = np.arange(0, 10, 0.5)
y = np.sin(x)
plt.figure(figsize=(8,4))
plt.plot(x, y, marker='o', linestyle='-', linewidth=1)
plt.title("Sine wave example")
plt.xlabel("X")
plt.ylabel("sin(X)")
plt.grid(True)
plt.show()
Why: Shows how a value changes smoothly over a continuous axis.
2. Scatter plot — relationship between two variables
Where to use: Explore correlation, clusters, and outliers between two numeric variables.
How to use (Seaborn):
# Scatter with seaborn
df = pd.DataFrame({
    "height": np.random.normal(170, 10, 100),
    "weight": np.random.normal(70, 12, 100),
})
plt.figure(figsize=(7,5))
sns.scatterplot(data=df, x="height", y="weight")
plt.title("Height vs Weight")
plt.show()
Why: Simple visual check for correlation and cluster patterns.
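In the example above the two columns are generated independently, so no trend is visible. The sketch below assumes, purely for illustration, that weight rises with height plus some noise, and it prints the Pearson correlation in the title so the picture and the number can be compared.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)
height = rng.normal(170, 10, 100)
# Assumed relationship (illustration only): weight grows with height, plus noise
weight = 70 + 0.9 * (height - 170) + rng.normal(0, 6, 100)
people = pd.DataFrame({"height": height, "weight": weight})

# Series.corr uses the Pearson method by default
r = people["height"].corr(people["weight"])

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(data=people, x="height", y="weight", ax=ax)
ax.set_title(f"Height vs Weight (Pearson r = {r:.2f})")
plt.show()
```

A value of r close to 1 or -1 goes with points that hug a straight line; a value near 0 goes with a shapeless cloud like the one in the independent example.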
3. Bar chart — categorical comparisons
Where to use: Compare categories (e.g., sales by region).
How to use (Matplotlib):
categories = ['East','West','North','South']
values = [1200, 900, 1500, 700]
plt.figure(figsize=(7,4))
plt.bar(categories, values)
plt.title("Sales by Region")
plt.ylabel("Sales")
plt.show()
Why: Easy to compare magnitudes between categories.
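Bars are often easier to compare when they are sorted and carry their values. The following sketch reuses the same categories and values; it assumes Matplotlib 3.4 or newer, because that is where ax.bar_label was added.

```python
import matplotlib.pyplot as plt

categories = ["East", "West", "North", "South"]
values = [1200, 900, 1500, 700]

# Sort the (category, value) pairs by value, largest first
pairs = sorted(zip(categories, values), key=lambda p: p[1], reverse=True)
sorted_cats = [c for c, _ in pairs]
sorted_vals = [v for _, v in pairs]

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(sorted_cats, sorted_vals)
ax.bar_label(bars)  # write each value above its bar (Matplotlib >= 3.4)
ax.set_title("Sales by Region (sorted)")
ax.set_ylabel("Sales")
plt.show()
```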
4. Histogram — distribution of a numeric variable
Where to use: Understand distribution (skewness, modality), detect outliers.
How to use (Seaborn/Matplotlib):
data = np.random.normal(50, 12, 1000)
plt.figure(figsize=(7,4))
sns.histplot(data, bins=30, kde=True) # kde=True overlays smooth density
plt.title("Distribution of variable")
plt.xlabel("Value")
plt.show()
Why: Quickly reveals shape of the data and spread.
5. Boxplot & Violin — distribution + summary statistics
Where to use: Compare distributions across groups (median, IQR, outliers).
How to use (Seaborn):
# Boxplot
plt.figure(figsize=(8,5))
# Build the sample data first, then pass it to seaborn
box_df = pd.DataFrame({
    "category": np.repeat(['A','B','C'], 100),
    "value": np.concatenate([
        np.random.normal(50,8,100),
        np.random.normal(55,9,100),
        np.random.normal(45,10,100)
    ])
})
sns.boxplot(x="category", y="value", data=box_df)
plt.title("Boxplot by Category")
plt.show()
Why: Boxplot shows median and spread; violin shows density shape as well.
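The section title mentions violin plots, so here is a minimal sketch that places a boxplot and a violin plot of the same synthetic groups next to each other. The DataFrame name (groups) and the data are made up for the comparison.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)
groups = pd.DataFrame({
    "category": np.repeat(["A", "B", "C"], 100),
    "value": np.concatenate([
        rng.normal(50, 8, 100),
        rng.normal(55, 9, 100),
        rng.normal(45, 10, 100),
    ]),
})

# Same data, two views, sharing the y-axis so they can be compared directly
fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(11, 4), sharey=True)
sns.boxplot(data=groups, x="category", y="value", ax=ax_box)
ax_box.set_title("Boxplot")
sns.violinplot(data=groups, x="category", y="value", ax=ax_violin)
ax_violin.set_title("Violin plot")
plt.tight_layout()
plt.show()
```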
6. Heatmap — matrix visualization (correlation, confusion matrix)
Where to use: Correlation matrix, confusion matrices, any grid-like values.
How to use (Seaborn):
corr = df.corr() # df is a DataFrame with numeric columns
plt.figure(figsize=(8,6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", square=True)
plt.title("Correlation Matrix")
plt.show()
Why: Visualizes strength and sign of variable relationships quickly.
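A heatmap works the same way for a confusion matrix. The counts and class labels below are hypothetical, typed in by hand only to show the plotting call; in practice they would come from a model's predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical counts for a 3-class classifier
# (rows = actual class, columns = predicted class)
labels = ["cat", "dog", "bird"]
cm = np.array([
    [42,  5,  3],
    [ 4, 38,  8],
    [ 2,  6, 42],
])

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=labels, yticklabels=labels, ax=ax)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.set_title("Confusion Matrix")
plt.tight_layout()
plt.show()

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
```

Here fmt="d" prints the cells as integers, whereas the correlation example uses fmt=".2f" because correlations are decimals.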
7. Pairplot — multi-variable quick view
Where to use: Small datasets to view pairwise relationships and distributions.
How to use (Seaborn):
sns.pairplot(df[['col1','col2','col3']]) # col1..col3 are placeholder column names; draws a grid of scatter plots + histograms
plt.show()
Why: One-shot overview of relationships and marginal distributions between multiple variables.
Customization tips (labels, size, saving)
Task | Code snippet | Why/Tip |
---|---|---|
Set figure size | plt.figure(figsize=(8,5)) | Controls output size for blog images or slides. |
Add title & labels | plt.title("Title"); plt.xlabel("X label"); plt.ylabel("Y label") | Always label axes so readers understand the units/context. |
Save figure | plt.savefig("plot.png", dpi=300, bbox_inches='tight') | Use dpi=300 for crisp images in posts. Use bbox_inches='tight' if labels get cut off. The file name is only an example. |
Legend and colors | plt.legend(loc='best') | Place legend to avoid covering data. Seaborn has color palettes: palette='muted'. |
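The tips in the table can be combined in one short example. This is a sketch with made-up sales numbers; the output file name (sales_by_region.png) is hypothetical, and placing the legend outside the axes is just one way to keep it off the data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up numbers, for illustration only
sales = pd.DataFrame({
    "region": ["East", "West", "North", "South"] * 2,
    "year": ["2023"] * 4 + ["2024"] * 4,
    "sales": [1200, 900, 1500, 700, 1300, 950, 1400, 800],
})

fig, ax = plt.subplots(figsize=(8, 5))          # set figure size
sns.barplot(data=sales, x="region", y="sales", hue="year",
            palette="muted", ax=ax)             # seaborn color palette
ax.set_title("Sales by Region and Year")        # title
ax.set_xlabel("Region")                         # axis labels
ax.set_ylabel("Sales")
# Put the legend just outside the plot so it cannot cover any bars
ax.legend(title="Year", loc="upper left", bbox_to_anchor=(1.0, 1.0))
# bbox_inches="tight" keeps the outside legend and labels from being cut off
fig.savefig("sales_by_region.png", dpi=300, bbox_inches="tight")
plt.show()
```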
Mini Exercises — Practice (step-by-step)
- Exercise 1: Create a line plot of daily temperatures (synthetic). Add moving average (7-day) using rolling().
- Exercise 2: Using a small dataset (50 rows), draw a scatter plot and calculate Pearson correlation & show it in the title.
- Exercise 3: Plot histograms for multiple numeric columns using a loop & plt.subplot().
# Example: moving average on time series
dates = pd.date_range(start="2024-01-01", periods=60)
temps = 20 + 5*np.sin(np.linspace(0,6,60)) + np.random.normal(0,1,60)
ts = pd.Series(temps, index=dates)
plt.figure(figsize=(10,4))
plt.plot(ts.index, ts.values, label="Daily temp")
plt.plot(ts.index, ts.rolling(window=7).mean(), label="7-day MA", linewidth=2)
plt.title(f"Temperature with 7-day MA (corr={ts.corr(ts.rolling(7).mean()):.2f})")
plt.legend()
plt.show()
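For Exercise 3, one possible starting point is sketched below. The three columns (age, income, score) and their distributions are invented for practice; swap in your own numeric columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(11)
numbers = pd.DataFrame({
    "age": rng.normal(35, 8, 200),
    "income": rng.lognormal(10, 0.5, 200),  # right-skewed on purpose
    "score": rng.uniform(0, 100, 200),
})

fig = plt.figure(figsize=(12, 3.5))
for i, col in enumerate(numbers.columns, start=1):
    # plt.subplot(rows, columns, position): position counts from 1
    plt.subplot(1, len(numbers.columns), i)
    plt.hist(numbers[col], bins=20)
    plt.title(col)
    plt.xlabel("Value")
plt.tight_layout()
plt.show()
```

Each pass through the loop selects the next slot in a 1-row grid and draws one histogram there, so adding a column to the DataFrame adds a panel automatically.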
Summary & Key Takeaways
- Matplotlib = powerful base library (low-level control). Seaborn = nicer defaults + statistical plots.
- Choose the right plot for the question: trend (line), relationship (scatter), distribution (histogram/box), categories (bar), correlation (heatmap).
- Always label axes, add titles, and provide textual explanation below each image (important for SEO & AdSense).
- Save images with appropriate resolution and include alt text when uploading.