Day 10: Matplotlib & Seaborn — Visualization Basics | Python ML & Data Science

Goal: Learn how to make clear, informative plots using matplotlib and seaborn. By the end you will know which plot to choose, when to use it, and how to produce it with clean code. The guide is written in simple steps so that beginners can follow along.

Why visualization matters

Visualization converts numbers into pictures that humans can understand quickly. In data science and ML, plots help you:

  • Spot trends, patterns, and outliers quickly.
  • Communicate results to non-technical stakeholders.
  • Diagnose data problems (missing values, wrong scales).
  • Support feature selection and model explanation.

What are matplotlib and seaborn?

  • Matplotlib: The foundational plotting library in Python. Very flexible — draws lines, bars, histograms, scatter plots, etc. (low-level API)
  • Seaborn: Built on top of Matplotlib. High-level interface for attractive statistical plots (heatmaps, violin, boxplots, pairplots). Easier to use for common tasks.

Use Matplotlib when you need full control. Use Seaborn for quick, polished statistical visualizations.


Setup — Installation & imports

Install with pip (run in terminal):

pip install matplotlib seaborn pandas numpy

Typical imports in a Jupyter notebook or script:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: set seaborn style for nicer default look
sns.set_theme(style="whitegrid")   # or "darkgrid", "ticks", "white" (sns.set is an older alias)

Basic workflow — 4 simple steps to create any plot

  1. Prepare data: Clean, select columns, handle NA.
  2. Choose plot: Decide which plot suits your question (see plots section).
  3. Create figure & axes: Set figure size and call the plotting function.
  4. Customize & save: Add title, axis labels, legend, then save or show.

# Example skeleton (x and y are placeholder data; replace with your own)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, label="value")
ax.set_title("Title")
ax.set_xlabel("X label")
ax.set_ylabel("Y label")
ax.legend()
plt.tight_layout()
plt.show()
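
The four steps can also be walked through end to end, including the data-preparation step. The following is a sketch under stated assumptions: the sales table, its column names, and the output file name are all hypothetical, and missing values are simply dropped (other strategies such as filling may suit your data better).

```python
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Step 1: prepare data (hypothetical daily sales with two missing values)
sales = pd.DataFrame({
    "day": np.arange(1, 11),
    "units": [12, 15, np.nan, 18, 21, 19, np.nan, 25, 27, 30],
})
clean = sales.dropna(subset=["units"])  # drop rows where units is missing

# Step 2: choose plot -> a line plot, because the question is about a trend over days

# Step 3: create figure and axes, then draw
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(clean["day"], clean["units"], marker="o", label="units sold")

# Step 4: customize and save
ax.set_title("Units sold per day")
ax.set_xlabel("Day")
ax.set_ylabel("Units")
ax.legend()
fig.tight_layout()

out_path = os.path.join(tempfile.gettempdir(), "workflow_example.png")
fig.savefig(out_path, dpi=150)
# plt.show()  # uncomment to display the figure
```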

Common plots — what, where to use, how, why

1. Line plot — trend over continuous variable

Where to use: Time series, sequences, trend analysis.

How to use (Matplotlib):

# Sample line plot
x = np.arange(0, 10, 0.5)
y = np.sin(x)

plt.figure(figsize=(8,4))
plt.plot(x, y, marker='o', linestyle='-', linewidth=1)
plt.title("Sine wave example")
plt.xlabel("X")
plt.ylabel("sin(X)")
plt.grid(True)
plt.show()

Why: Shows how a value changes smoothly over a continuous axis.
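
Line plots are also handy for comparing several series on the same axis. A minimal sketch, assuming two synthetic curves; each call to ax.plot adds one line, and the label= values feed the legend.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), linestyle="-", label="sin(x)")   # solid line
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")  # dashed line
ax.set_title("Two series on one axis")
ax.set_xlabel("X")
ax.set_ylabel("Value")
ax.grid(True)
ax.legend()
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Different line styles keep the series distinguishable even when the plot is printed in grayscale.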


2. Scatter plot — relationship between two variables

Where to use: Explore correlation, clusters, and outliers between two numeric variables.

How to use (Seaborn):

# Scatter with seaborn
df = pd.DataFrame({
    "height": np.random.normal(170, 10, 100),
    "weight": np.random.normal(70, 12, 100),
})

plt.figure(figsize=(7,5))
sns.scatterplot(data=df, x="height", y="weight")
plt.title("Height vs Weight")
plt.show()

Why: Simple visual check for correlation and cluster patterns.
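
In the example above, height and weight are generated independently, so the cloud of points shows no real relationship. The sketch below assumes, purely for illustration, that weight rises with height plus noise, colors the points by a made-up "group" column using hue=, and puts the Pearson correlation in the title.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)
height = rng.normal(170, 10, 100)
# Assumption for illustration: weight depends on height, plus random noise
weight = 70 + 0.9 * (height - 170) + rng.normal(0, 6, 100)

people = pd.DataFrame({"height": height, "weight": weight})
people["group"] = np.where(people["height"] > 170, "tall", "short")  # hypothetical grouping

r = people["height"].corr(people["weight"])  # Series.corr uses Pearson by default

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(data=people, x="height", y="weight", hue="group", ax=ax)
ax.set_title(f"Height vs Weight (Pearson r = {r:.2f})")
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```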


3. Bar chart — categorical comparisons

Where to use: Compare categories (e.g., sales by region).

How to use (Matplotlib):

categories = ['East','West','North','South']
values = [1200, 900, 1500, 700]

plt.figure(figsize=(7,4))
plt.bar(categories, values)
plt.title("Sales by Region")
plt.ylabel("Sales")
plt.show()

Why: Easy to compare magnitudes between categories.
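
Two small touches often make bar charts easier to read: sorting the bars and printing the value on each one. This is a sketch using the same region figures as above; ax.bar_label is assumed to be available, which requires Matplotlib 3.4 or newer.

```python
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.Series(
    [1200, 900, 1500, 700],
    index=["East", "West", "North", "South"],
).sort_values(ascending=False)  # largest bar first

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(sales.index.tolist(), sales.to_numpy())
ax.bar_label(bars, padding=3)  # write each value above its bar (Matplotlib 3.4+)
ax.set_title("Sales by Region (sorted)")
ax.set_ylabel("Sales")
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```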


4. Histogram — distribution of a numeric variable

Where to use: Understand distribution (skewness, modality), detect outliers.

How to use (Seaborn/Matplotlib):

data = np.random.normal(50, 12, 1000)

plt.figure(figsize=(7,4))
sns.histplot(data, bins=30, kde=True)   # kde=True overlays smooth density
plt.title("Distribution of variable")
plt.xlabel("Value")
plt.show()

Why: Quickly reveals shape of the data and spread.


5. Boxplot & Violin — distribution + summary statistics

Where to use: Compare distributions across groups (median, IQR, outliers).

How to use (Seaborn):

# Boxplot: build the example data first, then plot it
groups = pd.DataFrame({
    "category": np.repeat(['A','B','C'], 100),
    "value": np.concatenate([
        np.random.normal(50,8,100),
        np.random.normal(55,9,100),
        np.random.normal(45,10,100)
    ])
})

plt.figure(figsize=(8,5))
sns.boxplot(data=groups, x="category", y="value")
plt.title("Boxplot by Category")
plt.show()

Why: Boxplot shows median and spread; violin shows density shape as well.
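
The violin plot is not shown above, so here is a side-by-side sketch. The data is made up so that group B has two peaks: the boxplot should summarize both groups in a similar way, while the violin should reveal the two-peaked shape of B.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical data: group A has one peak, group B has two peaks (bimodal)
group_a = rng.normal(50, 8, 200)
group_b = np.concatenate([rng.normal(40, 3, 100), rng.normal(60, 3, 100)])
groups = pd.DataFrame({
    "category": np.repeat(["A", "B"], 200),
    "value": np.concatenate([group_a, group_b]),
})

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

sns.boxplot(data=groups, x="category", y="value", ax=ax_box)
ax_box.set_title("Boxplot: median and IQR")

sns.violinplot(data=groups, x="category", y="value", ax=ax_violin)
ax_violin.set_title("Violin: density shape")

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```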


6. Heatmap — matrix visualization (correlation, confusion matrix)

Where to use: Correlation matrix, confusion matrices, any grid-like values.

How to use (Seaborn):

corr = df.corr(numeric_only=True)   # correlations between numeric columns only (pandas 1.5+)
plt.figure(figsize=(8,6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", square=True)
plt.title("Correlation Matrix")
plt.show()

Why: Visualizes strength and sign of variable relationships quickly.
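
Heatmaps work just as well for a confusion matrix. The sketch below uses ten made-up labels and predictions (not from any real model) and builds the matrix with pd.crosstab, which is one simple way to count actual versus predicted pairs.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical true labels and model predictions
y_true = pd.Series(["cat", "dog", "cat", "cat", "dog",
                    "bird", "bird", "dog", "cat", "bird"], name="Actual")
y_pred = pd.Series(["cat", "dog", "dog", "cat", "dog",
                    "bird", "cat", "dog", "cat", "bird"], name="Predicted")

cm = pd.crosstab(y_true, y_pred)  # rows = actual class, columns = predicted class

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", cbar=False, ax=ax)  # fmt="d" prints integers
ax.set_title("Confusion Matrix")
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Cells on the diagonal are correct predictions; off-diagonal cells show which classes get confused with each other.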


7. Pairplot — multi-variable quick view

Where to use: Small datasets to view pairwise relationships and distributions.

How to use (Seaborn):

sns.pairplot(df[['col1','col2','col3']])  # col1..col3 are placeholder names; draws a grid of scatter plots + histograms
plt.show()

Why: One-shot overview of relationships and marginal distributions between multiple variables.


Customization tips (labels, size, saving)

Task: Set figure size

plt.figure(figsize=(10,5))

Why/Tip: Controls output size for blog images or slides.

Task: Add title & labels

plt.title("My Plot")
plt.xlabel("X label")
plt.ylabel("Y label")

Why/Tip: Always label axes so readers understand the units/context.

Task: Save figure

plt.tight_layout()
plt.savefig("plot.png", dpi=300)   # high quality for blog/print

Why/Tip: Use dpi=300 for crisp images in posts. Use bbox_inches='tight' if labels get cut off.

Task: Legend and colors

plt.plot(x,y,label="line")
plt.legend(loc='best')

Why/Tip: Place legend to avoid covering data. Seaborn has color palettes: palette='muted'.
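
Figure size and dpi together decide the pixel size of a saved image: width in pixels is the figure width in inches times dpi, and likewise for height. The sketch below is an illustration with made-up curves and a hypothetical file name; it uses dpi=100 to keep the arithmetic simple, so a 10 x 5 inch figure should save as 1000 x 500 pixels.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x = np.linspace(0, 10, 100)
colors = sns.color_palette("muted", 2)  # two colors from Seaborn's "muted" palette

fig, ax = plt.subplots(figsize=(10, 5))          # size in inches
ax.plot(x, x, color=colors[0], label="linear")
ax.plot(x, x ** 2 / 10, color=colors[1], label="quadratic / 10")
ax.set_title("My Plot")
ax.set_xlabel("X label")
ax.set_ylabel("Y label")
ax.legend(loc="best")                            # let Matplotlib pick a clear spot
fig.tight_layout()

dpi = 100
out_path = os.path.join(tempfile.gettempdir(), "custom_plot.png")
fig.savefig(out_path, dpi=dpi)

# Pixel size = inches * dpi
width_px, height_px = (fig.get_size_inches() * dpi).round().astype(int)
print(f"Saved {out_path} at {width_px} x {height_px} pixels")
```

Passing bbox_inches='tight' to savefig trims or expands the saved area around the content, so the final pixel size will then differ from this simple product.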

Mini Exercises — Practice (step-by-step)

  1. Exercise 1: Create a line plot of daily temperatures (synthetic). Add moving average (7-day) using rolling().
  2. Exercise 2: Using a small dataset (50 rows), draw a scatter plot and calculate Pearson correlation & show it in the title.
  3. Exercise 3: Plot histograms for multiple numeric columns using a loop & plt.subplot().

# Example: moving average on time series
dates = pd.date_range(start="2024-01-01", periods=60)
temps = 20 + 5*np.sin(np.linspace(0,6,60)) + np.random.normal(0,1,60)
ts = pd.Series(temps, index=dates)
ma7 = ts.rolling(window=7).mean()   # first 6 values are NaN until the window is full

plt.figure(figsize=(10,4))
plt.plot(ts.index, ts.values, label="Daily temp")
plt.plot(ma7.index, ma7.values, label="7-day MA", linewidth=2)
# Series.corr skips the NaN rows, so this shows how closely the MA tracks the raw data
plt.title(f"Temperature with 7-day MA (corr={ts.corr(ma7):.2f})")
plt.legend()
plt.show()
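
For Exercise 3, one possible approach is sketched below. The table and its column names are made up for illustration; the loop picks every numeric column and draws each histogram in its own panel with plt.subplot().

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Hypothetical dataset with three numeric columns
data = pd.DataFrame({
    "age": rng.normal(35, 8, 200),
    "income": rng.lognormal(10, 0.5, 200),
    "score": rng.uniform(0, 100, 200),
})

numeric_cols = data.select_dtypes(include="number").columns

fig = plt.figure(figsize=(12, 3.5))
for i, col in enumerate(numeric_cols, start=1):
    ax = plt.subplot(1, len(numeric_cols), i)  # 1 row, one column per variable
    ax.hist(data[col], bins=20)
    ax.set_title(col)
    ax.set_xlabel("Value")
fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

plt.subplot() counts panels from 1, which is why the loop starts enumerate at 1.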

Summary & Key Takeaways

  • Matplotlib = powerful base library (low-level control). Seaborn = nicer defaults + statistical plots.
  • Choose the right plot for the question: trend (line), relationship (scatter), distribution (histogram/box), categories (bar), correlation (heatmap).
  • Always label axes, add titles, and provide textual explanation below each image (important for SEO & AdSense).
  • Save images with appropriate resolution and include alt text when uploading.
