Day 7: Pandas Series & DataFrame Basics – Python Class for ML & Data Science

Easy input → output Python examples, simple explanations, and downloadable diagrams (SVG/PNG).

Table of Contents

Introduction to Pandas
Pandas Series
Pandas DataFrame
Useful DataFrame Functions (Quick EDA)
Real Dataset Hands-On (Titanic)
Diagrams & Downloads
Homework / Practice

1) 🔹 Introduction to Pandas: What It Is and Why It Matters

  What is it? Pandas is a Python library that makes it easy to read, manipulate, and analyze table-like data (CSV, Excel, SQL).

  Why does it matter? Data cleaning and transformation make up a large part of any Machine Learning project, and Pandas makes that work easier and faster.

  How to get started? See the install and import steps below.

Install / Import (Input → Output style)

# Input (in a notebook cell; in a terminal, drop the leading "!")
!pip install pandas

# Input (in Python)
import pandas as pd
import numpy as np
print('pandas version:', pd.__version__)

# Output (example)
# pandas version: 2.2.0

2) 🔹 Pandas Series: What It Is (1D)

Series = 1D labeled array → index + values. Example: daily temperature list with date labels.
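The temperature idea above can be sketched in a few lines. The readings and dates here are made up for illustration, and the names `dates` and `temps` are just placeholders.

```python
import pandas as pd

# Hypothetical readings in degrees Celsius; dates and values are invented.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
temps = pd.Series([18.5, 20.1, 19.4, 22.0], index=dates, name="temp_c")

# The index holds the labels (dates); the values hold the readings.
print(temps.index[0])
# Look up one reading by its date label.
print(temps.loc[pd.Timestamp("2024-01-03")])
# Highest reading and the date label it belongs to.
print(temps.max(), temps.idxmax())
```

Because every value carries its own label, you can ask "what was the temperature on 3 January?" without counting positions.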

2.1 Create Series — 5 Examples (Input → Output)

Each example shows the input first, then the expected/typical output.

# Example 1 — From List
# Input:
s = pd.Series([10,20,30])
print(s)
# Output:
# 0    10
# 1    20
# 2    30
# dtype: int64

# Example 2 — With custom index (labels)
# Input:
s = pd.Series([100,200,300], index=['a','b','c'])
print(s)
# Output:
# a    100
# b    200
# c    300
# dtype: int64

# Example 3 — From dictionary
# Input:
s = pd.Series({'x':5,'y':10,'z':15})
print(s)
# Output:
# x     5
# y    10
# z    15
# dtype: int64

# Example 4 — From scalar (same value for each index)
# Input:
s = pd.Series(7, index=[0,1,2,3])
print(s)
# Output:
# 0    7
# 1    7
# 2    7
# 3    7
# dtype: int64

# Example 5 — From NumPy array
# Input:
arr = np.array([1,2,3,4])
s = pd.Series(arr)
print(s)
# Output:
# 0    1
# 1    2
# 2    3
# 3    4
# dtype: int64

2.2 Accessing Elements — practical examples

Use indexing, slicing, labels and boolean filters.

# Input:
s = pd.Series([5,10,15,20], index=['a','b','c','d'])

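# By position (use .iloc; plain s[0] on a labeled Series is deprecated in pandas 2.x)
print('s.iloc[0] ->', s.iloc[0])
# Output: s.iloc[0] -> 5

# By label
print("s['c'] ->", s['c'])
# Output: s['c'] -> 15

# Slice (by position; the end position is excluded)
print('s.iloc[1:3] ->\n', s.iloc[1:3])
# Output:
# s.iloc[1:3] ->
# b    10
# c    15
# dtype: int64

# Boolean filter
print('s[s > 10] ->\n', s[s > 10])
# Output:
# s[s > 10] ->
# c    15
# d    20
# dtype: int64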
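Label access and position access are easy to mix up when the labels are themselves integers. Here is a small sketch of the difference; the roll numbers and marks are hypothetical.

```python
import pandas as pd

# Integer labels that do NOT match positions (hypothetical roll numbers).
marks = pd.Series([55, 70, 85], index=[101, 102, 103])

by_label = marks.loc[102]         # label 102
by_position = marks.iloc[0]       # first element, whatever its label is
label_slice = marks.loc[101:102]  # label slices include BOTH ends
pos_slice = marks.iloc[1:2]       # position slices exclude the end

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

A simple rule of thumb: use `.loc` when you mean the label and `.iloc` when you mean the position, so the code does not depend on what the index happens to contain.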

2.3 Series Operations — vectorized & aligned ops

# Input:
s = pd.Series([2,4,6])

# Scalar op
print('s*3 ->\n', s*3)
# Output:
# s*3 ->
# 0     6
# 1    12
# 2    18
# dtype: int64

# Add two Series (alignment by index)
s1 = pd.Series([1,2,3], index=['a','b','c'])
s2 = pd.Series([10,20,30], index=['b','c','d'])
print('s1 + s2 ->\n', s1 + s2)
# Output:
# s1 + s2 ->
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64

# Use NumPy functions (the result is still a Series, not a plain array)
print('np.sqrt(s) ->\n', np.sqrt(s))
# Output:
# np.sqrt(s) ->
# 0    1.414214
# 1    2.000000
# 2    2.449490
# dtype: float64
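When two Series are added with `+`, labels present in only one of them give NaN. If you would rather treat a missing label as 0, one option is `Series.add` with `fill_value`. A minimal sketch using the same `s1` and `s2`:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# A label missing on one side is treated as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
# a     1.0
# b    12.0
# c    23.0
# d    30.0
# dtype: float64
```

Whether 0 is the right substitute depends on your data; for something like counts it is usually reasonable, for measurements it may not be.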

3) 🔹 Pandas DataFrame — 2D table

DataFrame = rows (records) + columns (features). Most real-world datasets come in this tabular format.

3.1 Creating DataFrame — 5 Examples (Input → Output)

# Example 1 — From dictionary
# Input:
data = {'Name':['Aman','Ravi','Priya'],'Age':[22,25,24],'Marks':[85,78,92]}
df = pd.DataFrame(data)
print(df)
# Output:
#     Name  Age  Marks
# 0   Aman   22     85
# 1   Ravi   25     78
# 2  Priya   24     92

# Example 2 — From list of dicts
# Input:
data = [{'a':1,'b':2},{'a':3,'b':4}]
df = pd.DataFrame(data)
print(df)
# Output:
#    a  b
# 0  1  2
# 1  3  4

# Example 3 — From list of lists (with columns)
# Input:
data = [[1,'Aman'],[2,'Ravi']]
df = pd.DataFrame(data, columns=['ID','Name'])
print(df)
# Output:
#    ID  Name
# 0   1  Aman
# 1   2  Ravi

# Example 4 — From NumPy array
# Input:
arr = np.arange(6).reshape(2,3)
df = pd.DataFrame(arr, columns=['A','B','C'])
print(df)
# Output:
#    A  B  C
# 0  0  1  2
# 1  3  4  5

# Example 5 — From Series
# Input:
s1 = pd.Series([1,2,3])
s2 = pd.Series([4,5,6])
df = pd.DataFrame({'col1':s1,'col2':s2})
print(df)
# Output:
#    col1  col2
# 0     1     4
# 1     2     5
# 2     3     6
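Example 5 works cleanly because both Series share the index 0, 1, 2. When the indexes differ, pandas aligns by label and fills the gaps with NaN. A small sketch, with hypothetical names `left`, `right` and `aligned`:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([4, 5], index=["b", "c"])  # no value for label 'a'

# Columns are matched by index label, not by position.
aligned = pd.DataFrame({"col1": left, "col2": right})
print(aligned)
#    col1  col2
# a     1   NaN
# b     2   4.0
# c     3   5.0
```

Note that `col2` becomes float because NaN cannot be stored in a plain integer column.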

3.2 Selecting Data — practical cases

# Input:
df = pd.DataFrame({'Name':['A','B','C'],'Age':[20,21,22],'Marks':[80,85,90]})

# Single column
print('df["Name"] ->\n', df['Name'])
# Output:
# 0    A
# 1    B
# 2    C
# Name: Name, dtype: object

# Multiple columns
print('df[["Name","Marks"]] ->\n', df[['Name','Marks']])
# Output: table with Name and Marks columns

# Row by label (loc)
print('df.loc[1] ->\n', df.loc[1])
# Output: Series for row index 1

# Row by position (iloc)
print('df.iloc[2] ->\n', df.iloc[2])
# Output: Series for third row

# Conditional selection
print('df[df["Marks"] > 80] ->\n', df[df['Marks'] > 80])
# Output: rows where Marks > 80 (rows 1 and 2)
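The selection styles above can also be combined: `.loc` accepts a row condition and a column name together, and conditions can be joined with `&` (and) or `|` (or). A short sketch on the same three-row table; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [20, 21, 22], "Marks": [80, 85, 90]})

# Rows AND a column in one step: names of students with Marks > 80.
names_high = df.loc[df["Marks"] > 80, "Name"]

# Combine conditions with & / |; each condition needs its own parentheses.
young_high = df[(df["Marks"] > 80) & (df["Age"] < 22)]

# Single cell by position: row 0, column 2 (Marks).
first_marks = df.iloc[0, 2]

print(names_high.tolist())
print(young_high)
print(first_marks)
```

Python's `and` / `or` keywords do not work element-wise on Series, which is why `&` and `|` are used here.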

3.3 Modifying DataFrame — add/update/drop

# Input:
df = pd.DataFrame({'Name':['A','B','C'],'Marks':[70,80,90]})

# Add column
df['Passed'] = df['Marks'] > 75
print('After add ->\n', df)
# Output: column 'Passed' with True/False

# Update value
df.loc[0,'Marks'] = 75
print('After update ->\n', df)

# Drop column
df = df.drop('Passed', axis=1)
print('After drop ->\n', df)

# Add row
df.loc[3] = ['D', 65]
print('After add row ->\n', df)

# Drop row
df = df.drop(3, axis=0)
print('After drop row ->\n', df)
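Assigning to `df.loc[3]` is fine for a single row. For several rows at once, one common alternative is `pd.concat`. The sketch below also shows that `drop` returns a new DataFrame rather than changing the original; the extra students are made up.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "Marks": [70, 80, 90]})

# Hypothetical new students to append.
new_rows = pd.DataFrame({"Name": ["D", "E"], "Marks": [65, 88]})

# ignore_index=True renumbers the rows 0..4 instead of repeating 0, 1.
combined = pd.concat([df, new_rows], ignore_index=True)

# drop() returns a new DataFrame; 'combined' itself keeps the Marks column.
names_only = combined.drop(columns=["Marks"])

print(combined)
print(names_only.columns.tolist())
```

This is why the examples above write `df = df.drop(...)`: without the assignment, the result of `drop` is simply thrown away.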

4) 🔹 Useful DataFrame Functions (Quick EDA)

Common commands for understanding your data: head(), info(), describe(), value_counts().

# Example EDA (Input -> Output)
print(df.head())      # first 5 rows
print(df.shape)       # (rows, cols)
df.info()             # dtypes & non-null counts (prints directly; returns None)
print(df.describe())  # numerical summary
# Output, in order: head table, shape tuple, info text, describe table
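`value_counts()` is listed above but not shown, so here is a small sketch of it, together with a quick missing-value check. The `students` table and its values are invented for illustration.

```python
import pandas as pd
import numpy as np

# Made-up data with one missing mark.
students = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "City": ["Delhi", "Pune", "Delhi", "Delhi"],
    "Marks": [80.0, np.nan, 90.0, 70.0],
})

city_counts = students["City"].value_counts()  # how often each city appears
missing = students.isna().sum()                # missing values per column
mean_marks = students["Marks"].mean()          # NaN is skipped by default

print(city_counts)
print(missing)
print(mean_marks)
```

Checking `isna().sum()` early is a useful habit, because summary numbers like the mean silently ignore missing rows.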

5) 🔹 Real Dataset Hands-On — Titanic (practical)

Step-by-step: load CSV → inspect → simple analysis.

# Input:
titanic = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
print(titanic.head())

# Example 1 — Average Age
print('Average Age ->', titanic['Age'].mean())
# Example 2 — Total Survived
print('Total Survived ->', titanic['Survived'].sum())
# Example 3 — Male vs Female count
print('Sex counts ->\n', titanic['Sex'].value_counts())
# Example 4 — Top fares
print(titanic.sort_values('Fare', ascending=False)[['Name','Fare']].head())
# Example 5 — Survival rate by Sex
print('Survival Rate by Sex ->\n', titanic.groupby('Sex')['Survived'].mean())

# Output: See printed tables & numbers in console / notebook
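The Age column in the Titanic CSV has missing values, so the average above is computed only over the rows where Age is present. The sketch below shows that behaviour on a tiny made-up stand-in, so it runs without an internet connection. The column names follow the dataset, but every value in `sample` is invented.

```python
import pandas as pd
import numpy as np

# Tiny invented stand-in with Titanic-style column names (NOT real passengers).
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, np.nan, 35.0, 4.0],
    "Survived": [0, 1, 1, 0, 1],
})

missing_age = sample["Age"].isna().sum()   # number of rows with no Age
mean_age = sample["Age"].mean()            # NaN rows are skipped
median_age = sample["Age"].median()
filled_age = sample["Age"].fillna(median_age)  # one simple way to fill gaps
rate_by_sex = sample.groupby("Sex")["Survived"].mean()

print(missing_age, mean_age, median_age)
print(rate_by_sex)
```

Filling with the median is only one possible choice. The same calls should work on the `titanic` DataFrame loaded above.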

6) 🎯 Homework / Practice

Create a Series of IPL scores, then find the max, min, and average (input → output).
Create a DataFrame of 10 students (Name, Age, Marks), then find the top 3 marks and the average marks.
Titanic practice: find the minimum age, the average fare by Pclass, and the survival rate for children (<18).
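If you are unsure how to begin the first exercise, here is one possible starting point. The scores and match labels are made up; replace them with real data.

```python
import pandas as pd

# Invented scores purely for illustration; swap in real match data.
scores = pd.Series([182, 165, 201, 148], index=["M1", "M2", "M3", "M4"], name="runs")

print("max ->", scores.max())
print("min ->", scores.min())
print("average ->", scores.mean())
print("best match ->", scores.idxmax())
```

For the second exercise, `sort_values` (used in the Titanic section) together with `head(3)` is one way to get the top 3 marks.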