📘 Day 7 — Pandas Series & DataFrame Basics (Hindi + English)
Easy input → output Python examples and simple explanations, with downloadable diagrams (SVG/PNG).
1) 🔹 Introduction to Pandas: What It Is and Why It Matters
What is it? Pandas is a Python library that makes it easy to read, manipulate, and analyze table-like data (CSV, Excel, SQL).
Why does it matter? Data cleaning and transformation make up a large part of Machine Learning work, and Pandas makes these tasks easier and faster.
How to get started? See the install and import steps below.
Install / Import (Input → Output style)
# Input (in a notebook cell; in a terminal, drop the leading "!")
!pip install pandas
# Input (in Python)
import pandas as pd
import numpy as np
print('pandas version:', pd.__version__)
# Output (example)
# pandas version: 2.2.0
2) 🔹 Pandas Series: What It Is (1D)
Series = 1D labeled array → index + values. Example: daily temperature list with date labels.
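As a small illustration of the index + values idea, the sketch below builds the daily-temperature Series described above. The dates and readings are made-up values, and the date labels are kept as plain strings for simplicity.

```python
import pandas as pd

# Hypothetical temperature readings (in °C) for three made-up dates
temps = pd.Series(
    [18.5, 20.1, 19.3],
    index=['2024-01-01', '2024-01-02', '2024-01-03'],
    name='temp_c',
)

print(temps.index)   # the labels (dates, stored here as strings)
print(temps.values)  # the underlying values as a NumPy array

# Look up a single value by its date label
print(temps['2024-01-02'])
```

Each value stays attached to its label, so a lookup by date keeps working even if the Series is later sorted or filtered.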
2.1 Create Series — 5 Examples (Input → Output)
Each example shows the input first, followed by the expected/typical output.
# Example 1 — From List
# Input:
s = pd.Series([10,20,30])
print(s)
# Output:
# 0 10
# 1 20
# 2 30
# dtype: int64
# Example 2 — With custom index (labels)
# Input:
s = pd.Series([100,200,300], index=['a','b','c'])
print(s)
# Output:
# a 100
# b 200
# c 300
# dtype: int64
# Example 3 — From dictionary
# Input:
s = pd.Series({'x':5,'y':10,'z':15})
print(s)
# Output:
# x 5
# y 10
# z 15
# dtype: int64
# Example 4 — From scalar (same value for each index)
# Input:
s = pd.Series(7, index=[0,1,2,3])
print(s)
# Output:
# 0 7
# 1 7
# 2 7
# 3 7
# dtype: int64
# Example 5 — From NumPy array
# Input:
arr = np.array([1,2,3,4])
s = pd.Series(arr)
print(s)
# Output:
# 0 1
# 1 2
# 2 3
# 3 4
# dtype: int64
2.2 Accessing Elements — practical examples
Use indexing, slicing, labels and boolean filters.
# Input:
s = pd.Series([5,10,15,20], index=['a','b','c','d'])
# By position (use .iloc; plain s[0] on a labeled Series is deprecated in recent pandas)
print('s.iloc[0] ->', s.iloc[0])
# Output: s.iloc[0] -> 5
# By label
print("s['c'] ->", s['c'])
# Output: s['c'] -> 15
# Slice
print('s[1:3] ->\n', s[1:3])
# Output:
# s[1:3] ->
# b 10
# c 15
# dtype: int64
# Boolean filter
print('s[s > 10] ->\n', s[s > 10])
# Output:
# c 15
# d 20
# dtype: int64
2.3 Series Operations — vectorized & aligned ops
# Input:
s = pd.Series([2,4,6])
# Scalar op
print('s*3 ->\n', s*3)
# Output:
# s*3 ->
# 0 6
# 1 12
# 2 18
# dtype: int64
# Add two Series (alignment by index)
s1 = pd.Series([1,2,3], index=['a','b','c'])
s2 = pd.Series([10,20,30], index=['b','c','d'])
print('s1 + s2 ->\n', s1 + s2)
# Output:
# a NaN
# b 12.0
# c 23.0
# d NaN
# dtype: float64
# Use numpy functions
print('np.sqrt(s) ->\n', np.sqrt(s))
# Output (the result is still a Series, not a plain array):
# np.sqrt(s) ->
# 0 1.414214
# 1 2.000000
# 2 2.449490
# dtype: float64
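The aligned addition above yields NaN for any label that appears in only one Series. If that is not what you want, one option is the `add` method with `fill_value`, sketched here with the same `s1` and `s2`. Whether 0 is a sensible default depends on your data.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# A label missing from one side is treated as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
# Values by label: a -> 1.0, b -> 12.0, c -> 23.0, d -> 30.0
```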
3) 🔹 Pandas DataFrame — 2D table
DataFrame = rows (records) + columns (features). Most real-world datasets come in this tabular form.
3.1 Creating DataFrame — 5 Examples (Input → Output)
# Example 1 — From dictionary
# Input:
data = {'Name':['Aman','Ravi','Priya'],'Age':[22,25,24],'Marks':[85,78,92]}
df = pd.DataFrame(data)
print(df)
# Output:
# Name Age Marks
# 0 Aman 22 85
# 1 Ravi 25 78
# 2 Priya 24 92
# Example 2 — From list of dicts
# Input:
data = [{'a':1,'b':2},{'a':3,'b':4}]
df = pd.DataFrame(data)
print(df)
# Output:
# a b
# 0 1 2
# 1 3 4
# Example 3 — From list of lists (with columns)
# Input:
data = [[1,'Aman'],[2,'Ravi']]
df = pd.DataFrame(data, columns=['ID','Name'])
print(df)
# Output:
# ID Name
# 0 1 Aman
# 1 2 Ravi
# Example 4 — From NumPy array
# Input:
arr = np.arange(6).reshape(2,3)
df = pd.DataFrame(arr, columns=['A','B','C'])
print(df)
# Output:
# A B C
# 0 0 1 2
# 1 3 4 5
# Example 5 — From Series
# Input:
s1 = pd.Series([1,2,3])
s2 = pd.Series([4,5,6])
df = pd.DataFrame({'col1':s1,'col2':s2})
print(df)
# Output:
# col1 col2
# 0 1 4
# 1 2 5
# 2 3 6
3.2 Selecting Data — practical cases
# Input:
df = pd.DataFrame({'Name':['A','B','C'],'Age':[20,21,22],'Marks':[80,85,90]})
# Single column
print('df["Name"] ->\n', df['Name'])
# Output:
# 0 A
# 1 B
# 2 C
# Name: Name, dtype: object
# Multiple columns
print('df[["Name","Marks"]] ->\n', df[['Name','Marks']])
# Output: table with Name and Marks columns
# Row by label (loc)
print('df.loc[1] ->\n', df.loc[1])
# Output: Series for row index 1
# Row by position (iloc)
print('df.iloc[2] ->\n', df.iloc[2])
# Output: Series for third row
# Conditional selection
print('df[df["Marks"] > 80] ->\n', df[df['Marks'] > 80])
# Output: rows where Marks > 80 (rows 1 and 2)
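The selections above can also be combined. As a minimal sketch using the same sample table, the snippet below filters rows on two conditions and keeps only some columns in a single `loc` call. The thresholds are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 21, 22], 'Marks': [80, 85, 90]})

# Combine conditions with & (each condition needs its own parentheses)
mask = (df['Marks'] > 80) & (df['Age'] < 22)

# loc[rows, columns]: filter rows by the mask and keep two columns
result = df.loc[mask, ['Name', 'Marks']]
print(result)  # only the row for 'B' satisfies both conditions
```

Use `&` for "and" and `|` for "or"; Python's plain `and` / `or` keywords do not work element-wise on Series.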
3.3 Modifying DataFrame — add/update/drop
# Input:
df = pd.DataFrame({'Name':['A','B','C'],'Marks':[70,80,90]})
# Add column
df['Passed'] = df['Marks'] > 75
print('After add ->\n', df)
# Output: column 'Passed' with True/False
# Update value
df.loc[0,'Marks'] = 75
print('After update ->\n', df)
# Drop column
df = df.drop('Passed', axis=1)
print('After drop ->\n', df)
# Add row
df.loc[3] = ['D', 65]
print('After add row ->\n', df)
# Drop row
df = df.drop(3, axis=0)
print('After drop row ->\n', df)
4) 🔹 Useful DataFrame Functions (Quick EDA)
Common commands for understanding your data: head(), info(), describe(), value_counts().
# Example EDA (Input -> Output)
print(df.head()) # first 5 rows
print(df.shape) # (rows, cols)
df.info() # dtypes & non-null counts (prints directly and returns None, so no print() needed)
print(df.describe()) # numerical summary
# Output: respectively head table, shape tuple, info text, describe table
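value_counts() is listed among the common commands but does not appear in the example above. Here is a minimal sketch of how it might be used; the `Grade` column and its values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
grades = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Grade': ['Pass', 'Fail', 'Pass', 'Pass'],
})

# Count how many times each distinct value appears (most frequent first)
counts = grades['Grade'].value_counts()
print(counts)  # 'Pass' appears 3 times, 'Fail' once

# normalize=True returns proportions instead of raw counts
shares = grades['Grade'].value_counts(normalize=True)
print(shares)
```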
5) 🔹 Real Dataset Hands-On — Titanic (practical)
Step-by-step: load CSV → inspect → simple analysis.
# Input:
titanic = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
print(titanic.head())
# Example 1 — Average Age
print('Average Age ->', titanic['Age'].mean())
# Example 2 — Total Survived
print('Total Survived ->', titanic['Survived'].sum())
# Example 3 — Male vs Female count
print('Sex counts ->\n', titanic['Sex'].value_counts())
# Example 4 — Top fares
print(titanic.sort_values('Fare', ascending=False)[['Name','Fare']].head())
# Example 5 — Survival rate by Sex
print('Survival Rate by Sex ->\n', titanic.groupby('Sex')['Survived'].mean())
# Output: See printed tables & numbers in console / notebook
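One thing to keep in mind with the average-age example: the Titanic `Age` column typically contains missing values, and `mean()` skips them by default. The sketch below shows that behaviour on a tiny made-up table (a stand-in, not the real dataset), so it runs without a network connection.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table standing in for the Titanic 'Age' and 'Survived' columns
sample = pd.DataFrame({
    'Age': [22.0, np.nan, 30.0, np.nan, 8.0],
    'Survived': [0, 1, 1, 0, 1],
})

missing = sample['Age'].isna().sum()  # number of missing ages
avg_age = sample['Age'].mean()        # NaN values are skipped by default

print('Missing ages ->', missing)  # 2
print('Average age ->', avg_age)   # 20.0, i.e. (22 + 30 + 8) / 3
```

Running `titanic['Age'].isna().sum()` on the real data tells you how many rows the average leaves out.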
6) 🎯 Homework / Practice
- Create a Series of IPL scores and find the max, min, and average (input → output).
- Create a DataFrame of 10 students (Name, Age, Marks) and find the top 3 marks and the average marks.
- Titanic practice: Find minimum age, avg fare by Pclass, survival rate for children (<18).