60-Day Roadmap — Python for Machine Learning & Data Science
Day-wise plan with concise explanations, runnable examples, and practice boxes. Follow it daily and build real projects by Day 60.
Phase 1 — Python & Data Basics (Day 1–10)
Day 1 — Python Basics (Variables, Types, Operators)
Explain: Python's basic building blocks: numbers, strings, booleans, and variables. These matter because data representation starts here.
Examples
# Example 1 — variables & types
a = 10
b = 3.14
c = "BeepShip"
print(type(a), type(b), type(c))
# Example 2 — string formatting
name = "Amit"
print(f"Hello {name}, welcome to ML!")
# Example 3 — boolean & comparison
x = 5
print(x > 3, x == 5)
Day 2 — Control Flow & Functions (if, loops, def)
Explain: Conditional statements and loops let you branch on and iterate over data. Functions make code reusable.
# Example 1 — if-else
n = 7
if n % 2 == 0:
    print("Even")
else:
    print("Odd")
# Example 2 — for loop
for i in range(5):
    print(i)
# Example 3 — function
def add(a, b):
    return a + b
print(add(3, 4))
Day 3 — Python Collections (List, Tuple, Set, Dict)
Explain: These collections are used to store and manage data. Pandas builds on these concepts under the hood.
# Example 1 — list ops
lst = [1,2,3]
lst.append(4)
print(lst)
# Example 2 — dict
d = {"name":"Asha","age":25}
print(d["name"])
# Example 3 — set for unique
s = set([1,2,2,3])
print(s)
Day 4 — File I/O & Error Handling
Explain: Data often arrives in files (CSV, JSON). Reading and writing files and handling exceptions is essential.
# Example 1 — read/write text
with open("sample.txt", "w") as f:
    f.write("Hello BeepShip\n")
with open("sample.txt") as f:
    print(f.read())
# Example 2 — read JSON
import json
obj = {"a": 1}
with open("data.json", "w") as f:
    json.dump(obj, f)
with open("data.json") as f:
    print(json.load(f))
# Example 3 — try/except
try:
    1 / 0
except ZeroDivisionError:
    print("Cannot divide by zero")
Practice: use dict.get() to read a key safely with a default value instead of risking a KeyError.
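A tiny illustrative sketch (values are made up):
# dict.get returns a default instead of raising KeyError
d = {"name": "Asha", "age": 25}
print(d.get("city", "unknown"))  # "city" is missing, so this prints "unknown"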
Day 5 — NumPy Basics (arrays, vector ops)
Explain: NumPy is a must-know for numerical computing. Vectorization avoids slow Python loops and keeps code readable.
# Example 1 — create arrays
import numpy as np
a = np.array([1,2,3])
print(a.shape)
# Example 2 — vector ops
b = np.array([4,5,6])
print(a + b, a * 2)
# Example 3 — broadcasting
M = np.ones((3,3))
v = np.array([1,2,3])
print(M + v)
Day 6 — NumPy Advanced (matrix ops, slicing, broadcasting)
Explain: Matrices, linear algebra operations, and efficient indexing. Important for understanding the internals of ML algorithms.
# Example 1 — dot product
A = np.array([[1,2],[3,4]])
b = np.array([5,6])
print(A.dot(b))
# Example 2 — slicing
print(A[0,:], A[:,1])
# Example 3 — inverse (if invertible)
print(np.linalg.inv(A))
Practice: solve a linear system with np.linalg.solve for a random A and b.
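A minimal sketch of that practice task, assuming a random 3x3 system:
# Solve Ax = b directly (more stable than computing the inverse)
import numpy as np
A = np.random.rand(3, 3)
b = np.random.rand(3)
x = np.linalg.solve(A, b)
print(np.allclose(A.dot(x), b))  # sanity check, should print True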
Day 7 — Pandas: Series & DataFrame basics
Explain: Pandas is what you use to read, clean, and transform CSVs. The DataFrame is the core object.
# Example 1 — read CSV
import pandas as pd
df = pd.read_csv("sample.csv") # replace with local CSV
print(df.head())
# Example 2 — access columns
print(df['col1'].mean())
# Example 3 — filtering
print(df[df['age'] > 30])
Day 8 — Pandas: Cleaning (missing data, duplicates)
Explain: Real-world data is often messy: missing values, duplicates. Cleaning it first improves model quality.
# Example 1 — dropna / fillna
df['age'] = df['age'].fillna(df['age'].median())
# Example 2 — drop duplicates
df = df.drop_duplicates()
# Example 3 — detect missing
print(df.isna().sum())
Day 9 — Pandas: GroupBy, Merge, Pivot
Explain: Aggregations and joins are essential for feature engineering, especially for building aggregate features.
# Example 1 — groupby
print(df.groupby('pclass')['fare'].mean())
# Example 2 — merge
orders = pd.DataFrame({'order_id':[1,2],'cust':[10,11]})
cust = pd.DataFrame({'cust':[10,11],'name':['A','B']})
print(orders.merge(cust, on='cust'))
# Example 3 — pivot_table
print(df.pivot_table(values='fare', index='sex', columns='pclass', aggfunc='mean'))
Day 10 — Data Visualization: Matplotlib & Seaborn
Explain: Visuals help you understand distributions and correlations, and spot outliers.
# Example 1 — histogram
import matplotlib.pyplot as plt
plt.hist(df['age'].dropna(), bins=20)
plt.show()
# Example 2 — seaborn pairplot
import seaborn as sns
sns.pairplot(df.dropna(), hue='survived', vars=['age','fare'])
plt.show()
# Example 3 — heatmap correlation (numeric columns only)
sns.heatmap(df.corr(numeric_only=True), annot=True)
plt.show()
Phase 2 — Statistics & Preprocessing (Day 11–20)
Day 11 — Stats basics: mean, median, variance, skewness
# Example 1
import numpy as np
arr = np.array([1,2,3,4,100])
print(arr.mean(), np.median(arr), arr.var(), np.std(arr))
# Example 2 — skew (scipy)
from scipy.stats import skew
print("Skew:", skew(arr))
# Example 3 — describe in pandas
print(pd.Series(arr).describe())
Day 12 — Probability basics (distribution types)
# Example 1 — normal distribution sample
import numpy as np
x = np.random.normal(0,1,1000)
# Example 2 — plot histogram
import matplotlib.pyplot as plt
plt.hist(x, bins=30); plt.show()
# Example 3 — sample from binomial
y = np.random.binomial(n=10,p=0.3,size=1000)
Day 13 — Visualization advanced (pairplot, jointplot, heatmap)
# Use seaborn pairplot/jointplot and heatmap; examples similar to Day 10
import seaborn as sns
sns.jointplot(x='age', y='fare', data=df, kind='hex')
plt.show()
Day 14 — Handling Missing Values (drop, fill, impute)
# Example 1 — dropna
df_drop = df.dropna(subset=['age','fare'])
# Example 2 — fill with mean
df['age_fill'] = df['age'].fillna(df['age'].mean())
# Example 3 — IterativeImputer (sklearn)
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
imp = IterativeImputer(random_state=0)
num_cols = ['age','fare']
df[num_cols] = imp.fit_transform(df[num_cols])
Day 15 — Feature scaling: StandardScaler, MinMax, Robust
# Example 1 — StandardScaler
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
sc = StandardScaler()
print(sc.fit_transform(df[['fare']].fillna(0))[:5])
# Example 2 — MinMax
mm = MinMaxScaler()
print(mm.fit_transform(df[['age']].fillna(0))[:5])
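The heading also lists RobustScaler; a sketch in the same style, reusing the same assumed 'fare' column:
# Example 3 — RobustScaler (uses median/IQR, so it is less sensitive to outliers)
rs = RobustScaler()
print(rs.fit_transform(df[['fare']].fillna(0))[:5])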
Day 16 — Encoding categories: OneHot, Ordinal, Target encoding
# Example 1 — OneHot with pandas
onehot = pd.get_dummies(df['sex'], prefix='sex')
# Example 2 — OrdinalEncoder
from sklearn.preprocessing import OrdinalEncoder
oe = OrdinalEncoder()  # named "oe" rather than "ord", which would shadow the built-in ord()
df['class_ord'] = oe.fit_transform(df[['pclass']])
# Example 3 — category_encoders target encoding (sketch)
# import category_encoders as ce
# te = ce.TargetEncoder(cols=['embarked'])
Day 17 — Outlier detection (IQR, Z-score)
# Example 1 — IQR method
Q1 = df['fare'].quantile(0.25)
Q3 = df['fare'].quantile(0.75)
IQR = Q3 - Q1
out = df[(df['fare'] < Q1 - 1.5*IQR) | (df['fare'] > Q3 + 1.5*IQR)]
print(out.shape)
# Example 2 — zscore
from scipy.stats import zscore
df['fare_z'] = zscore(df['fare'].fillna(df['fare'].mean()))
Day 18 — Train/Test best practices & stratify
# Example 1 — basic split
from sklearn.model_selection import train_test_split
X = df[['age','fare']].fillna(0)
y = df['survived'].fillna(0)
Xtr,Xte,ytr,yte = train_test_split(X,y,test_size=0.2, random_state=42, stratify=y)
Day 19 — Pipelines & ColumnTransformer
# Example 1 — ColumnTransformer & Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
num_cols = ['age','fare']; cat_cols=['sex']
pre = ColumnTransformer([('num', StandardScaler(), num_cols), ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)])
pipe = Pipeline([('pre', pre), ('clf', LogisticRegression(max_iter=400))])
# re-split so that the 'sex' column is available to the ColumnTransformer
X = df[['age','fare','sex']].dropna(); y = df.loc[X.index, 'survived']
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
pipe.fit(Xtr, ytr)
print("Score:", pipe.score(Xte, yte))
Day 20 — Practice Day: Titanic preprocessing mini-project
Use the full preprocessing pipeline: missing values, encoding, scaling, feature creation (title from name), and save final cleaned CSV.
# Sketch steps (do in notebook)
# 1. load titanic
# 2. extract title from name
# 3. impute age, fill embarked
# 4. encode sex, embarked
# 5. scale fare
# 6. save cleaned CSV
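A minimal sketch of those steps, assuming the lowercase Titanic column names used elsewhere in this plan (file paths are placeholders):
# Sketch implementation (column names and paths are assumptions)
import pandas as pd
df = pd.read_csv("titanic.csv")                                              # 1. load
df['title'] = df['name'].str.extract(r',\s*([^\.]+)\.')                      # 2. "Braund, Mr. Owen" -> "Mr"
df['age'] = df['age'].fillna(df.groupby('title')['age'].transform('median')) # 3. impute age by title
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])             # 3. fill embarked
df = pd.get_dummies(df, columns=['sex', 'embarked'])                         # 4. encode
df.to_csv("titanic_clean.csv", index=False)                                  # 6. save (fare scaling is left to the model pipeline)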
Phase 3 — Supervised Learning (Day 21–35)
Day 21 — ML overview & problem framing
Explain: Supervised vs unsupervised, regression vs classification, evaluation metrics mapping.
# No heavy code — conceptual mapping table and examples
# Examples: predicting price = regression, churn (yes/no) = classification
Day 22 — Linear Regression (theory + code)
# Example 1 — simple linear regression (sklearn)
# Note: load_boston was removed from scikit-learn (1.2+), so use the California housing data
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True)
lr = LinearRegression().fit(X, y)
print("R2:", lr.score(X, y))
Day 23 — Multiple & Polynomial Regression
# Example 1 — polynomial features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2)
X2 = poly.fit_transform(X[:50])
print(X2.shape)
Day 24 — Logistic Regression & classification basics
# Example 1 — logistic on iris (binary: setosa vs not)
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()
X = iris.data
y_bin = (iris.target == 0).astype(int)
lr = LogisticRegression(max_iter=500).fit(X, y_bin)
print("Acc:", lr.score(X, y_bin))
Practice: use predict_proba and compute ROC AUC.
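One way to do it, reusing the logistic model trained above:
# Probability of the positive class, then ROC AUC
from sklearn.metrics import roc_auc_score
probs = lr.predict_proba(X)[:, 1]
print("ROC AUC:", roc_auc_score(y_bin, probs))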
Day 25 — K-Nearest Neighbors (KNN)
# Example 1 — KNN classification
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X,y_bin)
print("Acc:", knn.score(X,y_bin))
Day 26 — Decision Trees
# Example 1 — decision tree
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(max_depth=4).fit(X,y_bin)
print("Tree acc:", dt.score(X,y_bin))
Day 27 — Random Forest & feature importance
# Example 1 — random forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100).fit(X,y_bin)
print("RF acc:", rf.score(X,y_bin))
print("Feat importances:", rf.feature_importances_)
Day 28 — SVM (linear & RBF)
# Example 1 — SVM RBF
from sklearn.svm import SVC
svc = SVC(kernel='rbf', probability=True).fit(X,y_bin)
print("SVM acc:", svc.score(X,y_bin))
Day 29 — Naive Bayes
# Example 1 — GaussianNB
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB().fit(X,y_bin)
print("NB acc:", nb.score(X,y_bin))
Day 30 — Practice Day: Compare 5 classifiers on Titanic
# Sketch: train Logistic, KNN, RF, SVM, NB on preprocessed Titanic and compare metrics (accuracy, f1)
# Use cross_val_score and classification_report for final evaluation
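A minimal sketch of that comparison loop, assuming Xtr/ytr come from the Day 20 preprocessed Titanic split:
# Cross-validated F1 for each candidate model (Xtr/ytr are assumed variable names)
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
models = {"logreg": LogisticRegression(max_iter=500), "knn": KNeighborsClassifier(),
          "rf": RandomForestClassifier(n_estimators=100), "svm": SVC(), "nb": GaussianNB()}
for name, model in models.items():
    scores = cross_val_score(model, Xtr, ytr, cv=5, scoring="f1")
    print(name, round(scores.mean(), 3))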
Day 31 — Model Evaluation Metrics (accuracy, precision, recall, F1)
# Example 1 — compute metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
y_true = [0,1,1,0,1]; y_pred = [0,1,0,0,1]
print("Acc", accuracy_score(y_true, y_pred))
print("Prec", precision_score(y_true, y_pred))
print("Recall", recall_score(y_true, y_pred))
print("F1", f1_score(y_true, y_pred))
Day 32 — ROC, AUC, Precision-Recall curves
# Example 1 — ROC & AUC
from sklearn.metrics import roc_curve, roc_auc_score
y_prob = [0.1,0.9,0.2,0.8]
print("AUC:", roc_auc_score([0,1,0,1], y_prob))
Day 33 — Cross-validation (k-fold, stratified)
# Example 1 — k-fold cross_val_score
from sklearn.model_selection import cross_val_score
print(cross_val_score(rf, X, y_bin, cv=5).mean())
Day 34 — Hyperparameter tuning: GridSearchCV & RandomizedSearch
# Example 1 — GridSearchCV sketch
from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators':[50,100], 'max_depth':[5,10]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
grid.fit(X,y_bin)
print("Best:", grid.best_params_)
Day 35 — Practice Day: Hyperparameter tuning on RandomForest
# Practical steps:
# 1. Preprocess dataset
# 2. Define param grid
# 3. Run GridSearchCV / RandomizedSearchCV
# 4. Evaluate best model on holdout
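A minimal sketch of steps 2-4 with RandomizedSearchCV (parameter ranges are illustrative; Xtr/Xte/ytr/yte are assumed from the earlier split):
# Randomized search over a small RandomForest grid, then evaluate on the holdout
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
param_dist = {"n_estimators": [100, 200, 400], "max_depth": [None, 5, 10, 20], "min_samples_leaf": [1, 2, 5]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_dist, n_iter=10, cv=3, random_state=42)
search.fit(Xtr, ytr)
print("Best params:", search.best_params_)
print("Holdout score:", search.score(Xte, yte))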
Phase 4 — Unsupervised Learning & Feature Engineering (Day 36–45)
Day 36 — KMeans Clustering & elbow method
# Example 1 — kmeans basics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
Xb, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # distinct name so the earlier X (iris) is not overwritten
km = KMeans(n_clusters=4).fit(Xb)
print("Labels sample:", km.labels_[:10])
# elbow: inertia over k
inertia = []
for k in range(1, 8):
    inertia.append(KMeans(n_clusters=k, random_state=42).fit(Xb).inertia_)
print(inertia)
Day 37 — Hierarchical Clustering & dendrograms
# Example 1 — linkage & dendrogram
from scipy.cluster.hierarchy import linkage, dendrogram
Z = linkage(Xb[:50], method='ward')
import matplotlib.pyplot as plt
dendrogram(Z)
plt.show()
Day 38 — DBSCAN (density-based)
# Example 1 — DBSCAN
from sklearn.cluster import DBSCAN
db = DBSCAN(eps=0.5, min_samples=5).fit(Xb)
print("Unique labels:", set(db.labels_))
Day 39 — PCA for dimensionality reduction
# Example 1 — PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
Z = pca.fit_transform(Xb)
print("Explained variance:", pca.explained_variance_ratio_)
Day 40 — t-SNE & LDA intro
# Example 1 — t-SNE sketch (compute heavy so small sample)
from sklearn.manifold import TSNE
z = TSNE(n_components=2, random_state=42).fit_transform(Xb[:200])
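The heading also mentions LDA; unlike t-SNE it is supervised, so a minimal sketch on iris (assumed here because it has class labels):
# Example 2 — Linear Discriminant Analysis projects onto class-separating axes
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
iris = load_iris()
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(iris.data, iris.target)
print("LDA shape:", Z_lda.shape)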
Day 41 — Feature Engineering: creating ratios, dates, interactions
# Example 1 — ratio features
df["income_per_person"] = df["income"] / (df["household_size"]+1)
# Example 2 — date parts
df['signup_date'] = pd.to_datetime(df['signup_date'])
df['signup_month'] = df['signup_date'].dt.month
# Example 3 — interaction
df['age_income'] = df['age'] * df['income']
Day 42 — Feature Selection: filter, wrapper, embedded
# Example 1 — SelectKBest
from sklearn.feature_selection import SelectKBest, f_classif
sel = SelectKBest(f_classif, k=2).fit(X, y_bin)  # k must be <= number of features (iris has 4)
print("Selected idx:", sel.get_support())
# Example 2 — RFE
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
rfe = RFE(LogisticRegression(max_iter=200), n_features_to_select=2).fit(X, y_bin)
print(rfe.get_support())
Day 43 — Dimensionality Reduction practice (e.g., on MNIST subset)
# Example 1 — PCA on sklearn digits
from sklearn.datasets import load_digits
d = load_digits()
pca = PCA(30)
Xp = pca.fit_transform(d.data)
print("Shape after PCA:", Xp.shape)
Day 44 — Clustering practice (customer segmentation)
# Example steps:
# 1. aggregate customer metrics
# 2. scale features
# 3. KMeans with elbow/silhouette
# 4. interpret clusters (avg spend, frequency)
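A minimal sketch of step 3, assuming X_scaled is the scaled customer-feature matrix from step 2:
# Pick k by silhouette score (higher is better); X_scaled is an assumed variable name
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X_scaled)
    print(k, round(silhouette_score(X_scaled, labels), 3))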
Day 45 — Practice Day: Feature engineering on sales dataset
# Steps:
# Create time-based features, lag features, aggregated customer features.
# Train baseline model and record metrics.
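A minimal sketch of the time-based and lag features, assuming a sales DataFrame with date, store, and sales columns:
# Lag and rolling features per store (column names are assumptions)
import pandas as pd
sales = sales.sort_values(['store', 'date'])
sales['month'] = pd.to_datetime(sales['date']).dt.month
sales['sales_lag_7'] = sales.groupby('store')['sales'].shift(7)
sales['sales_roll_28'] = sales.groupby('store')['sales'].transform(lambda s: s.shift(1).rolling(28).mean())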
Phase 5 — Advanced ML & Deployment (Day 46–55)
Day 46 — Ensemble learning overview (why ensembles work)
# Theory + small demo: avg of predictions reduces variance
import numpy as np
preds = [np.random.rand(100) for _ in range(5)]
ensemble = np.mean(preds, axis=0)
Day 47 — Bagging & Random Forest recap
# RandomForest done earlier — revisit OOB and feature importance
rf = RandomForestClassifier(n_estimators=200, oob_score=True).fit(X,y_bin)
print("OOB score:", rf.oob_score_)
Day 48 — Boosting: AdaBoost, Gradient Boosting
# Example 1 — AdaBoost sketch
from sklearn.ensemble import AdaBoostClassifier
adb = AdaBoostClassifier(n_estimators=100).fit(X,y_bin)
print("AdaBoost acc:", adb.score(X,y_bin))
Day 49 — XGBoost & LightGBM quickstart
# Example 1 — XGBoost (sketch, install xgboost)
# import xgboost as xgb
# model = xgb.XGBClassifier(n_estimators=100).fit(X,y)
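If installing xgboost/lightgbm is a hurdle, scikit-learn ships a LightGBM-style alternative; a sketch reusing the earlier X, y_bin:
# Example 2 — histogram-based gradient boosting, built into scikit-learn
from sklearn.ensemble import HistGradientBoostingClassifier
hgb = HistGradientBoostingClassifier(max_iter=100).fit(X, y_bin)
print("HGB acc:", hgb.score(X, y_bin))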
Day 50 — Stacking & blending
# Example 1 — StackingClassifier sketch
from sklearn.ensemble import StackingClassifier
stack = StackingClassifier(estimators=[('rf',rf),('svc',svc)], final_estimator=LogisticRegression())
print(cross_val_score(stack, X, y_bin, cv=5).mean())
Day 51 — Deployment refresher: FastAPI, Flask
# See the Day 1-7 posts for full apps; here, only quick notes on serving models.
# Quick note: prefer FastAPI for production due to ASGI performance.
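A minimal FastAPI endpoint sketch in the same commented style used elsewhere (model file and feature names are placeholders):
# Example 1 — FastAPI prediction endpoint (sketch)
# from fastapi import FastAPI
# import joblib
# app = FastAPI()
# model = joblib.load("model.joblib")
# @app.post("/predict")
# def predict(age: float, fare: float):
#     proba = model.predict_proba([[age, fare]])[0, 1]
#     return {"survival_probability": float(proba)}
# # run: uvicorn main:app --reload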
Day 52 — Streamlit for ML demos
# Example 1 — simple Streamlit app (sketch)
# import streamlit as st
# st.title("ML Demo")
# val = st.slider("Sepal length", 0.0, 10.0)
# st.write(predict_function(val))
Day 53 — MLOps basics: experiment tracking & model registry
# Example 1 — MLflow sketch
# import mlflow
# mlflow.start_run()
# mlflow.log_param("n_estimators",100)
# mlflow.log_metric("roc_auc", 0.93)
# mlflow.sklearn.log_model(pipe, "model")
Day 54 — CI/CD for ML pipelines
# Example 1 — GitHub Actions pipeline sketch included earlier
# Focus: tests, build, publish container, deploy to staging
Day 55 — Practice Day: Deploy model on Streamlit + Docker
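No code was given for this day; a minimal Streamlit-plus-Docker sketch in the same commented style (file names are placeholders):
# Example 1 — Streamlit app that loads a saved model (sketch)
# import streamlit as st
# import joblib
# model = joblib.load("model.joblib")
# st.title("Titanic Survival Demo")
# age = st.slider("Age", 0, 80, 30)
# fare = st.slider("Fare", 0.0, 500.0, 32.0)
# if st.button("Predict"):
#     st.write("Survival probability:", float(model.predict_proba([[age, fare]])[0, 1]))
# # Docker: write a Dockerfile that installs requirements and runs
# # streamlit run app.py --server.port 8501, then docker build / docker run -p 8501:8501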
Phase 6 — Advanced Topics & Capstone (Day 56–60)
Day 56 — Interpretability: SHAP & LIME
# Example 1 — SHAP (sketch)
# pip install shap
# import shap
# explainer = shap.TreeExplainer(rf)
# shap_values = explainer.shap_values(X_test)
# shap.summary_plot(shap_values, X_test)
Day 57 — Fairness & bias mitigation
# Example 1 — simple fairness check
# Compute metrics per sensitive group (TPR, FPR)
# Consider reweighing or threshold tuning if parity needed
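A minimal sketch of the per-group check, in the same commented style (the y/pred/group variable names are assumptions):
# Example 2 — TPR/FPR per sensitive group (sketch)
# import pandas as pd
# from sklearn.metrics import confusion_matrix
# results = pd.DataFrame({"y_true": yte, "y_pred": preds, "group": groups_te})
# for g, sub in results.groupby("group"):
#     tn, fp, fn, tp = confusion_matrix(sub["y_true"], sub["y_pred"]).ravel()
#     print(g, "TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))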
Day 58 — Recommender Systems recap & practice
# Quick recap: popularity, content-based (TF-IDF), collaborative (SVD)
# Example: compute TF-IDF over item descriptions and cosine similarity
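A runnable mini-example of the content-based idea (the item descriptions below are made up):
# TF-IDF over item descriptions + cosine similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
descriptions = ["red cotton shirt", "blue denim jeans", "red silk shirt"]
tfidf = TfidfVectorizer().fit_transform(descriptions)
sim = cosine_similarity(tfidf)
print(sim[0])  # similarity of item 0 to all items; recommend the highest non-self scores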
Day 59 — Advanced Time Series notes & evaluation
# Advanced: Prophet, SARIMAX, exogenous variables
# Evaluate with rolling-origin (walk-forward) cross validation
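A minimal walk-forward split sketch with scikit-learn's TimeSeriesSplit (the series here is a placeholder):
# Rolling-origin (walk-forward) splits: training always precedes testing in time
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
y_ts = np.arange(100)  # placeholder series ordered by time
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y_ts):
    print("train up to", train_idx[-1], "-> test", test_idx[0], "to", test_idx[-1])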
Day 60 — Capstone: End-to-End Project
Task: Build one full project from data → EDA → preprocessing → feature engineering → model → tuning → deploy (Streamlit/FastAPI) → monitoring sketch. Examples: Churn prediction, Sales forecast + recommender for products.
# Capstone checklist (example)
# 1. Problem statement & data
# 2. EDA & visuals
# 3. Preprocessing pipeline (impute, encode, scale)
# 4. Feature engineering (date, aggregation)
# 5. Train multiple models; cross-validate
# 6. Select model; hyperparameter tune
# 7. Save artifacts (model, preprocessors)
# 8. Build demo app (Streamlit/FastAPI)
# 9. Dockerize app
# 10. Write short report + README + notebook
Resources, Tips & Next Steps
- Books: Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow; Python for Data Analysis (Wes McKinney).
- Courses: fast.ai, Coursera ML by Andrew Ng (concepts), Kaggle micro-courses (Pandas, Machine Learning).
- Tools: Jupyter/Colab, VS Code, Docker, GitHub, MLflow/Weights & Biases.
- Practice: Kaggle competitions, open datasets, freelance mini-projects.
Study tips: Practice daily, keep small deliverables, document everything. Complete a mini project every week and a bigger project every month.