Day 5: Model Evaluation, Validation, Improvement & Advanced Engineering
Everything in full detail, with no external references needed. Explanations are kept conversational so learning stays fun too.
1) Model Evaluation & Metrics
The model is ready; now it needs to be evaluated. For classification, accuracy, precision, recall, F1, and ROC-AUC are the workhorses; for regression, MSE, MAE, RMSE, and R².
1.1 Classification Metrics: Accuracy, Precision, Recall, F1
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
y_true = [0,1,1,0,1,0,1,1]
y_pred = [0,1,0,0,1,0,1,0]
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("Confusion Matrix:\\n", confusion_matrix(y_true, y_pred))
Your Turn: Modify y_pred to create more false positives or more false negatives, and observe how precision vs recall changes.
1.2 ROC Curve & AUC
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
y_prob = [0.1,0.9,0.2,0.4,0.8,0.05,0.9,0.3]
fpr, tpr, _ = roc_curve(y_true, y_prob)
print("AUC:", roc_auc_score(y_true, y_prob))
plt.plot(fpr, tpr, label="ROC")
plt.plot([0,1],[0,1],'--', label="Random")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
Practice: Change one of the y_prob values drastically (e.g., give a correctly labeled example a low probability) and observe how AUC changes.
1.3 Regression Metrics: MAE, MSE, RMSE, R²
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
y_true_reg = [10,12,9,15,20]
y_pred_reg = [11,11,8,14,19]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
print("RMSE:", mean_squared_error(y_true_reg, y_pred_reg, squared=False))
print("R²:", r2_score(y_true_reg, y_pred_reg))
Practice: Introduce one larger error in y_pred_reg, like changing one prediction to 30, and observe how MSE vs MAE react.
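A quick sketch of this experiment (the value 30 is just an illustrative outlier):
y_pred_out = [11, 11, 8, 14, 30]  # last prediction is now a big miss
print("MAE with outlier:", mean_absolute_error(y_true_reg, y_pred_out))
print("MSE with outlier:", mean_squared_error(y_true_reg, y_pred_out))
# MSE reacts much more strongly because squaring amplifies large errors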
2) Model Validation Techniques
A single train-test split can give unstable results. Better: use k-fold CV, stratified CV, leave-one-out (very slow), or nested CV.
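Leave-one-out doesn't get its own subsection below, so here is a minimal self-contained sketch (kept small on purpose, since LeaveOneOut refits the model once per sample):
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
import numpy as np
X_small = np.random.rand(30, 4)  # small n on purpose: LOO fits 30 models here
y_small = np.random.randint(0, 2, 30)
loo_scores = cross_val_score(LogisticRegression(), X_small, y_small, cv=LeaveOneOut())
print("LOO mean accuracy:", loo_scores.mean())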
2.1 K-Fold Cross-Validation
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
import numpy as np
X = np.random.rand(200,5)
y = np.random.randint(0,2,200)
rf = RandomForestClassifier()
scores = cross_val_score(rf, X, y, cv=5)
print("CV Scores:", scores)
print("Mean:", scores.mean())
Your Turn: Set cv=10 and see if the mean score gets more stable (less variance across folds).
2.2 Stratified K-Fold for Imbalanced Classes
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
s_scores = cross_val_score(rf, X, y, cv=skf)
print("Stratified CV Mean:", s_scores.mean())
Practice: Create an imbalanced y (e.g., 90 zeros and 10 ones) and compare plain CV vs stratified CV, as in the sketch below.
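A sketch of this comparison, reusing X and rf from above (the 90/10 ratio scaled to the 200 rows of X):
from sklearn.model_selection import KFold
y_imb = np.array([0]*180 + [1]*20)  # 90% zeros, 10% ones
plain = cross_val_score(rf, X, y_imb, cv=KFold(n_splits=5, shuffle=True, random_state=42))
strat = cross_val_score(rf, X, y_imb, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
print("Plain KFold: mean", plain.mean(), "std", plain.std())
print("Stratified:  mean", strat.mean(), "std", strat.std())
# Stratification keeps the 90/10 ratio inside every fold, so fold scores vary less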
2.3 Nested Cross-Validation (for tuning + evaluation)
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
inner_cv = StratifiedKFold(4, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(5, shuffle=True, random_state=2)
param_grid = {'n_estimators':[50,100], 'max_depth':[5,10]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=inner_cv)
nested_scores = cross_val_score(grid, X, y, cv=outer_cv)
print("Nested CV Mean:", nested_scores.mean())
Your Turn: Compare the nested CV result to a simple train-test split with tuning, and note the possible over-optimism of single-split tuning; one way to check is sketched below.
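A direct way to see the over-optimism, reusing grid and nested_scores from above:
grid.fit(X, y)  # tune on all the data, no outer loop
print("Inner-CV best score:", grid.best_score_)   # selection bias tends to make this look better
print("Nested CV mean:", nested_scores.mean())    # the more honest estimate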
3) Model Improvement Techniques
To improve a model, use hyperparameter tuning, ensembles, regularization, feature selection, and techniques to avoid overfitting.
3.1 Hyperparameter Tuning: Grid vs Random Search
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint
# Grid Search
param_grid = {'n_estimators': [50,100], 'max_depth': [5,10]}
grid = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=4)
grid.fit(X, y)
print("Grid best:", grid.best_params_, grid.best_score_)
# Randomized Search
param_dist = {'n_estimators': randint(50,200), 'max_depth': randint(3,15)}
rand = RandomizedSearchCV(RandomForestClassifier(random_state=1), param_dist, n_iter=5, cv=4, random_state=0)
rand.fit(X, y)
print("Random best:", rand.best_params_, rand.best_score_)
Practice: Run both searches and note the runtime difference when the parameter grid is large; randomized search can be more efficient.
3.2 Ensembles: Voting, Stacking
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=True)
voter = VotingClassifier([('lr',clf1),('dt',clf2),('svc',clf3)], voting='soft')
print("Voting CV:", cross_val_score(voter, X, y, cv=5).mean())
stack = StackingClassifier([('rf',rf),('svc',clf3)], final_estimator=LogisticRegression())
print("Stacking CV:", cross_val_score(stack, X, y, cv=5).mean())
Practice: Try adding KNeighborsClassifier to the ensemble (a sketch follows) and compare single models vs ensemble performance.
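A sketch of the KNN addition (KNeighborsClassifier with its default k=5):
from sklearn.neighbors import KNeighborsClassifier
voter_knn = VotingClassifier([('lr', clf1), ('dt', clf2), ('svc', clf3),
                              ('knn', KNeighborsClassifier())], voting='soft')
print("Voting+KNN CV:", cross_val_score(voter_knn, X, y, cv=5).mean())
print("LR alone CV:", cross_val_score(clf1, X, y, cv=5).mean())  # single-model baseline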
3.3 Regularization: L1 vs L2
from sklearn.linear_model import LogisticRegression
lr_l2 = LogisticRegression(penalty='l2', solver='liblinear')
lr_l1 = LogisticRegression(penalty='l1', solver='liblinear')
print("L2 CV:", cross_val_score(lr_l2, X, y, cv=5).mean())
print("L1 CV:", cross_val_score(lr_l1, X, y, cv=5).mean())
Practice: Change the regularization strength (C) and observe the impact on performance and sparsity; see the sketch below.
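A sketch sweeping C (smaller C means stronger regularization; the grid of values is arbitrary):
for C in [0.01, 0.1, 1.0, 10.0]:
    m = LogisticRegression(penalty='l1', solver='liblinear', C=C).fit(X, y)
    nonzero = (m.coef_ != 0).sum()  # L1 zeroes out coefficients as C shrinks
    print(f"C={C}: CV={cross_val_score(m, X, y, cv=5).mean():.3f}, non-zero coefs={nonzero}")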
3.4 Feature Selection: Filter vs Wrapper vs Embedded
from sklearn.feature_selection import SelectKBest, f_classif, RFE
sel = SelectKBest(f_classif, k=3)
X_sel = sel.fit_transform(X, y)
print("SelectKBest shape:", X_sel.shape)
rfe = RFE(estimator=LogisticRegression(), n_features_to_select=3)
X_rfe = rfe.fit_transform(X, y)
print("RFE shape:", X_rfe.shape)
Practice: Train a model on the selected features vs the full features and compare accuracy and training time, as in the sketch below.
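A minimal version of that comparison, reusing X_sel from the SelectKBest step:
import time
t0 = time.time()
print("Full features CV:", cross_val_score(rf, X, y, cv=5).mean(), f"({time.time()-t0:.2f}s)")
t0 = time.time()
print("Selected features CV:", cross_val_score(rf, X_sel, y, cv=5).mean(), f"({time.time()-t0:.2f}s)")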
4) Advanced Feature Engineering
4.1 Interaction & Polynomial Features
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2, include_bias=False)
X_poly = poly.fit_transform(X[:10])
print("Original shape:", X[:10].shape, "Poly shape:", X_poly.shape)
Practice: Print the feature names with poly.get_feature_names_out() to see how the interactions are created.
4.2 Frequency & Target Encoding
import pandas as pd
df_cat = pd.DataFrame({'city':['A','B','A','C','B','A'], 'y':[1,0,1,0,1,0]})
freq = df_cat['city'].value_counts().to_dict()
df_cat['city_freq'] = df_cat['city'].map(freq)
target_mean = df_cat.groupby('city')['y'].mean().to_dict()
df_cat['city_tgt'] = df_cat['city'].map(target_mean)
print(df_cat)
Practice: Think about leakage: how do you avoid it using K-fold within target encoding? One answer is sketched below.
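One standard fix is out-of-fold target encoding: each row is encoded with target means computed only on the other folds, so no row ever sees its own label. A minimal sketch on the toy df_cat above:
import numpy as np
from sklearn.model_selection import KFold
df_cat['city_tgt_oof'] = np.nan
for tr_idx, val_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(df_cat):
    fold_means = df_cat.iloc[tr_idx].groupby('city')['y'].mean()  # means from other folds only
    col = df_cat.columns.get_loc('city_tgt_oof')
    df_cat.iloc[val_idx, col] = df_cat.iloc[val_idx]['city'].map(fold_means).values
df_cat['city_tgt_oof'] = df_cat['city_tgt_oof'].fillna(df_cat['y'].mean())  # unseen categories
print(df_cat)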
4.3 Binning Continuous Variables
import pandas as pd
vals = [18,22,35,45,60,70]
bins = [0,20,40,60,100]
labels = ['teen','adult','mid','senior']
print(pd.cut(pd.Series(vals), bins=bins, labels=labels))
Practice: Bin 'fare' values into categories like 'low', 'medium', 'high' and use them as categorical features; see the sketch below.
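A sketch assuming a Titanic-style DataFrame with a 'fare' column (the bin edges here are arbitrary):
df_t = pd.DataFrame({'fare': [7.25, 13.0, 35.5, 80.0, 512.33]})  # toy fares
df_t['fare_band'] = pd.cut(df_t['fare'], bins=[0, 10, 50, 600],
                           labels=['low', 'medium', 'high'])
print(df_t)
# 'fare_band' can now be one-hot encoded like any other categorical feature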
4.4 Feature Importance (Tree & SHAP sketch)
import numpy as np
rf.fit(X, y)
print("Feature importances:", rf.feature_importances_)
# For actual SHAP, install shap and use shap.TreeExplainer
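A minimal SHAP sketch (assumes pip install shap; the shape of the output varies a bit across shap versions):
import shap
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)  # per-sample, per-feature contributions
shap.summary_plot(shap_values, X)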
Practice: Plot the feature importances, drop the 2 lowest, retrain, and see if the score drops.
5) End-to-End Case Study — Churn Prediction Workflow
Now for a complete practical workflow: data load → preprocessing → feature engineering → model train → evaluate → tune → final model. Everything is copy-paste-run.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, classification_report
df = pd.read_csv('customer_churn.csv') # make sure CSV present
df = df.dropna(subset=['churn'])
df['age'] = df['age'].fillna(df['age'].median())      # avoid chained inplace fillna (deprecated in pandas 2.x)
df['balance'] = df['balance'].fillna(df['balance'].median())
df['gender'] = df['gender'].fillna('other')
df['region'] = df['region'].fillna('unknown')
X = df.drop(['customer_id','churn'], axis=1)
y = df['churn'].map({'No':0,'Yes':1}) if df['churn'].dtype=='object' else df['churn']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)
num_cols = ['age','balance']
cat_cols = ['gender','region']
preprocessor = ColumnTransformer([
('num', StandardScaler(), num_cols),
('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=False), cat_cols)  # sparse_output replaced sparse in sklearn 1.2
])
pipeline = Pipeline([
('pre', preprocessor),
('clf', RandomForestClassifier(random_state=42))
])
pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)
print("Accuracy:", pipeline.score(X_test, y_test))
print(classification_report(y_test, pred))
param_grid = {'clf__n_estimators':[50,100], 'clf__max_depth':[5,10]}
grid = GridSearchCV(pipeline, param_grid, cv=4, scoring='roc_auc')
grid.fit(X_train, y_train)
best = grid.best_estimator_
print("Best params:", grid.best_params_)
final_pred = best.predict(X_test)
print("Final ROC-AUC:", roc_auc_score(y_test, best.predict_proba(X_test)[:,1]))
print(classification_report(y_test, final_pred))
Your Turn: Run the code, then try changing the model (e.g., XGBoost, sketched below), adding features (like products), or expanding the param grid to improve ROC-AUC.
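A hedged sketch of the XGBoost swap (assumes pip install xgboost; everything else in the pipeline stays the same):
from xgboost import XGBClassifier
xgb_pipeline = Pipeline([
    ('pre', preprocessor),
    ('clf', XGBClassifier(random_state=42, eval_metric='logloss'))
])
xgb_pipeline.fit(X_train, y_train)
print("XGB ROC-AUC:", roc_auc_score(y_test, xgb_pipeline.predict_proba(X_test)[:, 1]))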
6) Practice Exercises: Copy → Run
These exercises are ready to run. Copy → paste → run them, then add your own twist and see how the results change.
6.1 Evaluate a classifier: change y_pred to increase FP
... (as above)
6.2 Plot ROC curve with your own probabilities
... (as above)
6.3 Regression metrics with one outlier
... (as above)
6.4 Perform stratified 5-fold CV on a synthetic dataset
... (as above)
6.5 Ensemble Voting vs Stacking comparison
... (as above)
Resources & Next Steps
- Download sample datasets (Titanic, Churn) from Kaggle
- Scikit-learn documentation on metrics, model selection, pipelines
- SHAP for advanced explainability
- YouTube tutorials for ensemble methods & tuning