Python for Machine Learning & Data Science: Day 2 Complete Practice Guide

Welcome to Day 2 of our Python for Machine Learning & Data Science series. If you have completed Day 1, you already know Python basics, NumPy, and your first ML model. Today, we’ll move ahead with data visualization, advanced Pandas, Seaborn, and building your second ML project.

📌 What You Will Learn Today

Advanced Pandas for data analysis
Data visualization with Matplotlib & Seaborn
Working with real-world datasets
Exploratory Data Analysis (EDA)
Building a classification model
Beginner tips to avoid mistakes

🔹 Setting Up the Environment

Just like Day 1, make sure you have Python 3.7+ installed and the following libraries:


pip install pandas numpy matplotlib seaborn scikit-learn

🔹 Advanced Pandas for Data Science

Pandas is not just about loading CSV files. Today, we’ll learn grouping, merging, and handling missing values.


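import pandas as pd

# Load the dataset (assumes students_performance.csv is in the working directory)
data = pd.read_csv("students_performance.csv")

# Show the first 5 rows
print(data.head())

# Fill missing values in numeric columns with each column's mean.
# numeric_only=True skips text columns, which have no mean.
data = data.fillna(data.mean(numeric_only=True))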
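
Grouping and merging follow the same pattern. The sketch below is a minimal illustration, not the tutorial's dataset: the `scores` and `courses` tables, their column names, and their values are all hypothetical stand-ins so the example runs without a CSV file.

```python
import pandas as pd

# Hypothetical sample data standing in for students_performance.csv
scores = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "gender": ["female", "male", "female", "male", "female", "male"],
    "math score": [72.0, 65.0, None, 48.0, 90.0, 55.0],
})

# Hypothetical second table used to demonstrate merging
courses = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "prep_course": ["completed", "none", "none", "completed", "completed"],
})

# Handle the missing value first: fill it with the column mean
scores["math score"] = scores["math score"].fillna(scores["math score"].mean())

# Grouping: average math score per gender
mean_by_gender = scores.groupby("gender")["math score"].mean()
print(mean_by_gender)

# Merging: a left join keeps every row of `scores`,
# leaving NaN where `courses` has no matching student_id
merged = scores.merge(courses, on="student_id", how="left")
print(merged)
```

A left join is used here so that no student is silently dropped; `how="inner"` would keep only students present in both tables.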

🔹 Data Visualization with Matplotlib & Seaborn

Visualization helps us understand patterns in the data. Seaborn builds on Matplotlib and produces clean statistical plots with very little code.


import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of math scores with a smoothed density curve (kde)
sns.histplot(data['math score'], bins=10, kde=True)
plt.show()

# Correlation heatmap (numeric columns only; text columns cannot be correlated)
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()

🔹 Beginner Tip

Always visualize your data before applying ML models. Plots reveal outliers, skewed distributions, and missing values early, so you avoid building a model on wrong assumptions.
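
Alongside plots, a few quick numeric checks make up a basic EDA pass. The following is a rough sketch on a small hypothetical table (the values are invented for illustration; the column names are assumed to match the ones used elsewhere in this guide).

```python
import pandas as pd

# Hypothetical sample rows; replace with your own DataFrame
data = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"],
    "math score": [72.0, 65.0, None, 48.0, 90.0],
    "reading score": [80.0, 60.0, 75.0, 50.0, 95.0],
    "writing score": [78.0, 58.0, 77.0, 45.0, 93.0],
})

# How many values are missing in each column?
missing_per_column = data.isna().sum()
print(missing_per_column)

# Count, mean, std, min, quartiles, and max for numeric columns
summary = data.describe()
print(summary)

# How many rows fall into each category?
gender_counts = data["gender"].value_counts()
print(gender_counts)

# Pairwise correlations between numeric columns only
correlations = data.select_dtypes("number").corr()
print(correlations)
```

Selecting numeric columns before calling `corr()` avoids errors caused by text columns such as `gender`.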

🔹 Building Your Second ML Project

Let’s build a classification model that predicts whether a student passes or fails math from their reading and writing scores. Here a math score of 40 or more counts as a pass.


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create the target variable: 1 = pass (math score >= 40), 0 = fail
data['result'] = data['math score'].apply(lambda x: 1 if x >= 40 else 0)

# Features and target
X = data[['reading score', 'writing score']]
y = data['result']

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the classifier
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
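
Accuracy alone can mislead when most students pass, because a model that always predicts "pass" would also score high. The sketch below compares accuracy with a majority-class baseline and prints a confusion matrix. It runs on synthetic scores generated with NumPy (an assumption made only so the example is self-contained), and the `stratify` and `max_iter` settings are illustrative choices rather than part of the model above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic scores (hypothetical data, not the real dataset)
rng = np.random.default_rng(0)
n = 500
reading = rng.normal(65, 15, n).clip(0, 100)
writing = (reading + rng.normal(0, 5, n)).clip(0, 100)
math_score = (0.8 * reading + rng.normal(10, 10, n)).clip(0, 100)

data = pd.DataFrame({
    "reading score": reading,
    "writing score": writing,
    "math score": math_score,
})
data["result"] = (data["math score"] >= 40).astype(int)

X = data[["reading score", "writing score"]]
y = data["result"]

# stratify=y keeps the pass/fail ratio the same in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)

# Baseline: accuracy of always predicting the most common class
baseline = max(y_test.mean(), 1 - y_test.mean())

# Rows are true classes (fail, pass); columns are predicted classes
cm = confusion_matrix(y_test, predictions, labels=[0, 1])

print("Accuracy:", accuracy)
print("Majority-class baseline:", baseline)
print("Confusion matrix:\n", cm)
```

If the accuracy is barely above the baseline, the model has learned little beyond the class imbalance, and the confusion matrix shows which class it gets wrong.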

🔹 Conclusion

Today we covered advanced Pandas, visualization with Matplotlib and Seaborn, and built our second ML project. Practice these steps, then try applying them to other datasets such as Titanic or Iris.