A collection of exploratory data analysis (EDA) projects on multiple datasets.
Each project includes data cleaning, preprocessing, visualization, feature engineering, and insights.
The red wine dataset contains chemical properties such as acidity, sulphates, and alcohol, plus a quality score (0–10).
It is used to identify which features influence wine quality.
The flight dataset includes airline, route, duration, date of journey, total stops, and ticket price.
It is used to understand the factors affecting airfare.
The student performance dataset contains demographic information, parental education, exam preparation, and exam scores (math, reading, writing).
It is used to analyze patterns in student achievement.
The app dataset includes app details such as category, rating, reviews, installs, size, price, and content rating.
It is used to study app trends and user engagement.
- Cleaned the dataset and checked for missing values
- Analyzed chemical feature distributions
- Visualized correlations using a heatmap and scatter plots
- Insight: Higher alcohol content is associated with higher quality; high volatile acidity is associated with lower quality
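The wine steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names are assumed, and the six rows are made-up toy values that exist only to make the snippet runnable.

```python
import pandas as pd

# Toy rows with made-up values (NOT real measurements); column names are assumed.
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
    "volatile_acidity": [0.70, 0.88, 0.60, 0.45, 0.36, 0.30],
    "sulphates": [0.56, 0.68, 0.65, 0.58, 0.75, 0.80],
    "quality": [5, 5, 5, 6, 7, 7],
})

# Count missing values per column before any analysis.
missing_per_column = wine.isna().sum()

# Pearson correlation of each feature with the quality score.
# The full matrix from wine.corr() is what a heatmap would display.
corr_with_quality = wine.corr()["quality"].drop("quality").sort_values()

print(missing_per_column)
print(corr_with_quality)
```

On the real data, the sign and size of each correlation indicate which features move with quality; correlation alone does not establish cause.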
- Extracted journey and time-based features
- Applied Label and One-Hot Encoding
- Visualized price variation by airline, route, and duration
- Insight: Airline and number of stops strongly influence ticket price
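A possible sketch of the feature extraction and encoding steps for the flight data is shown below. The column names, the date and duration formats, and the airline labels are assumptions for illustration. An explicit ordinal mapping stands in for a label encoder here, since the number of stops has a natural order.

```python
import pandas as pd

# Hypothetical rows; column names and string formats are assumed.
flights = pd.DataFrame({
    "Airline": ["AirA", "AirB", "AirA"],
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019"],
    "Duration": ["2h 50m", "7h 25m", "19h"],
    "Total_Stops": ["non-stop", "2 stops", "1 stop"],
})

# Journey-based features: day and month from the date string.
journey = pd.to_datetime(flights["Date_of_Journey"], format="%d/%m/%Y")
flights["journey_day"] = journey.dt.day
flights["journey_month"] = journey.dt.month

# Time-based feature: duration in minutes; a missing "h" or "m" part counts as 0.
hours = flights["Duration"].str.extract(r"(\d+)h")[0].fillna("0").astype(int)
minutes = flights["Duration"].str.extract(r"(\d+)m")[0].fillna("0").astype(int)
flights["duration_minutes"] = hours * 60 + minutes

# Ordinal (label-style) encoding for stops, which have a natural order.
stop_order = {"non-stop": 0, "1 stop": 1, "2 stops": 2}
flights["stops_encoded"] = flights["Total_Stops"].map(stop_order)

# One-hot encoding for the airline, which has no natural order.
encoded = pd.get_dummies(flights, columns=["Airline"], prefix="airline")
print(encoded.head())
```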
- Checked score distributions across subjects
- Compared performance based on gender, race, and parental education
- Created histograms, boxplots, and pairplots
- Insight: Test preparation and higher parental education are associated with higher scores
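The group comparison for the student data might look something like this. The column names and the four rows are invented for illustration; the same `groupby` pattern would apply to gender, race, or parental education.

```python
import pandas as pd

# Hypothetical rows with made-up scores; column names are assumed.
students = pd.DataFrame({
    "test_prep": ["completed", "none", "completed", "none"],
    "math": [70, 60, 80, 50],
    "reading": [75, 65, 85, 55],
    "writing": [74, 62, 88, 52],
})

score_cols = ["math", "reading", "writing"]

# Per-student average across the three subjects.
students["average"] = students[score_cols].mean(axis=1)

# Mean score per subject (and overall) for each test-preparation group.
by_prep = students.groupby("test_prep")[score_cols + ["average"]].mean()
print(by_prep)
```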
- Cleaned install count, reviews, size, and price fields
- Engineered new features: size groups, install buckets, price groups
- Visualized category distribution, rating trends, and install patterns
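As a rough sketch of the cleaning and bucketing steps for the app data, assuming the raw fields are strings such as `"10,000+"`, `"19M"`, and `"$4.99"` (the column names, the string formats, and the bucket thresholds are all assumptions):

```python
import pandas as pd

# Hypothetical rows; column names and raw string formats are assumed.
apps = pd.DataFrame({
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Size": ["19M", "512k", "Varies with device"],
    "Price": ["0", "$4.99", "0"],
})

# Strip "+" and "," so install counts become integers.
apps["installs_num"] = (
    apps["Installs"].str.replace(r"[+,]", "", regex=True).astype(int)
)

# Strip the currency symbol so prices become floats.
apps["price_usd"] = apps["Price"].str.lstrip("$").astype(float)


def size_to_mb(text):
    """Convert '19M' or '512k' to megabytes; anything else becomes NaN."""
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return float("nan")


apps["size_mb"] = apps["Size"].map(size_to_mb)

# Install buckets; the thresholds here are illustrative only.
apps["install_bucket"] = pd.cut(
    apps["installs_num"],
    bins=[0, 1_000, 100_000, float("inf")],
    labels=["low", "medium", "high"],
)
print(apps)
```

Size groups and price groups could be built the same way with `pd.cut` on `size_mb` and `price_usd`.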
1. Which category has the largest number of installations?
The Games and Communication categories have the highest install counts.
2. What are the top 5 most installed apps in each major category?
Extracted the top apps per category (e.g., WhatsApp, Messenger, Subway Surfers).
3. How many apps have a perfect 5 rating?
Only a small number of apps achieve a perfect 5.0 rating.
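The three questions above can be answered with a few grouped aggregations. The sketch below assumes install counts are already numeric; the app names, categories, and numbers are placeholders, not values from the dataset.

```python
import pandas as pd

# Placeholder rows (NOT real apps or figures); column names are assumed.
apps = pd.DataFrame({
    "App": ["app_a", "app_b", "app_c", "app_d", "app_e"],
    "Category": ["GAME", "GAME", "COMMUNICATION", "TOOLS", "TOOLS"],
    "Installs": [5000, 100, 3000, 200, 50],
    "Rating": [4.5, 5.0, 4.2, 5.0, 3.9],
})

# 1. Total installs per category, largest first.
installs_by_category = (
    apps.groupby("Category")["Installs"].sum().sort_values(ascending=False)
)

# 2. Up to five most-installed apps within each category.
top_per_category = (
    apps.sort_values("Installs", ascending=False).groupby("Category").head(5)
)

# 3. Number of apps with a perfect 5.0 rating.
perfect_count = int((apps["Rating"] == 5.0).sum())

print(installs_by_category)
print(top_per_category[["Category", "App", "Installs"]])
print(perfect_count)
```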
- Other details are in the file.
- For the end-to-end project code that predicts FWI on this dataset, see the forest-fire repo.
- Data Cleaning & Preprocessing
- Missing Value Handling
- Feature Engineering
- Label & One-Hot Encoding
- Data Visualization (Matplotlib, Seaborn)
- Grouping, Aggregation & Insight Extraction
- Exploratory Data Narratives