GitHub - lava-v/retail-missing-value-analysis: This project showcases practical data analysis skills, including data cleaning on data with missing values, EDA, SQL querying, and dashboard creation.

Retail Product Data Analysis

Project Overview: This project performs exploratory data analysis (EDA) and generates business insights from a retail product dataset containing missing values. It identifies, analyzes, and handles the missing data, choosing imputation techniques based on data distribution and business reasoning rather than blanket filling. It demonstrates end-to-end data analysis skills using Python, SQL, and Power BI.

Dataset: Retail Product Dataset with Missing Values

Records: 4361 entries across 5 columns: category, price, rating, discount, stock
The dataset intentionally includes missing values to simulate real-world data challenges.
Source: Kaggle - https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values

Data Cleaning & Preprocessing: The following steps were performed using Python (Pandas):

Identified missing values in numerical and categorical columns
Imputed numerical columns using median values
Imputed categorical columns using mode values
Removed duplicate records
Verified data types and corrected inconsistencies
Created derived fields where necessary for analysis
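
The imputation and de-duplication steps above can be sketched in Pandas as follows. This is a minimal illustration, not the notebook's actual code: the helper name `clean_products` and the tiny sample frame are hypothetical, and the lowercase column names are assumed from the column list given for the dataset.

```python
import numpy as np
import pandas as pd


def clean_products(df: pd.DataFrame) -> pd.DataFrame:
    """Median-impute numeric columns, mode-impute the rest, then drop duplicates."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            # The median is robust to skewed distributions and outliers.
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() ignores missing values; take the first mode on ties.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Rows can become identical after imputation, so de-duplicate last.
    return out.drop_duplicates()


# Made-up sample data, only to exercise the helper.
sample = pd.DataFrame({
    "category": ["A", "C", None, "C", "C"],
    "price": [10.0, np.nan, 30.0, 50.0, 50.0],
    "stock": ["In Stock", "Out of Stock", "In Stock", None, None],
})
cleaned = clean_products(sample)
print(cleaned)
```

Filling every column with one rule is the simplest case; the project describes choosing techniques per column based on distribution and business reasoning, which would replace the generic loop with column-specific logic.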

Exploratory Data Analysis (EDA)

Numerical Analysis:
Distribution analysis using histograms with KDE
Outlier detection using boxplots

Categorical Analysis:
Category-wise product count
Stock availability comparison (In Stock vs Out of Stock)
Category-level price comparison using boxplots

Correlation Analysis: Pearson correlation between Price & Discount, Price & Rating, and Discount & Rating.
Result: All correlations are close to zero, indicating no strong linear relationship between pricing, discounts, and ratings.
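
A pairwise Pearson check of this kind can be sketched with `DataFrame.corr`. The helper name `pairwise_pearson` and the demo values are hypothetical and chosen only so the coefficients are easy to verify by hand; they are not taken from the dataset.

```python
import pandas as pd


def pairwise_pearson(df: pd.DataFrame,
                     cols=("price", "discount", "rating")) -> pd.DataFrame:
    """Return the Pearson correlation matrix for the selected numeric columns."""
    return df[list(cols)].corr(method="pearson")


# Made-up values: discount is exactly 2 * price (r = 1),
# while rating is symmetric around the price midpoint (r = 0).
demo = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "discount": [2.0, 4.0, 6.0, 8.0],
    "rating": [4.0, 1.0, 1.0, 4.0],
})
corr = pairwise_pearson(demo)
print(corr.round(3))
```

Note that a coefficient near zero only rules out a linear relationship; the demo's rating column depends on price in a U-shape yet still has zero Pearson correlation with it.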

SQL Analysis: SQL was used to answer business-oriented questions such as:

Total number of products
Average price, discount, and rating
Products with high discounts (≥ 40%)
Category-wise pricing trends
Stock availability breakdown
The SQL queries are consolidated and included for easy review and reproducibility.
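
As an illustration of the questions listed above, the sketch below runs comparable queries against an in-memory SQLite database through Python's `sqlite3` module. The table name `products`, the sample rows, and the assumption that discount is stored as a percentage (so the threshold is `40`) are all hypothetical; the repository's own queries may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "category TEXT, price REAL, rating REAL, discount REAL, stock TEXT)"
)
# Made-up rows, only to make the queries runnable.
rows = [
    ("A", 100.0, 4.0, 10.0, "In Stock"),
    ("C", 200.0, 3.0, 40.0, "Out of Stock"),
    ("C", 300.0, 5.0, 50.0, "In Stock"),
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", rows)

# Total number of products
total = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# Average price, discount, and rating
avg_price, avg_discount, avg_rating = conn.execute(
    "SELECT AVG(price), AVG(discount), AVG(rating) FROM products"
).fetchone()

# Products with high discounts (>= 40%), assuming discount is a percentage
high_discount = conn.execute(
    "SELECT COUNT(*) FROM products WHERE discount >= 40"
).fetchone()[0]

# Category-wise pricing
by_category = conn.execute(
    "SELECT category, AVG(price) FROM products "
    "GROUP BY category ORDER BY category"
).fetchall()

# Stock availability breakdown
by_stock = conn.execute(
    "SELECT stock, COUNT(*) FROM products GROUP BY stock ORDER BY stock"
).fetchall()

print(total, avg_price, high_discount, by_category, by_stock)
conn.close()
```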

Power BI Dashboard: An interactive Power BI dashboard was created to visualize:

Price and discount distributions (histograms)
Category-wise price comparison
Stock availability
Correlation summary using DAX measures

Key Highlights:

Custom DAX measures for correlation
Clean layout with business-focused KPIs
Designed for stakeholder-friendly interpretation

Tools & Technologies used:

Python: Pandas, NumPy, Matplotlib, Seaborn
SQL: SQLite
Power BI: DAX, interactive dashboards
Version Control: Git & GitHub

Key Insights

Most products fall within a mid-price range with moderate discounts
Category C dominates the product count
High discounts do not necessarily correspond to higher ratings
Price, discount, and rating operate largely independently in this dataset

Conclusion: This project showcases practical data analysis skills, including data cleaning, EDA, SQL querying, and dashboard creation. It reflects real-world scenarios where data is imperfect and insights must be derived through structured analysis.
