lava-v/retail-missing-value-analysis

Retail Product Data Analysis

Project Overview: This project performs exploratory data analysis (EDA) and generates business insights from a retail product dataset that contains missing values. The core task is identifying, analyzing, and handling that missing data: imputation techniques were chosen per column based on data distribution and business reasoning rather than blanket filling. The project demonstrates end-to-end data analysis using Python, SQL, and Power BI.

Dataset: Retail Product Dataset with Missing Values

Data Cleaning & Preprocessing: The following steps were performed using Python (Pandas):

  • Identified missing values in numerical and categorical columns

  • Imputed numerical columns using median values
  • Imputed categorical columns using mode values
  • Removed duplicate records
  • Verified data types and corrected inconsistencies
  • Created derived fields where necessary for analysis
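The cleaning steps above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the project's actual notebook: the sample rows and the column names (`product_id`, `category`, `price`, `discount`) are assumptions, and dropping duplicates before imputing is a choice made here so that repeated rows do not skew the median or mode.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real dataset's schema may differ.
df = pd.DataFrame({
    "product_id": [1, 2, 2, 3, 4, 5],
    "category": ["A", "C", "C", None, "C", "B"],
    "price": [10.0, 20.0, 20.0, np.nan, 40.0, 30.0],
    "discount": [5.0, 10.0, 10.0, 15.0, np.nan, 20.0],
})

# Step 1: count missing values per column before changing anything.
missing_before = df.isna().sum()

# Step 2: remove exact duplicate rows.
clean = df.drop_duplicates().reset_index(drop=True)

# Step 3: numerical columns are filled with the column median,
# which is less sensitive to outliers than the mean.
for col in ["price", "discount"]:
    clean[col] = clean[col].fillna(clean[col].median())

# Step 4: categorical columns are filled with the most frequent value.
most_common_category = clean["category"].mode().iloc[0]
clean["category"] = clean["category"].fillna(most_common_category)

print(missing_before)
print(clean)
```

Whether median and mode are appropriate depends on each column's distribution, which is the per-column reasoning the overview describes.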

Exploratory Data Analysis (EDA):

  • Numerical analysis: distribution analysis using histograms with KDE; outlier detection using boxplots
  • Categorical analysis: category-wise product count; stock availability comparison (In Stock vs Out of Stock); category-level price comparison using boxplots
  • Correlation analysis: Pearson correlation between Price & Discount, Price & Rating, and Discount & Rating

  • Result: All correlations are close to zero, indicating no strong linear relationship between pricing, discounts, and ratings.
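For reference, pairwise Pearson correlations of this kind can be computed with `DataFrame.corr`. The sketch below uses synthetic, independently generated columns purely to show the call; the column names are assumptions and the numbers it prints are not the project's results.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): three independently drawn columns.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "price": rng.uniform(10, 500, n),
    "discount": rng.uniform(0, 50, n),
    "rating": rng.uniform(1, 5, n),
})

# Full Pearson correlation matrix for the three numeric columns.
corr = df[["price", "discount", "rating"]].corr(method="pearson")

# Report only the three pairs named in the analysis.
pairs = [("price", "discount"), ("price", "rating"), ("discount", "rating")]
for a, b in pairs:
    print(f"{a} vs {b}: r = {corr.loc[a, b]:.3f}")
```

A coefficient near zero rules out a strong linear relationship only; a nonlinear dependence could still exist and would need a different check, such as a scatter plot or a rank correlation.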

SQL Analysis: SQL was used to answer business-oriented questions such as:

  • Total number of products
  • Average price, discount, and rating
  • Products with high discounts (≥ 40%)
  • Category-wise pricing trends
  • Stock availability breakdown

The SQL queries are consolidated and included for easy review and reproducibility.
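As an illustration of the questions listed above, the following sketch runs comparable queries against an in-memory SQLite database through Python's `sqlite3` module. The table name `products`, its columns, and the four sample rows are hypothetical, and the discount is assumed to be stored as a percentage; the repository's own queries may be written differently.

```python
import sqlite3

# In-memory database with a hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        category   TEXT,
        price      REAL,
        discount   REAL,  -- assumed to be a percentage
        rating     REAL,
        stock      TEXT
    )
    """
)
rows = [
    (1, "A", 100.0, 10.0, 4.0, "In Stock"),
    (2, "B", 200.0, 40.0, 3.0, "Out of Stock"),
    (3, "C", 300.0, 50.0, 5.0, "In Stock"),
    (4, "C", 400.0, 20.0, 4.0, "In Stock"),
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)", rows)

# Total number of products.
total = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# Average price, discount, and rating.
averages = conn.execute(
    "SELECT AVG(price), AVG(discount), AVG(rating) FROM products"
).fetchone()

# Products with high discounts (>= 40%).
high_discount = conn.execute(
    "SELECT product_id FROM products WHERE discount >= 40 ORDER BY product_id"
).fetchall()

# Category-wise average price.
by_category = conn.execute(
    "SELECT category, AVG(price) FROM products GROUP BY category ORDER BY category"
).fetchall()

# Stock availability breakdown.
stock = conn.execute(
    "SELECT stock, COUNT(*) FROM products GROUP BY stock ORDER BY stock"
).fetchall()

conn.close()
print(total, averages, high_discount, by_category, stock)
```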

Power BI Dashboard: An interactive Power BI dashboard was created to visualize:

  • Price and discount distributions (histograms)
  • Category-wise price comparison
  • Stock availability
  • Correlation summary using DAX measures
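The DAX measures themselves are not shown here. For reference, the sample Pearson coefficient that any such measure would need to reproduce can be written entirely in terms of row-wise sums, a form that tends to translate well into aggregate-style measures. The symbols below ($x_i$, $y_i$ for two columns such as price and discount, and $n$ for the row count) are notation introduced for this note.

```latex
r_{xy} =
\frac{n \sum_{i=1}^{n} x_i y_i \;-\; \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
     {\sqrt{\left( n \sum_{i=1}^{n} x_i^{2} - \Bigl( \sum_{i=1}^{n} x_i \Bigr)^{2} \right)
            \left( n \sum_{i=1}^{n} y_i^{2} - \Bigl( \sum_{i=1}^{n} y_i \Bigr)^{2} \right)}}
```

This is algebraically the same quantity as the covariance divided by the product of the standard deviations, so a measure built this way should agree with the Pearson value computed in Pandas on the same rows.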

Key Highlights:

  • Custom DAX measures for correlation
  • Clean layout with business-focused KPIs
  • Designed for stakeholder-friendly interpretation

Tools & Technologies used:

  • Python: Pandas, NumPy, Matplotlib, Seaborn
  • SQL: SQLite
  • Power BI: DAX, interactive dashboards
  • Version Control: Git & GitHub

Key Insights

  • Most products fall within a mid-price range with moderate discounts
  • Category C dominates the product count
  • High discounts do not necessarily correspond to higher ratings
  • Price, discount, and rating operate largely independently in this dataset

Conclusion: This project showcases practical data analysis skills, including data cleaning, EDA, SQL querying, and dashboard creation. It reflects real-world scenarios where data is imperfect and insights must be derived through structured analysis.
