Sukesh1985/car-price-analysis

🚗 Car Price Data Analysis - Python Programming Assignment

Comprehensive data analysis of 558,837 used car listings using Python and Pandas

Author: Sukesh Singla
Course: Hero Vired - Python Programming
Date: November 2025
Status: ✅ Complete



🎯 Overview

This project performs comprehensive exploratory data analysis (EDA) on used car listings data to extract actionable business insights, identify pricing patterns, and understand market dynamics in the automobile sector.

Objectives

  • Develop hands-on proficiency in data analysis using Pandas
  • Perform essential data wrangling: cleaning, filtering, grouping, and summarizing
  • Extract meaningful business insights through statistical analysis
  • Create professional visualizations with Matplotlib and Seaborn
  • Apply foundational techniques for exploratory data analysis

📊 Dataset

Source: Used Car Listings Dataset
Records: 558,837 car listings
Features: 16 attributes
Time Period: 1982-2015

Features Include:

  • year - Model year
  • make - Car manufacturer/brand
  • model - Car model
  • condition - Condition score (0-50)
  • odometer - Mileage reading
  • sellingprice - Actual selling price
  • color - Exterior color
  • state - State where sold
  • And more...

🔍 Key Findings

Market Insights

  • Average Price: $13,611.33
  • Price Range: $1 - $230,000
  • Most Popular Model: Nissan Altima (29,748 listings - 5.3% market share)
  • Premium Brands: Rolls-Royce ($153K avg), Ferrari ($127K avg), Lamborghini ($113K avg)

Price Drivers

  1. Condition Score - Strongest predictor (+$100 per point)
  2. Odometer Reading - Clear negative correlation (-$100 per 1K miles over 50K)
  3. Model Year - Newer vehicles command 20-30% premium
  4. Color - Neutral colors (white/black/gray) earn 10-15% more
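A per-unit effect such as "+$100 per condition point" can be estimated in several ways, and the README does not say which method was used. The sketch below assumes a plain correlation matrix plus a one-variable least-squares fit with `np.polyfit`. The rows are synthetic and deliberately built so that price rises by exactly $100 per condition point; they are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic rows (not from the dataset): price is constructed as
# 5000 + 100 * condition, so the fitted slope should come out at 100.
condition = np.array([10, 20, 30, 40, 50])
odometer = np.array([120000, 90000, 70000, 40000, 15000])
df = pd.DataFrame({
    "condition": condition,
    "odometer": odometer,
    "sellingprice": 5000 + 100 * condition,
})

# Pairwise Pearson correlations between the numeric columns.
corr = df.corr()

# Least-squares line: sellingprice ~ slope * condition + intercept.
slope, intercept = np.polyfit(
    df["condition"].to_numpy(), df["sellingprice"].to_numpy(), deg=1
)

print(corr.round(2))
print(f"Estimated change in price per condition point: ${slope:.0f}")
```

On real listings the drivers are correlated with each other (newer cars tend to have lower mileage and better condition), so a single-variable slope overstates the isolated effect of any one of them.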

Data Quality

  • Completeness: 98.4% after cleaning
  • Duplicates: 0 found
  • Missing Values: Handled during cleaning (11.69% of values missing in the transmission column)
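The data quality figures above can be reproduced with a few Pandas calls. This is a minimal sketch on a tiny made-up frame. It assumes "completeness" means the share of non-null cells, which may differ from the definition used in the project.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the listings data; values are illustrative only.
df = pd.DataFrame({
    "make": ["Nissan", "Ford", "BMW", "Ford"],
    "transmission": ["automatic", np.nan, "manual", "automatic"],
    "sellingprice": [12000.0, 9500.0, np.nan, 15000.0],
})

# Share of missing values per column, as a percentage.
missing_pct = (df.isna().mean() * 100).round(2)

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# Assumed definition of completeness: non-null cells / all cells.
completeness_pct = round(float(df.notna().to_numpy().mean()) * 100, 2)

print(missing_pct)
print(f"Duplicates: {duplicate_rows}, completeness: {completeness_pct}%")
```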

📁 Project Structure

car-price-analysis/
│
├── README.md                          # Project documentation
├── requirements.txt                   # Python dependencies
├── .gitignore                         # Git ignore rules
├── LICENSE                            # MIT License
│
├── car_price_analysis.py              # Main analysis script
│
├── notebooks/                         # Jupyter notebooks
│   ├── Car_Price_Analysis_COMPLETE.ipynb
│   └── SIMPLE_JUPYTER_CELLS.py
│
├── visualizations/                    # Generated charts (10 PNG files)
│   ├── 1_missing_values_bar.png
│   ├── 2_missing_values_heatmap.png
│   ├── 3_correlation_matrix.png
│   ├── 4_price_by_year.png
│   ├── 5_price_by_odometer.png
│   ├── 6_cars_by_state.png
│   ├── 7_price_by_condition_ranges.png
│   ├── 8_cars_by_condition_ranges.png
│   ├── 9_price_by_color_with_outliers.png
│   └── 10_price_by_color_without_outliers.png
│
└── docs/                              # Documentation
    ├── COMPREHENSIVE_ANALYSIS_REPORT.md
    ├── ASSIGNMENT_COMPLETION_SUMMARY.md
    └── JUPYTER_NOTEBOOK_GUIDE.md

🚀 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository
git clone https://github.com/YOUR_USERNAME/car-price-analysis.git
cd car-price-analysis
  2. Install dependencies
pip install -r requirements.txt
  3. Download the dataset
  • Place your car_prices.csv file in the project root
  • Or use the cleaned version: car_prices_cleaned.csv

💻 Usage

Option 1: Run the Python Script

python car_price_analysis.py

This will:

  • Load and clean the dataset
  • Perform all analyses
  • Generate 10 visualizations
  • Save results to outputs/ folder

Option 2: Use Jupyter Notebook

jupyter notebook notebooks/Car_Price_Analysis_COMPLETE.ipynb

This provides:

  • Interactive cell-by-cell execution
  • Inline visualization display
  • Rich markdown documentation
  • Easy experimentation

Option 3: Step-by-Step Cells

For beginners, use SIMPLE_JUPYTER_CELLS.py:

  • Copy each cell section into a new Jupyter cell
  • Run cells individually with Shift+Enter
  • See immediate results and visualizations

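📊 Visualizations

1. Data Quality Assessment (1_missing_values_bar.png, 2_missing_values_heatmap.png)

2. Statistical Analysis (3_correlation_matrix.png)

3. Trend Analysis (4_price_by_year.png, 5_price_by_odometer.png)

4. Market Distribution (6_cars_by_state.png)

5. Condition Analysis (7_price_by_condition_ranges.png, 8_cars_by_condition_ranges.png)

6. Color Impact Analysis (9_price_by_color_with_outliers.png, 10_price_by_color_without_outliers.png)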


🛠️ Technologies

Core Libraries

  • pandas - Data manipulation and analysis
  • numpy - Numerical computing
  • matplotlib - Data visualization
  • seaborn - Statistical graphics
  • scipy - Scientific computing

Development Tools

  • Jupyter Notebook - Interactive development
  • Python 3.12 - Programming language

Skills Demonstrated

  • ✅ Data cleaning and preprocessing
  • ✅ Exploratory Data Analysis (EDA)
  • ✅ Statistical analysis
  • ✅ Data visualization
  • ✅ Business intelligence
  • ✅ Python programming
  • ✅ Documentation

📝 Assignment Tasks

✅ Task 1: Data Ingestion & Quality Profiling (100%)

  • 1.1 Load & Inspect - Display first 5 rows, data types, record count
  • 1.2 Understanding Data Structure - Shape, columns, data types
  • 1.3 Missing & Anomaly Detection - Quantify nulls, visualize, resolve
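The three Task 1 steps might look roughly like the sketch below. It reads an inline CSV so that it runs without `car_prices.csv`. The rows are invented, and the fill policy (median for numeric gaps, an explicit "unknown" label for categories) is an assumption, not necessarily what the project's script does.

```python
import io

import pandas as pd

# Inline CSV standing in for car_prices.csv; rows are made up for illustration.
csv_text = """year,make,model,condition,odometer,color,transmission,sellingprice
2014,Nissan,Altima,45,21000,white,automatic,15500
2012,Ford,Fusion,30,68000,black,,9800
2010,BMW,3 Series,,95000,gray,automatic,11200
"""

df = pd.read_csv(io.StringIO(csv_text))

# 1.1 / 1.2: first rows, shape (rows, columns), and data types.
print(df.head())
print(df.shape)
print(df.dtypes)

# 1.3: quantify nulls per column.
null_counts = df.isna().sum()
print(null_counts[null_counts > 0])

# 1.3: resolve nulls on a copy so the raw frame stays available.
cleaned = df.copy()
# Assumed policy: fill numeric gaps with the column median ...
cleaned["condition"] = cleaned["condition"].fillna(cleaned["condition"].median())
# ... and label missing categories explicitly instead of dropping rows.
cleaned["transmission"] = cleaned["transmission"].fillna("unknown")
```

With the real file, `pd.read_csv("car_prices.csv")` would replace the `io.StringIO` call.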

✅ Task 2: DataFrame Queries (100%)

  • 2.1 Calculate average, minimum, and maximum car price
  • 2.2 List all unique colors
  • 2.3 Find number of unique brands and models
  • 2.4 Find cars with selling price > $165,000
  • 2.5 Find top 5 most frequently sold models
  • 2.6 Average selling price by brand
  • 2.7 Minimum selling price by interior
  • 2.8 Highest odometer reading per year
  • 2.9 Create car age column
  • 2.10 Filter cars by condition and odometer criteria
  • 2.11 Analyze state pricing for newer cars
  • 2.12 Value for money analysis
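Several of the Task 2 queries reduce to one-line Pandas expressions. The sketch below covers 2.1, 2.5, 2.6, 2.8, and 2.9 on a small invented frame. The reference year used for car age is an assumption (the last model year mentioned in the dataset description).

```python
import pandas as pd

# Small invented sample using the column names from the feature list.
df = pd.DataFrame({
    "year": [2014, 2012, 2010, 2014, 2013],
    "make": ["Nissan", "Ford", "BMW", "Nissan", "Ford"],
    "model": ["Altima", "Fusion", "3 Series", "Altima", "F-150"],
    "odometer": [21000, 68000, 95000, 30000, 40000],
    "sellingprice": [15500, 9800, 11200, 14500, 22000],
})

# 2.1 Average, minimum, and maximum selling price.
price_stats = df["sellingprice"].agg(["mean", "min", "max"])

# 2.5 Most frequently sold models (top 5).
top_models = df["model"].value_counts().head(5)

# 2.6 Average selling price by brand, highest first.
avg_by_make = df.groupby("make")["sellingprice"].mean().sort_values(ascending=False)

# 2.8 Highest odometer reading per model year.
max_odo_by_year = df.groupby("year")["odometer"].max()

# 2.9 Car age column; REFERENCE_YEAR is an assumed constant.
REFERENCE_YEAR = 2015
df["car_age"] = REFERENCE_YEAR - df["year"]

print(price_stats)
print(top_models)
print(avg_by_make)
print(max_odo_by_year)
```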

✅ Task 3: Data Visualization and Insights (100%)

  • 3.1 Correlation matrix of numerical features
  • 3.2 Average selling price by year
  • 3.3 Average selling price by odometer
  • 3.4 Number of cars sold by state
  • 3.5 Price by condition score ranges (bin width 5)
  • 3.6 Cars sold by condition ranges (bin width 10)
  • 3.7 Box plots of price distribution by color
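Tasks 3.5 and 3.6 both depend on grouping condition scores into fixed-width ranges. One way to do that is `pd.cut`, sketched below on invented data. The helper name `summarize_by_condition` and the bin labels are hypothetical, and the plotting step is left out so the example depends on Pandas only.

```python
import pandas as pd

# Invented sample; condition is on the 0-50 scale from the feature list.
df = pd.DataFrame({
    "condition": [3, 12, 18, 27, 34, 41, 46, 49],
    "sellingprice": [2000, 4500, 6000, 9000, 12500, 16000, 19000, 21000],
})


def summarize_by_condition(frame: pd.DataFrame, width: int) -> pd.DataFrame:
    """Count listings and average price per condition range (width must divide 50)."""
    # Left-closed bins [0, width), [width, 2*width), ...; the final edge is 51
    # so that a score of exactly 50 still lands in the last bin.
    edges = list(range(0, 50, width)) + [51]
    labels = [f"{lo}-{lo + width - 1}" for lo in edges[:-2]] + [f"{edges[-2]}-50"]
    ranges = pd.cut(frame["condition"], bins=edges, labels=labels, right=False)
    # observed=False keeps empty ranges in the result (count 0, mean NaN).
    return frame.groupby(ranges, observed=False)["sellingprice"].agg(["count", "mean"])


by5 = summarize_by_condition(df, 5)    # ranges for the average-price chart
by10 = summarize_by_condition(df, 10)  # ranges for the cars-sold chart

print(by5)
print(by10)
```

The `mean` column of `by5` and the `count` column of `by10` are the series that a bar chart for 3.5 and 3.6 would draw.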

📈 Results

Business Recommendations

  1. Inventory Strategy

    • Focus on 2010-2015 model years
    • Target condition scores 35+
    • Prioritize vehicles under 70K miles
    • Stock neutral colors (white, black, gray)
  2. Pricing Optimization

    • Base pricing on condition, mileage, year, and color
    • Expected ROI improvement: 15-25%
    • Dynamic pricing model provided
  3. Market Positioning

    • Premium segment: 20% (low mileage, excellent condition)
    • Standard segment: 50% (average condition, popular models)
    • Value segment: 30% (higher mileage, quick turnover)
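The inventory criteria listed above (2010-2015 model years, condition 35 or higher, under 70K miles, neutral colors) translate directly into a boolean mask. The sketch below applies them to a few invented rows. The name `NEUTRAL_COLORS` is introduced here for illustration only.

```python
import pandas as pd

# Invented candidate listings.
df = pd.DataFrame({
    "year": [2014, 2009, 2012, 2015, 2011],
    "condition": [45, 40, 30, 38, 36],
    "odometer": [21000, 50000, 60000, 72000, 65000],
    "color": ["white", "black", "gray", "white", "red"],
})

NEUTRAL_COLORS = {"white", "black", "gray"}

# Combine the four criteria; each comparison yields a boolean Series.
mask = (
    df["year"].between(2010, 2015)          # inclusive on both ends
    & (df["condition"] >= 35)
    & (df["odometer"] < 70000)
    & df["color"].str.lower().isin(NEUTRAL_COLORS)
)

shortlist = df[mask]
print(shortlist)
```

Only the first row passes all four tests here; each of the others fails exactly one criterion, which makes it easy to check the mask by eye.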

Statistical Summary

  • Central Tendency: Mean $13,611, Median $12,100
  • Dispersion: High variance (~60% coefficient of variation)
  • Distribution: Right-skewed with heavy tails
  • Outliers: 2.93% identified and handled
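The quantities in this summary can be computed as sketched below. The README does not state how outliers were identified, so the 1.5 × IQR rule used here is an assumption, and the price series is invented to show a right-skewed shape.

```python
import pandas as pd

# Invented prices with one very expensive listing to create a right tail.
prices = pd.Series(
    [4000, 8000, 10000, 12000, 12500, 14000, 16000, 18000, 21000, 95000],
    name="sellingprice",
)

# Central tendency.
mean_price = prices.mean()
median_price = prices.median()

# Dispersion: coefficient of variation = sample std / mean, in percent.
cv_pct = prices.std() / mean_price * 100

# Shape: positive skewness indicates a longer right tail.
skewness = prices.skew()

# Outliers by the (assumed) 1.5 * IQR rule.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (prices < lower) | (prices > upper)
outlier_pct = is_outlier.mean() * 100

print(f"mean={mean_price:.0f} median={median_price:.0f} cv={cv_pct:.1f}%")
print(f"skew={skewness:.2f} outliers={outlier_pct:.1f}%")
```

A mean above the median together with positive skewness is the pattern the summary describes as right-skewed.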

🀝 Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Contact

Sukesh Singla
HR Analytics Specialist | Data Analyst


🙏 Acknowledgments

  • Hero Vired - For the comprehensive Python programming course
  • Dataset Source - Used car listings data
  • Python Community - For excellent libraries and tools


⭐ Star this repository if you found it helpful!

Made with ❀️ by Sukesh Singla
