Comprehensive data analysis of 558,837 used car listings using Python and Pandas
Author: Sukesh Singla
Course: Hero Vired - Python Programming
Date: November 2025
Status: ✅ Complete
- Overview
- Dataset
- Key Findings
- Project Structure
- Installation
- Usage
- Visualizations
- Technologies
- Assignment Tasks
- Results
- Contributing
- License
- Contact
This project performs comprehensive exploratory data analysis (EDA) on used car listings data to extract actionable business insights, identify pricing patterns, and understand market dynamics in the automobile sector.
- Develop hands-on proficiency in data analysis using Pandas
- Perform essential data wrangling: cleaning, filtering, grouping, and summarizing
- Extract meaningful business insights through statistical analysis
- Create professional visualizations with Matplotlib and Seaborn
- Apply foundational techniques for exploratory data analysis
Source: Used Car Listings Dataset
Records: 558,837 car listings
Features: 16 attributes
Time Period: 1982-2015
- year - Model year
- make - Car manufacturer/brand
- model - Car model
- condition - Condition score (0-50)
- odometer - Mileage reading
- sellingprice - Actual selling price
- color - Exterior color
- state - State where sold
- And more...
- Average Price: $13,611.33
- Price Range: $1 - $230,000
- Most Popular Model: Nissan Altima (29,748 listings - 5.3% market share)
- Premium Brands: Rolls-Royce ($153K avg), Ferrari ($127K avg), Lamborghini ($113K avg)
- Condition Score - Strongest predictor (+$100 per point)
- Odometer Reading - Clear negative correlation (-$100 per 1K miles over 50K)
- Model Year - Newer vehicles command 20-30% premium
- Color - Neutral colors (white/black/gray) earn 10-15% more
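The README does not say how the per-unit figures above were estimated. As one hedged illustration, a least-squares slope (`numpy.polyfit` with `deg=1`) turns a single factor into "dollars per unit". The data is an invented toy frame, and the `slope` helper and the "miles over 50K" hinge feature are hypothetical. One-variable fits like these do not control for the other factors, so treat this as a rough sketch, not the project's method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for car_prices.csv; values are invented.
toy = pd.DataFrame({
    "condition":    [20, 25, 30, 35, 40, 45],
    "odometer":     [90_000, 80_000, 70_000, 60_000, 50_000, 40_000],
    "sellingprice": [9_000, 10_000, 11_000, 12_000, 13_000, 13_500],
})


def slope(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: least-squares slope of y on x (price change per unit of x)."""
    m, _intercept = np.polyfit(x, y, deg=1)
    return float(m)


# Price change per condition point (univariate, so other factors are not held fixed).
per_condition_point = slope(toy["condition"], toy["sellingprice"])

# Mileage effect only beyond 50K miles: a hinge feature in 1K-mile units,
# zero for any car at or below 50K miles.
miles_over_50k = (toy["odometer"] - 50_000).clip(lower=0) / 1_000
per_1k_over_50k = slope(miles_over_50k, toy["sellingprice"])

print(f"per condition point: {per_condition_point:+.2f} USD")
print(f"per 1K miles over 50K: {per_1k_over_50k:+.2f} USD")
```

On real data, a multiple regression on all four factors together would separate their effects better than these single-factor slopes.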
- Completeness: 98.4% after cleaning
- Duplicates: 0 found
- Missing Values: Handled appropriately (11.69% in transmission column)
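A minimal sketch of how missing-value percentages, a duplicate count, and an overall completeness figure can be computed with pandas. The frame is hypothetical toy data, and "completeness = share of non-null cells" is an assumed definition; the README does not state how its 98.4% was measured.

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset, rows are invented.
toy = pd.DataFrame({
    "make":         ["Nissan", "Ford", "Ford", None, "Kia"],
    "transmission": ["automatic", "automatic", "automatic", None, None],
    "sellingprice": [12_000, 9_500, 9_500, 15_000, 8_000],
})

# Percentage of missing values per column, largest first.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)

# Rows that exactly repeat an earlier row (the first occurrence is not counted).
n_duplicates = int(toy.duplicated().sum())

# Assumed completeness metric: share of non-null cells across the whole table.
completeness = 100 * toy.notna().to_numpy().mean()

print(missing_pct)
print("duplicates:", n_duplicates)
print(f"completeness: {completeness:.1f}%")
```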
car-price-analysis/
│
├── README.md                  # Project documentation
├── requirements.txt           # Python dependencies
├── .gitignore                 # Git ignore rules
├── LICENSE                    # MIT License
│
├── car_price_analysis.py      # Main analysis script
│
├── notebooks/                 # Jupyter notebooks
│   ├── Car_Price_Analysis_COMPLETE.ipynb
│   └── SIMPLE_JUPYTER_CELLS.py
│
├── visualizations/            # Generated charts (10 PNG files)
│   ├── 1_missing_values_bar.png
│   ├── 2_missing_values_heatmap.png
│   ├── 3_correlation_matrix.png
│   ├── 4_price_by_year.png
│   ├── 5_price_by_odometer.png
│   ├── 6_cars_by_state.png
│   ├── 7_price_by_condition_ranges.png
│   ├── 8_cars_by_condition_ranges.png
│   ├── 9_price_by_color_with_outliers.png
│   └── 10_price_by_color_without_outliers.png
│
└── docs/                      # Documentation
    ├── COMPREHENSIVE_ANALYSIS_REPORT.md
    ├── ASSIGNMENT_COMPLETION_SUMMARY.md
    └── JUPYTER_NOTEBOOK_GUIDE.md
- Python 3.8 or higher
- pip package manager
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/car-price-analysis.git
  cd car-price-analysis
- Install dependencies:
  pip install -r requirements.txt
- Download the dataset:
  - Place your car_prices.csv file in the project root, or
  - use the cleaned version, car_prices_cleaned.csv
Run the full analysis script:
python car_price_analysis.py
This will:
- Load and clean the dataset
- Perform all analyses
- Generate 10 visualizations
- Save results to the outputs/ folder
Or open the notebook for interactive work:
jupyter notebook notebooks/Car_Price_Analysis_COMPLETE.ipynb
This provides:
- Interactive cell-by-cell execution
- Inline visualization display
- Rich markdown documentation
- Easy experimentation
For beginners, use SIMPLE_JUPYTER_CELLS.py:
- Copy each cell section into a new Jupyter cell
- Run cells individually with Shift+Enter
- See immediate results and visualizations
- pandas - Data manipulation and analysis
- numpy - Numerical computing
- matplotlib - Data visualization
- seaborn - Statistical graphics
- scipy - Scientific computing
- Jupyter Notebook - Interactive development
- Python 3.12 - Programming language
- ✅ Data cleaning and preprocessing
- ✅ Exploratory Data Analysis (EDA)
- ✅ Statistical analysis
- ✅ Data visualization
- ✅ Business intelligence
- ✅ Python programming
- ✅ Documentation
- 1.1 Load & Inspect - Display first 5 rows, data types, record count
- 1.2 Understanding Data Structure - Shape, columns, data types
- 1.3 Missing & Anomaly Detection - Quantify nulls, visualize, resolve
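Tasks 1.1 to 1.3 map onto a handful of standard pandas calls. The sketch assumes the real file would be read with `pd.read_csv("car_prices.csv")`; here `io.StringIO` feeds a few invented rows instead, so it runs without the dataset. The anomaly thresholds (price under $100, odometer at 999,999) and the median fill are illustrative assumptions, not the project's actual cleaning rules.

```python
import io

import pandas as pd

# Stand-in for pd.read_csv("car_prices.csv"): hypothetical rows in a similar layout.
csv_text = """year,make,model,condition,odometer,sellingprice,color
2014,Nissan,Altima,35,30000,14500,white
2012,Ford,Fusion,28,65000,9200,black
2013,Kia,Optima,,48000,11000,silver
2011,BMW,3 Series,40,999999,1,gray
"""
df = pd.read_csv(io.StringIO(csv_text))

# 1.1 / 1.2: first rows, shape (rows, columns), and column dtypes.
print(df.head())
print(df.shape)
print(df.dtypes)

# 1.3: count nulls per column, then flag suspicious values (assumed thresholds).
print(df.isna().sum())
suspicious = df[(df["sellingprice"] < 100) | (df["odometer"] >= 999_999)]

# One possible resolution (assumption): drop flagged rows, then fill missing
# condition scores with the median of the remaining rows.
clean = df.drop(index=suspicious.index)
clean["condition"] = clean["condition"].fillna(clean["condition"].median())
print(clean)
```

Note that a numeric column with any blank cell, like condition here, is read as float64 because NaN is a float.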
- 2.1 Calculate average, minimum, and maximum car price
- 2.2 List all unique colors
- 2.3 Find number of unique brands and models
- 2.4 Find cars with selling price > $165,000
- 2.5 Find top 5 most frequently sold models
- 2.6 Average selling price by brand
- 2.7 Minimum selling price by interior color
- 2.8 Highest odometer reading per year
- 2.9 Create car age column
- 2.10 Filter cars by condition and odometer criteria
- 2.11 Analyze state pricing for newer cars
- 2.12 Value for money analysis
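Several of the Part 2 tasks reduce to one-line pandas expressions. This sketch runs them on a hypothetical six-row frame. The reference year for car age (2015, the last model year listed under Dataset) and the filter cut-offs (condition 35+, under 70K miles, borrowed from the inventory recommendations) are assumptions, not the assignment's exact criteria.

```python
import pandas as pd

# Hypothetical toy listings; column names follow the dataset, values are invented.
df = pd.DataFrame({
    "year":         [2012, 2014, 2014, 2010, 2015, 2013],
    "make":         ["Nissan", "Nissan", "Ford", "Ford", "BMW", "Nissan"],
    "model":        ["Altima", "Altima", "Fusion", "Focus", "3 Series", "Altima"],
    "condition":    [30, 42, 38, 20, 45, 35],
    "odometer":     [60_000, 25_000, 40_000, 110_000, 12_000, 50_000],
    "sellingprice": [9_000, 14_000, 12_500, 4_000, 30_000, 11_000],
    "color":        ["white", "black", "gray", "red", "black", "white"],
})

# 2.1 Average, minimum, and maximum price.
price_stats = df["sellingprice"].agg(["mean", "min", "max"])

# 2.2 / 2.3 Unique colors, number of brands and models.
unique_colors = df["color"].unique()
n_brands = df["make"].nunique()
n_models = df["model"].nunique()

# 2.5 Top 5 models by listing count (value_counts(normalize=True) gives shares).
top_models = df["model"].value_counts().head(5)

# 2.6 Average price per brand, highest first.
avg_by_brand = df.groupby("make")["sellingprice"].mean().sort_values(ascending=False)

# 2.9 Car age, assuming 2015 as the reference year.
df["car_age"] = 2015 - df["year"]

# 2.10 Example filter with assumed cut-offs.
good_low_mileage = df[(df["condition"] >= 35) & (df["odometer"] < 70_000)]

print(price_stats, top_models, avg_by_brand, good_low_mileage, sep="\n\n")
```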
- 3.1 Correlation matrix of numerical features
- 3.2 Average selling price by year
- 3.3 Average selling price by odometer
- 3.4 Number of cars sold by state
- 3.5 Price by condition score ranges (bin width 5)
- 3.6 Cars sold by condition ranges (bin width 10)
- 3.7 Box plots of price distribution by color
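For Part 3, the correlation matrix and the condition-range groupings can be sketched with `DataFrame.corr` and `pd.cut`. The data is invented, and the bin edges (0 to 50, matching the stated condition scale, right-closed with the lowest edge included) are an assumption about how the ranges were defined. Plotting is left out so the sketch stays headless.

```python
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
toy = pd.DataFrame({
    "year":         [2010, 2011, 2012, 2013, 2014, 2015],
    "condition":    [12, 19, 27, 33, 41, 48],
    "odometer":     [120_000, 95_000, 80_000, 55_000, 30_000, 10_000],
    "sellingprice": [4_000, 6_500, 8_000, 11_000, 15_000, 21_000],
})

# 3.1 Pearson correlation matrix. All columns are numeric here; on the full
# dataset, pass numeric_only=True to skip text columns (pandas 1.5+).
corr = toy.corr()

# 3.5 Average price per condition range, bins of width 5: (0, 5], (5, 10], ..., (45, 50].
bins_5 = list(range(0, 51, 5))
condition_5 = pd.cut(toy["condition"], bins=bins_5, include_lowest=True)
price_by_condition = toy.groupby(condition_5, observed=True)["sellingprice"].mean()

# 3.6 Number of cars per condition range, bins of width 10 (empty bins kept as 0).
bins_10 = list(range(0, 51, 10))
cars_by_condition = (
    pd.cut(toy["condition"], bins=bins_10, include_lowest=True)
    .value_counts()
    .sort_index()
)

print(corr.round(2), price_by_condition, cars_by_condition, sep="\n\n")
```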
- Inventory Strategy
  - Focus on 2010-2015 model years
  - Target condition scores 35+
  - Prioritize vehicles under 70K miles
  - Stock neutral colors (white, black, gray)
- Pricing Optimization
  - Base pricing on condition, mileage, year, and color
  - Projected ROI improvement: 15-25%
  - Dynamic pricing model provided
- Market Positioning
  - Premium segment: 20% (low mileage, excellent condition)
  - Standard segment: 50% (average condition, popular models)
  - Value segment: 30% (higher mileage, quick turnover)
- Central Tendency: Mean $13,611, Median $12,100
- Dispersion: High (coefficient of variation ~60%)
- Distribution: Right-skewed with heavy tails
- Outliers: 2.93% identified and handled
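The summary statistics above can be reproduced on any price column with a few pandas calls. The prices here are hypothetical, and the 1.5 × IQR rule is an assumed outlier definition; the README does not say which rule produced the 2.93% figure.

```python
import pandas as pd

# Hypothetical selling prices (not the real data), including one extreme listing.
prices = pd.Series([5_000, 8_000, 9_500, 11_000, 12_000,
                    13_500, 15_000, 18_000, 22_000, 150_000])

mean, median = prices.mean(), prices.median()
cv = prices.std() / mean        # coefficient of variation: sample std / mean
skewness = prices.skew()        # positive means a longer right tail

# Assumed outlier rule: anything more than 1.5 x IQR beyond the quartiles.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (prices < q1 - 1.5 * iqr) | (prices > q3 + 1.5 * iqr)
outlier_share = 100 * is_outlier.mean()

print(f"mean={mean:,.0f} median={median:,.0f} cv={cv:.0%} skew={skewness:.2f}")
print(f"outliers: {outlier_share:.2f}%")
```

A mean well above the median, as in the project's $13,611 vs $12,100, is the same right-skew signal the positive skewness value captures.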
Contributions, issues, and feature requests are welcome!
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Sukesh Singla
HR Analytics Specialist | Data Analyst
- 📧 Email: ssingla25@gmail.com
- 💼 LinkedIn: https://linkedin.com/in/sukesh-singla-667701a5
- 🐱 GitHub: @Sukesh1985
- 📍 Location: Delhi, India
- Hero Vired - For the comprehensive Python programming course
- Dataset Source - Used car listings data
- Python Community - For excellent libraries and tools
⭐ Star this repository if you found it helpful!
Made with ❤️ by Sukesh Singla