
Tosa9/CodeAlpha_UnemploymentAnalysis


Unemployment Analysis with Python

CodeAlpha Data Science Internship — Task 2

Intern: Omokhoa Oshose Tosayoname
Intern ID: CA/DF1/71570
Duration: 20th May 2026 – 20th June 2026


Overview

This project analyses unemployment trends across Indian states from 2019 to 2020, with a focused investigation into the impact of the Covid-19 pandemic and the nationwide lockdown (25 March 2020) on employment. The analysis covers rural vs urban unemployment dynamics, state-level vulnerability, geographic zone comparisons, and the relationship between unemployment and labour participation rates.

Business/Policy Question: How did the Covid-19 lockdown affect unemployment across India's states and regions, and which areas were most vulnerable?


Project Pipeline

Data Loading & Cleaning --> EDA --> Time Series Analysis
    --> Covid-19 Impact Analysis --> Regional Analysis --> Policy Insights

Project Structure

CodeAlpha_UnemploymentAnalysis/
├── data/
│   ├── Unemployment_in_India.csv              # Dataset 1: 2019-2020, Rural/Urban
│   ├── Unemployment_Rate_upto_11_2020.csv     # Dataset 2: 2020 with geo-coordinates
│   └── *.png                                  # All generated visualisations
├── notebooks/
│   └── unemployment_analysis.ipynb            # Main notebook (fully executed)
├── requirements.txt
└── README.md

Datasets

Dataset                               Records   Period         Key Feature
Unemployment_in_India.csv             740       2019–2020      Rural/Urban area breakdown, 28 states
Unemployment_Rate_upto_11_2020.csv    267       Jan–Nov 2020   Geographic coordinates, zone classification

Features: Region, Date, Unemployment Rate (%), Estimated Employed, Labour Participation Rate (%), Area (Rural/Urban), Geographic Zone, Coordinates.
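The Data Loading & Cleaning step can be sketched without third-party packages. This is a minimal sketch. It assumes the raw CSV headers may carry stray leading spaces, dates are written as DD-MM-YYYY, the rate column is named `Estimated Unemployment Rate (%)`, and some rows are entirely empty. `load_records` and the inline sample are hypothetical, with placeholder state names and made-up values. The notebook's actual cleaning code is not reproduced here and may differ.

```python
import csv
import io
from datetime import datetime

# Assumed column name for the unemployment rate after header cleaning.
RATE = "Estimated Unemployment Rate (%)"

def load_records(text):
    """Parse CSV text into cleaned dicts (hypothetical helper).

    Strips whitespace from headers and cells, skips empty rows,
    parses DD-MM-YYYY dates and converts the unemployment rate to float.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    records = []
    for row in reader:
        # Blank lines yield [], separator-only lines yield ['', '', ...].
        if not any(cell.strip() for cell in row):
            continue
        rec = {key: cell.strip() for key, cell in zip(header, row)}
        rec["Date"] = datetime.strptime(rec["Date"], "%d-%m-%Y").date()
        rec[RATE] = float(rec[RATE])
        records.append(rec)
    return records

# Made-up sample in the assumed raw layout (note the leading spaces).
SAMPLE = (
    "Region, Date, Frequency, Estimated Unemployment Rate (%),Area\n"
    "State A, 31-05-2019, Monthly, 8.5,Rural\n"
    ",,,,\n"
    "\n"
    "State A, 30-04-2020, Monthly, 20.0,Urban\n"
)

records = load_records(SAMPLE)
print(len(records), records[0]["Date"], records[1][RATE])
```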


Key Results

Phase                      Avg Unemployment Rate
Pre-Covid                  9.23%
Lockdown (Mar–Jun 2020)    16.74%
Recovery (Jul–Nov 2020)    9.22%
National Peak              23.24% (May 2020)

Unemployment surged 7.5 percentage points from Pre-Covid to Lockdown.
Most affected state: Puducherry (+37.4 pp increase)
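The phase averages and the 7.5 pp figure come from grouping observations by phase and differencing the means (16.74 − 9.23 = 7.51). This minimal sketch makes several assumptions. Phase boundaries are set at month level: Pre-Covid before March 2020, Lockdown March–June 2020, and Recovery from July 2020. Each phase average is an unweighted mean over all state-month observations. `covid_phase` and `phase_averages` are hypothetical names, and the sample numbers are made up.

```python
from datetime import date
from statistics import mean

def covid_phase(d):
    """Label a date with the README's phases (assumed month-level boundaries)."""
    if d < date(2020, 3, 1):
        return "Pre-Covid"
    if d <= date(2020, 6, 30):
        return "Lockdown"   # Mar-Jun 2020
    return "Recovery"       # Jul 2020 onwards

def phase_averages(rows):
    """rows: iterable of (date, rate) pairs -> {phase: mean rate in %}."""
    buckets = {}
    for d, rate in rows:
        buckets.setdefault(covid_phase(d), []).append(rate)
    return {phase: mean(rates) for phase, rates in buckets.items()}

# Illustrative values only, not taken from the datasets.
rows = [(date(2019, 6, 30), 8.0), (date(2020, 1, 31), 10.0),
        (date(2020, 4, 30), 20.0), (date(2020, 5, 31), 14.0),
        (date(2020, 8, 31), 9.0)]
avgs = phase_averages(rows)
# A difference of two rates is measured in percentage points (pp), not percent.
increase_pp = avgs["Lockdown"] - avgs["Pre-Covid"]
print(avgs, increase_pp)
```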


Exploratory Data Analysis

Unemployment Rate Distribution by Phase

[Figure: Rate Distribution]

The lockdown phase shows a clearly right-shifted distribution with extreme outlier values representing the worst-affected states in April and May 2020.
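The 1.5 × IQR whisker rule is a common way to flag extreme outliers in a per-phase box plot. The sketch below assumes that rule, though the notebook's plot may use a different definition. It computes quartiles by linear interpolation (`method="inclusive"`), which matches NumPy's default percentile method. `iqr_outliers` is a hypothetical helper, and the rates are invented.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot whisker rule)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Invented lockdown-phase rates for six states, with one extreme value.
lockdown_rates = [8.0, 9.0, 10.0, 11.0, 12.0, 40.0]
print(iqr_outliers(lockdown_rates))  # [40.0]
```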


Rural vs Urban Unemployment Comparison

[Figure: Rural vs Urban]

Urban unemployment was higher and more volatile than rural unemployment throughout the observation period, likely reflecting urban workers' greater exposure to formal-sector and services disruptions during the lockdown.
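The "higher and more volatile" comparison amounts to computing a mean and a spread for each area. This minimal sketch assumes volatility is measured as the sample standard deviation of the rate. `summarize_by_area` is a hypothetical helper, and the sample values are invented.

```python
from statistics import mean, stdev

def summarize_by_area(rows):
    """rows: (area, rate) pairs -> {area: (mean rate, sample standard deviation)}."""
    groups = {}
    for area, rate in rows:
        groups.setdefault(area, []).append(rate)
    return {area: (mean(rates), stdev(rates)) for area, rates in groups.items()}

# Invented values: Urban is both higher on average and more spread out.
sample = [("Rural", 6.0), ("Rural", 8.0), ("Rural", 10.0),
          ("Urban", 8.0), ("Urban", 12.0), ("Urban", 16.0)]
summary = summarize_by_area(sample)
print(summary)
```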


Time Series Analysis

National Monthly Unemployment Rate — 2020

[Figure: National Time Series]

The unemployment rate spiked sharply after the lockdown began on 25 March 2020, peaking at 23.24% in May 2020. Recovery was rapid after Unlock Phase 1 in June 2020.
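A national monthly series like this one can be built by averaging the state rates for each month. The peak is then the month with the largest average. This sketch assumes an unweighted mean across states; a labour-force-weighted average would give different values. `national_monthly` and `peak_month` are hypothetical names, and the sample data are invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def national_monthly(rows):
    """rows: (date, state, rate) -> [((year, month), mean rate across states), ...] sorted by month."""
    by_month = defaultdict(list)
    for d, _state, rate in rows:
        by_month[(d.year, d.month)].append(rate)
    return sorted((ym, mean(rates)) for ym, rates in by_month.items())

def peak_month(series):
    """Return the ((year, month), rate) entry with the highest rate."""
    return max(series, key=lambda item: item[1])

rows = [(date(2020, 3, 31), "State A", 8.0), (date(2020, 3, 31), "State B", 10.0),
        (date(2020, 4, 30), "State A", 14.0), (date(2020, 4, 30), "State B", 16.0),
        (date(2020, 5, 31), "State A", 20.0), (date(2020, 5, 31), "State B", 24.0)]
series = national_monthly(rows)
print(series)
print(peak_month(series))  # ((2020, 5), 22.0)
```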


Full Timeline: 2019–2020

[Figure: Full Time Series]

Pre-pandemic unemployment was relatively stable around 9%. The Covid-19 shock stands out as a clear structural break in the time series.


Rural vs Urban Unemployment Over Time

[Figure: Rural Urban Timeseries]

Urban areas showed a sharper and more prolonged spike during the lockdown. Rural unemployment recovered faster, likely driven by continued agricultural activity.


Covid-19 Impact Analysis

Unemployment by Covid Phase

[Figure: Covid Impact Phases]

The lockdown phase shows both a substantially higher mean and a much wider spread in unemployment rates, reflecting uneven impact across states.


State-Level Covid Impact: Pre-Covid to Lockdown

[Figure: State Covid Impact]

Most states saw large increases in unemployment. A small number of predominantly rural or agricultural states showed resilience or even slight decreases.
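Each state's change here is the difference between its Lockdown and Pre-Covid mean rates, in percentage points. This minimal sketch assumes rows are already labelled by phase. States missing either phase are skipped rather than guessed. `state_impact` is a hypothetical helper, and the values are invented.

```python
from statistics import mean

def state_impact(rows):
    """rows: (state, phase, rate) -> {state: Lockdown mean - Pre-Covid mean, in pp}."""
    by_state = {}
    for state, phase, rate in rows:
        by_state.setdefault(state, {}).setdefault(phase, []).append(rate)
    impact = {}
    for state, phases in by_state.items():
        # Only compare states observed in both phases.
        if "Pre-Covid" in phases and "Lockdown" in phases:
            impact[state] = mean(phases["Lockdown"]) - mean(phases["Pre-Covid"])
    return impact

rows = [("State A", "Pre-Covid", 5.0), ("State A", "Lockdown", 25.0),
        ("State B", "Pre-Covid", 10.0), ("State B", "Lockdown", 9.0),
        ("State C", "Lockdown", 30.0)]  # State C lacks Pre-Covid data
impact = state_impact(rows)
print(impact)  # {'State A': 20.0, 'State B': -1.0}
```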


Top 10 Most and Least Affected States

[Figure: Top Affected States]

Puducherry, Jharkhand, and Bihar were among the hardest hit. States with strong agricultural or informal economies showed more resilience.
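A "top 10 most and least affected" ranking is a sort on each state's Pre-Covid-to-Lockdown change. This sketch assumes that change is available as a mapping from state name to percentage points. `most_and_least_affected` is a hypothetical helper, and the changes are invented.

```python
import heapq

def most_and_least_affected(impact, n=10):
    """impact: {state: change in pp} -> (n largest increases, n smallest changes)."""
    most = heapq.nlargest(n, impact.items(), key=lambda kv: kv[1])
    least = heapq.nsmallest(n, impact.items(), key=lambda kv: kv[1])
    return most, least

# Invented changes in percentage points.
impact = {"State A": 20.0, "State B": -1.0, "State C": 7.5, "State D": 3.0}
most, least = most_and_least_affected(impact, n=2)
print(most)   # [('State A', 20.0), ('State C', 7.5)]
print(least)  # [('State B', -1.0), ('State D', 3.0)]
```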


Regional Analysis

Average Unemployment by State (2020)

[Figure: State Average Unemployment]

Unemployment varies considerably across states; Haryana and Tripura recorded among the highest average unemployment rates over 2020.


Unemployment by Geographic Zone Over Time

[Figure: Zone Timeseries]

All zones spiked sharply during the lockdown, but Central and North zones showed the highest peak values and more erratic recovery patterns.
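Comparing zones by their peak values involves two steps: average the rates for each zone and month, then keep each zone's largest monthly mean. This sketch assumes every record already carries a zone label, since the second dataset provides a zone classification. `zone_peaks` is a hypothetical helper, and the figures are invented.

```python
from collections import defaultdict
from statistics import mean

def zone_peaks(rows):
    """rows: (zone, (year, month), rate) -> {zone: ((year, month), peak monthly mean)}."""
    monthly = defaultdict(list)
    for zone, ym, rate in rows:
        monthly[(zone, ym)].append(rate)
    peaks = {}
    for (zone, ym), rates in monthly.items():
        avg = mean(rates)
        # Keep the month with the highest average rate for each zone.
        if zone not in peaks or avg > peaks[zone][1]:
            peaks[zone] = (ym, avg)
    return peaks

rows = [("North", (2020, 4), 20.0), ("North", (2020, 4), 30.0),
        ("North", (2020, 5), 10.0), ("South", (2020, 4), 12.0),
        ("South", (2020, 5), 14.0)]
peaks = zone_peaks(rows)
print(peaks)
```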


Unemployment Rate Heatmap: State vs Month

[Figure: State Month Heatmap]

The heatmap reveals the April–May 2020 lockdown period (deep red columns) as a clear shock across almost all states, with visible recovery by July 2020.
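A state-by-month heatmap is drawn from a pivot table. States form the rows, months form the columns, and each cell holds the mean rate. This is a dependency-free sketch of that pivot. Months with no observation are left as `None` rather than filled, so gaps stay visible. `state_month_matrix` is a hypothetical helper, and the sample values are invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def state_month_matrix(rows):
    """rows: (state, date, rate) -> (states, months, grid).

    grid[i][j] is the mean rate for states[i] in months[j], or None if unobserved.
    """
    cells = defaultdict(list)
    for state, d, rate in rows:
        cells[(state, (d.year, d.month))].append(rate)
    states = sorted({s for s, _ in cells})
    months = sorted({m for _, m in cells})
    grid = []
    for s in states:
        row = []
        for m in months:
            values = cells.get((s, m))  # .get avoids creating empty entries
            row.append(mean(values) if values else None)
        grid.append(row)
    return states, months, grid

rows = [("State B", date(2020, 4, 30), 30.0),
        ("State A", date(2020, 4, 30), 20.0),
        ("State A", date(2020, 5, 31), 10.0)]
states, months, grid = state_month_matrix(rows)
print(states, months, grid)
```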


Unemployment vs Labour Participation Rate Over Time

[Figure: Unemployment vs LPR]

As unemployment surged, the Labour Participation Rate fell at the same time, suggesting that many workers stopped seeking employment entirely during the lockdown (a discouraged-worker effect).
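A Pearson correlation between the monthly unemployment and participation series quantifies this co-movement. A clearly negative coefficient is consistent with a discouraged-worker effect, though it does not prove one. This minimal sketch writes the coefficient out explicitly. `pearson` is a hypothetical helper, and both series are invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented monthly national series: unemployment up, participation down.
unemployment = [8.0, 9.0, 22.0, 20.0, 11.0, 9.0]
participation = [43.0, 42.5, 36.0, 38.0, 41.5, 42.0]
r = pearson(unemployment, participation)
print(round(r, 3))
```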


Unemployment vs Labour Participation Rate Scatter by Phase

[Figure: Unemployment vs LPR Scatter]

The lockdown phase occupies a distinct cluster with high unemployment and low participation, confirming the structural shock to the labour market.


Feature Correlation Matrix

[Figure: Correlation Heatmap]
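A feature correlation matrix is a table of pairwise Pearson coefficients between numeric columns. Here those columns are assumed to be the unemployment rate, estimated employed, and labour participation rate. This dependency-free sketch assumes complete, equal-length, non-constant columns. `pearson` and `correlation_matrix` are hypothetical helpers, and the columns are invented.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sqrt(sum((x - mx) ** 2 for x in xs)) *
                  sqrt(sum((y - my) ** 2 for y in ys)))

def correlation_matrix(columns):
    """columns: {name: values} -> {(name_a, name_b): r} for every ordered pair."""
    names = list(columns)
    matrix = {(a, a): 1.0 for a in names}  # each column correlates perfectly with itself
    for a, b in combinations(names, 2):
        r = pearson(columns[a], columns[b])
        matrix[(a, b)] = matrix[(b, a)] = r  # correlation is symmetric
    return matrix

columns = {"rate": [1.0, 2.0, 3.0],
           "employed": [3.0, 2.0, 1.0],
           "lpr": [2.0, 4.0, 6.0]}
matrix = correlation_matrix(columns)
print(matrix)
```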


Key Policy Insights

  • India's unemployment rate surged from ~9% before the pandemic to a peak of 23.24% in May 2020; averaged by phase, it rose 7.5 percentage points from Pre-Covid (9.23%) to Lockdown (16.74%), an increase driven almost entirely by the lockdown.
  • Urban workers were disproportionately affected compared to rural workers, reflecting greater exposure to the formal and services sectors.
  • The discouraged worker effect was significant: as unemployment rose, labour participation fell sharply, understating the true scale of employment disruption.
  • Recovery was rapid once Unlock Phase 1 began in June 2020, with rates returning to near pre-pandemic levels by August 2020.
  • Puducherry saw the most extreme lockdown impact (+37.4 pp), while agricultural states showed relative resilience.
  • Policy interventions targeting urban informal workers and migrant labour would have had the greatest impact during the lockdown period.

How to Run

  1. Clone the repository:

    git clone https://github.com/Tosa9/CodeAlpha_UnemploymentAnalysis.git
    cd CodeAlpha_UnemploymentAnalysis
  2. Install dependencies:

    pip install -r requirements.txt
  3. Launch the notebook:

    jupyter notebook notebooks/unemployment_analysis.ipynb

Dataset Source

Unemployment in India — Kaggle


CodeAlpha Data Science Internship | Task 2
#CodeAlpha #DataScience #UnemploymentAnalysis #Covid19 #Python #EDA

About

Unemployment trend analysis across 28 Indian states (2019–2020) with focus on Covid-19 lockdown impact. National rate peaked at 23.24% in May 2020. Covers rural vs urban dynamics, state-level vulnerability, zone comparisons, and labour participation analysis. CodeAlpha Data Science Internship — Task 2.
