Aadhaar Service Resilience Audit

Team ID: UIDAI_1873
Project Focus: Geospatial Stress Modeling and Infrastructure Continuity

Project Overview

This repository contains an end-to-end data audit of the UIDAI datasets, from cleaning through analysis. The project identifies high-intensity service clusters and geographic blackout zones, where infrastructure fails to meet seasonal demand spikes. Instead of relying on raw transaction counts, the analysis uses a custom Service Stress Index to show where resources should be deployed first.

Data Engineering Feats

The raw UIDAI datasets contained significant noise and fragmentation. My pre-processing pipeline achieved the following:

  • Massive Consolidation: Merged 12 fragmented CSVs into 3 high-fidelity master datasets covering Biometric, Enrollment, and Demographic data.
  • Large-Scale Data Salvage: Repaired over 2.8 million inconsistent date entries using custom normalization logic.
  • Geospatial Correction: Corrected thousands of geographic misclassifications (such as city names in state columns) using regex-based auditing, so that the state column resolves to exactly 36 states and union territories.
  • High Retention: Maintained a 94.8% data retention rate across 4.3 million records, ensuring insights were built on a complete national foundation.
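The date repair and state-column audit described above can be sketched as follows. This is a minimal illustration, not the repository's actual scripts: the list of date formats, the day-first reading of ambiguous dates, and the names `normalize_date`, `canonical_state`, `KNOWN_STATES`, and `CITY_TO_STATE` are all assumptions made for the example, and the lookup tables are deliberately tiny.

```python
import re
from datetime import datetime

# Assumed formats, tried in order; day-first is assumed for ambiguous dates.
# The formats actually present in the raw files are not documented here.
DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y")


def normalize_date(raw):
    """Return a datetime.date parsed from raw, or None if no format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


# Illustrative subsets only; a real audit would list all 36 states and UTs
# and a far larger city-to-state mapping.
KNOWN_STATES = {"Maharashtra", "Karnataka", "Delhi"}
CITY_TO_STATE = {"pune": "Maharashtra", "bengaluru": "Karnataka"}


def canonical_state(raw):
    """Map a raw state-column value to a known state name, or None.

    Strips non-letter noise with regexes, accepts known state names,
    and remaps city names that were entered in the state column.
    """
    cleaned = re.sub(r"[^A-Za-z ]+", " ", raw)       # drop digits and punctuation
    cleaned = re.sub(r"\s+", " ", cleaned).strip()   # collapse whitespace
    title = cleaned.title()
    if title in KNOWN_STATES:
        return title
    return CITY_TO_STATE.get(cleaned.lower())
```

Values that come back as `None` are the rows that cannot be repaired automatically. Counting them against the total gives a retention rate of the kind reported above.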

Key Insights

  1. Service Stress Index: Identified 157 Red Zone Pincodes where daily demand exceeds 150 requests, uncovering a Family Trigger effect where minor updates lead to a 2:1 ratio of adult biometric refreshes.
  2. Infrastructure Blackout Audit: Pinpointed 1,800 high-demand Pincodes suffering from chronic service cessations (lasting two or more months), revealing a systemic failure peak between March and July during the academic rush.
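Both checks above reduce to simple per-Pincode aggregations. The sketch below is a hedged illustration only. It applies the 150-requests-per-day threshold and the two-month gap rule from the text, but it does not reproduce the Service Stress Index itself, whose formula is not given here. It assumes that "daily demand" means mean requests per active day and that a cessation means consecutive calendar months with no records. The function names and the record layout are hypothetical.

```python
from collections import defaultdict

RED_ZONE_THRESHOLD = 150   # requests per day, from the text above
MIN_BLACKOUT_MONTHS = 2    # "2 or more months", assumed to be consecutive


def red_zone_pincodes(records, threshold=RED_ZONE_THRESHOLD):
    """Return sorted pincodes whose mean daily demand exceeds threshold.

    records: iterable of (pincode, iso_date, request_count) tuples.
    Assumption: daily demand is the mean over days with at least one record.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for pincode, day, count in records:
        daily[pincode][day] += count
    return sorted(
        pin for pin, days in daily.items()
        if sum(days.values()) / len(days) > threshold
    )


def longest_gap_months(active_months):
    """Longest run of inactive months between the first and last active month.

    active_months: iterable of (year, month) pairs with at least one record.
    """
    # Convert each (year, month) to a single running month index.
    idx = sorted({year * 12 + (month - 1) for year, month in active_months})
    longest = 0
    for prev, cur in zip(idx, idx[1:]):
        longest = max(longest, cur - prev - 1)
    return longest


def has_blackout(active_months, min_months=MIN_BLACKOUT_MONTHS):
    """True if the pincode went silent for at least min_months in a row."""
    return longest_gap_months(active_months) >= min_months
```

For example, a Pincode active in January, February, and May of the same year has a two-month gap (March and April) and would be flagged by `has_blackout`. Gaps before the first or after the last active month are not counted in this sketch.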

Repository Structure

  • /analysis: Scripts for the Stress Index, Blackout Detection, and Correlation Heatmaps.
  • /data cleaning: Automation scripts for regex-based repair, normalization, and master file consolidation.
  • /derived stuff: Cleaned master CSVs, output analysis logs, and high-resolution visualizations.

Declaration of LLM Usage

In alignment with competition guidelines, large language models (LLMs) were used as a thought partner in this project. The AI assisted with:

  • Code Optimization: Refining regex patterns for high-speed data cleaning.
  • Structural Logic: Brainstorming the mathematical framework for the Service Stress Index.
  • Documentation: Assisting in the clear communication of technical findings.

Note: All data interpretations, statistical validations, and final analytical conclusions were verified and finalized by the human lead.