Commit 98037e1

Build out RootStack with schemas, seed data, and queries
Three data domains implemented:
- Development indicators: 20 indicators (NFHS-5, SRS, PLFS, CPCB) with 51 state/district-level data points across health, education, gender, climate
- Government schemes: 16 major central schemes with budget data (2023-24)
- Tools catalog: 32 tools registered across InsightStack, FieldStack, EquityStack

Includes setup script, example queries, and data dictionary.
1 parent 8d6bc42 commit 98037e1

18 files changed

Lines changed: 949 additions & 32 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
*.db
2+
*.sqlite
3+
*.sqlite3
4+
.DS_Store

README.md

Lines changed: 111 additions & 32 deletions
@@ -1,67 +1,146 @@
11
# RootStack
22

3-
**Foundational data schemas for the OpenStacks ecosystem.**
3+
**Foundational data schemas, seed data, and queries for the OpenStacks ecosystem.**
44

55
[![Part of OpenStacks](https://img.shields.io/badge/Part%20of-OpenStacks-blue)](https://openstacks.dev)
66
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
7-
[![Status: Early Stage](https://img.shields.io/badge/Status-Early%20Stage-orange)]()
87

9-
> The database layer for OpenStacks — schemas, seed data, and queries.
8+
> The database layer for OpenStacks — structured development data for India and South Asia.
109
1110
---
1211

13-
## Status
12+
## What This Is
1413

15-
**This repository is in early development.** The architecture and goals are documented below, but the actual SQL schemas, seed data, and queries have not yet been implemented. Contributions are welcome to help build this out.
14+
RootStack provides a **SQLite database** with schemas, seed data, and ready-to-use queries covering three domains:
1615

17-
## Vision
16+
1. **Development Indicators** — Health, education, gender, and climate data at state and district level (NFHS-5, SRS, PLFS, CPCB)
17+
2. **Government Schemes** — Central and state schemes with budget allocation, spending, and coverage data
18+
3. **Tools Catalog** — Registry of all scripts, notebooks, and templates across the OpenStacks ecosystem
1819

19-
RootStack will provide the foundational data layer for the OpenStacks ecosystem:
20+
## Quick Start
2021

21-
- **Structured schemas** (PostgreSQL/SQLite) for development sector data
22-
- **Seed datasets** for testing and demonstration
23-
- **Example queries** for common analysis patterns
24-
- **Migration scripts** for schema evolution
22+
```bash
23+
git clone https://github.com/Varnasr/RootStack.git
24+
cd RootStack
25+
bash scripts/setup.sh
26+
```
2527

26-
### Planned Architecture
28+
This creates `rootstack.db` with all tables seeded. Then run queries:
2729

30+
```bash
31+
# Health indicators across states
32+
sqlite3 -header -column rootstack.db < queries/01_health_dashboard.sql
33+
34+
# Government scheme budgets and spending
35+
sqlite3 -header -column rootstack.db < queries/02_policy_analysis.sql
36+
37+
# Cross-cutting analysis (climate-health, tribal districts, tools search)
38+
sqlite3 -header -column rootstack.db < queries/03_cross_cutting.sql
2839
```
29-
RootStack (Database) → BridgeStack (API) → ViewStack (Frontend)
30-
```
3140

32-
RootStack feeds data to [BridgeStack](https://github.com/Varnasr/BridgeStack) via SQL, which exposes it through a REST API to [ViewStack](https://github.com/Varnasr/ViewStack).
41+
### Prerequisites
42+
43+
- **SQLite 3** (pre-installed on most systems; otherwise `brew install sqlite` on macOS or `sudo apt install sqlite3` on Debian/Ubuntu)
3344

34-
### Planned Structure
45+
## What's Inside
3546

3647
```
3748
RootStack/
38-
├── schemas/
39-
│ ├── tables.sql # Core table definitions
40-
│ └── migrations/ # Schema version changes
41-
├── seed_data/
42-
│ └── seed_initial.sql # Test/demo data
43-
├── queries/
44-
│ └── example_queries.sql # Common query patterns
49+
├── schemas/ # Table definitions (run in order)
50+
│ ├── 001_geography.sql # States and districts of India
51+
│ ├── 002_sectors.sql # Development sector taxonomy
52+
│ ├── 003_indicators.sql # Indicator definitions and values
53+
│ ├── 004_policies.sql # Government schemes and budgets
54+
│ └── 005_tools_catalog.sql # OpenStacks tools registry
55+
56+
├── seed_data/ # Initial data (run in order)
57+
│ ├── 001_sectors.sql # 19 sectors and sub-sectors
58+
│ ├── 002_states.sql # 30 states and UTs
59+
│ ├── 003_districts_sample.sql# 26 sample districts across tiers
60+
│ ├── 004_indicators.sql # 20 indicators with 51 data points
61+
│ ├── 005_schemes.sql # 16 major schemes with budgets
62+
│ └── 006_tools_catalog.sql # 32 tools across 4 stacks
63+
64+
├── queries/ # Ready-to-use analysis queries
65+
│ ├── 01_health_dashboard.sql # State scorecards, district rankings, gender gaps
66+
│ ├── 02_policy_analysis.sql # Budget overview, utilization, sector spending
67+
│ └── 03_cross_cutting.sql # Multi-sector profiles, climate-health nexus
68+
69+
├── scripts/
70+
│ └── setup.sh # One-command database creation
71+
4572
└── docs/
46-
└── data_dictionary.md # Field descriptions and relationships
73+
└── data_dictionary.md # Table descriptions, field conventions, sources
4774
```
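The "run in order" notes on `schemas/` and `seed_data/` can be illustrated with a short Python sketch. The contents of `scripts/setup.sh` are not shown in this commit view, so this is an assumed equivalent rather than the repository's actual script: `build_database` is a hypothetical helper, and the two stand-in SQL files below are made up so the example runs on its own.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_database(db_path, repo_root, folders=("schemas", "seed_data")):
    """Apply every .sql file in each folder, in filename order (hypothetical helper)."""
    applied = []
    conn = sqlite3.connect(db_path)
    try:
        for folder in folders:
            # Numeric prefixes (001_, 002_, ...) make a plain sort the run order.
            for sql_file in sorted(Path(repo_root, folder).glob("*.sql")):
                conn.executescript(sql_file.read_text(encoding="utf-8"))
                applied.append(f"{folder}/{sql_file.name}")
        conn.commit()
    finally:
        conn.close()
    return applied


# Tiny stand-in repo; the file contents are illustrative, not the real schema.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "schemas").mkdir()
    (root / "seed_data").mkdir()
    (root / "schemas" / "001_geography.sql").write_text(
        "CREATE TABLE states (state_id TEXT PRIMARY KEY, state_name TEXT);")
    (root / "seed_data" / "002_states.sql").write_text(
        "INSERT INTO states VALUES ('KL', 'Kerala');")

    applied = build_database(root / "demo.db", root)

    conn = sqlite3.connect(root / "demo.db")
    state_count = conn.execute("SELECT COUNT(*) FROM states").fetchone()[0]
    conn.close()

print(applied)
```

Schemas are applied before seed data because the seed `INSERT` statements need their tables to exist first.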
4875

49-
## How to Contribute
76+
## Database Schema
77+
78+
```
79+
states ──< districts
80+
sectors ──< indicators ──< indicator_values >── districts
81+
sectors ──< schemes ──< scheme_budgets
82+
──< scheme_coverage >── states/districts
83+
stacks ──< tools >── sectors
84+
```
85+
86+
### Current Data
87+
88+
| Table | Records | Source |
89+
|-------|---------|--------|
90+
| States | 30 | Census 2011 |
91+
| Districts | 26 (sample) | Census 2011 |
92+
| Sectors | 19 | OpenStacks taxonomy |
93+
| Indicators | 20 definitions | NFHS-5, SRS, PLFS, CPCB |
94+
| Indicator Values | 51 data points | Various (see data dictionary) |
95+
| Schemes | 16 major schemes | Union Budget, Ministry reports |
96+
| Budgets | 13 entries | Union Budget 2023-24 |
97+
| Stacks | 4 registered | OpenStacks ecosystem |
98+
| Tools | 32 cataloged | InsightStack, FieldStack, EquityStack |
5099

51-
This is a great repo to contribute to if you have experience with:
52-
- PostgreSQL or SQLite schema design
53-
- Development sector data structures (surveys, indicators, program data)
54-
- Database migration workflows
100+
## Example Output
55101

56-
See the [OpenStacks hub](https://github.com/Varnasr/OpenStacks-for-Change) for ecosystem-wide contribution guidelines.
102+
**State health scorecard:**
103+
```
104+
State IMR Stunting Anaemia
105+
Uttar Pradesh 40.0 39.7%
106+
Bihar 38.0 42.9% 63.5%
107+
Madhya Pradesh 36.0 35.7%
108+
Kerala 6.0 23.4% 36.3%
109+
```
110+
111+
**Scheme budget utilization:**
112+
```
113+
Scheme Allocated (Cr) Spent (Cr) Utilization
114+
Jal Jeevan Mission 70,000 62,000 88.6%
115+
MGNREGA 60,000 82,000 136.7%
116+
National Health Mission 36,785 33,200 90.3%
117+
NCAP (Clean Air) 460 350 76.1%
118+
```
57119

58120
## How It Connects
59121

122+
RootStack is the data foundation of the [OpenStacks](https://openstacks.dev) ecosystem:
123+
60124
| Stack | Role | Link |
61125
|-------|------|------|
62126
| **RootStack** (this repo) | Database schemas & seed data | You are here |
63-
| [BridgeStack](https://github.com/Varnasr/BridgeStack) | API backend (FastAPI) | Consumes RootStack data |
64-
| [ViewStack](https://github.com/Varnasr/ViewStack) | Frontend UI | Displays BridgeStack API data |
127+
| [BridgeStack](https://github.com/Varnasr/BridgeStack) | API backend (FastAPI) | Will consume RootStack data |
128+
| [ViewStack](https://github.com/Varnasr/ViewStack) | Frontend UI | Will display via BridgeStack |
129+
| [InsightStack](https://github.com/Varnasr/InsightStack) | MEL tools (Stata/Python/R) | Tools cataloged here |
130+
| [FieldStack](https://github.com/Varnasr/FieldStack) | R notebooks for fieldwork | Tools cataloged here |
131+
| [EquityStack](https://github.com/Varnasr/EquityStack) | Python data workflows | Tools cataloged here |
132+
133+
## Contributing
134+
135+
Contributions welcome — especially:
136+
137+
- **More district data** — Expand from 26 to all 766 districts
138+
- **More indicators** — Add UDISE+ education data, PLFS employment data
139+
- **More schemes** — State-level schemes, historical budget data
140+
- **PostgreSQL port** — Adapt schemas for PostgreSQL deployment
141+
- **Migration scripts** — Schema versioning for production use
142+
143+
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
65144

66145
## License
67146

docs/data_dictionary.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
# RootStack Data Dictionary
2+
3+
## Tables Overview
4+
5+
| Table | Purpose | Key Fields |
6+
|-------|---------|------------|
7+
| `states` | Indian states and UTs | state_id, state_name, region, census_2011_pop |
8+
| `districts` | Districts within states | district_id, district_name, state_id, tier |
9+
| `sectors` | Development sector taxonomy | sector_id, sector_name, parent_id |
10+
| `indicators` | Indicator definitions | indicator_id, indicator_name, unit, direction |
11+
| `indicator_values` | Actual indicator data points | indicator_id, state_id/district_id, year, value |
12+
| `schemes` | Government schemes and policies | scheme_id, scheme_name, ministry, sector_id |
13+
| `scheme_budgets` | Budget allocation and spending | scheme_id, financial_year, allocated, spent |
14+
| `scheme_coverage` | Scheme beneficiary coverage | scheme_id, state_id, beneficiaries, target |
15+
| `stacks` | OpenStacks repository registry | stack_id, stack_name, repo_url |
16+
| `tools` | Tools catalog across stacks | tool_id, stack_id, directory, language, tool_type |
17+
18+
## Key Relationships
19+
20+
```
21+
states ──< districts
22+
sectors ──< indicators ──< indicator_values >── districts
23+
sectors ──< schemes ──< scheme_budgets
24+
──< scheme_coverage >── states/districts
25+
stacks ──< tools >── sectors
26+
```
27+
28+
## Data Sources
29+
30+
| Source | Coverage | Indicators |
31+
|--------|----------|------------|
32+
| NFHS-5 (2019-21) | State + District | Stunting, anaemia, immunization, ANC, delivery |
33+
| SRS 2020 | State | IMR, MMR, sex ratio |
34+
| PLFS 2022-23 | State | Female LFPR |
35+
| UDISE+ | State + District | GER, dropout, PTR |
36+
| Census 2011 | State + District | Population, literacy, area |
37+
| IQAir/CPCB 2023 | State + City | PM2.5 annual mean |
38+
| NDMA Risk Atlas | District | Flood risk, drought risk |
39+
| Union Budget | National | Scheme budget allocation/spending |
40+
41+
## Field Conventions
42+
43+
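- **IDs**: short text keys. District IDs combine an uppercase state code with a hyphenated lowercase name (e.g., `JH-khunti`); sector IDs are lowercase with underscores (e.g., `health_mch`)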
44+
- **Dates**: ISO 8601 strings (SQLite convention)
45+
- **Money**: Crores INR for budgets
46+
- **Rates**: Per 1,000 (e.g., IMR) or per 100,000 (e.g., MMR), as indicated by the indicator's `unit`
47+
- **Percentages**: Stored as 0-100 (not 0-1)
48+
- **Indices**: Stored as 0-1 range

queries/01_health_dashboard.sql

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
-- ============================================================
2+
-- Health Dashboard Queries
3+
-- Common analysis patterns for health indicators
4+
-- ============================================================
5+
6+
-- 1. State-level health scorecard
7+
-- Compare key health indicators across states
8+
SELECT
9+
s.state_name,
10+
MAX(CASE WHEN iv.indicator_id = 'imr' THEN iv.value END) AS infant_mortality_rate,
11+
MAX(CASE WHEN iv.indicator_id = 'stunting' THEN iv.value END) AS stunting_pct,
12+
MAX(CASE WHEN iv.indicator_id = 'anaemia_women' THEN iv.value END) AS anaemia_women_pct,
13+
MAX(CASE WHEN iv.indicator_id = 'immunization' THEN iv.value END) AS immunization_pct
14+
FROM indicator_values iv
15+
JOIN states s ON iv.state_id = s.state_id
16+
WHERE iv.indicator_id IN ('imr', 'stunting', 'anaemia_women', 'immunization')
17+
GROUP BY s.state_name
18+
ORDER BY infant_mortality_rate DESC;
19+
20+
21+
-- 2. Worst-performing districts for stunting
22+
-- Identify districts needing nutrition interventions
23+
SELECT
24+
d.district_name,
25+
s.state_name,
26+
d.tier,
27+
iv.value AS stunting_pct,
28+
iv.source_detail
29+
FROM indicator_values iv
30+
JOIN districts d ON iv.district_id = d.district_id
31+
JOIN states s ON d.state_id = s.state_id
32+
WHERE iv.indicator_id = 'stunting'
33+
ORDER BY iv.value DESC
34+
LIMIT 10;
35+
36+
37+
-- 3. IMR comparison by burden category: high-, medium-, and low-burden states
38+
SELECT
39+
s.state_name,
40+
s.region,
41+
iv.year,
42+
iv.value AS imr,
43+
CASE
44+
WHEN iv.value > 30 THEN 'High burden'
45+
WHEN iv.value > 15 THEN 'Medium burden'
46+
ELSE 'Low burden'
47+
END AS burden_category
48+
FROM indicator_values iv
49+
JOIN states s ON iv.state_id = s.state_id
50+
WHERE iv.indicator_id = 'imr'
51+
ORDER BY iv.value DESC;
52+
53+
54+
-- 4. Gender gap in health outcomes
55+
-- States where male-female differences are largest
56+
SELECT
57+
s.state_name,
58+
iv.indicator_id,
59+
i.indicator_name,
60+
iv.value AS overall,
61+
iv.value_male,
62+
iv.value_female,
63+
ABS(iv.value_male - iv.value_female) AS gender_gap -- both values are non-NULL per the WHERE clause
64+
FROM indicator_values iv
65+
JOIN states s ON iv.state_id = s.state_id
66+
JOIN indicators i ON iv.indicator_id = i.indicator_id
67+
WHERE iv.value_male IS NOT NULL AND iv.value_female IS NOT NULL
68+
ORDER BY gender_gap DESC;
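
The scorecard in query 1 relies on conditional aggregation: `MAX(CASE WHEN ...)` turns one row per indicator into one column per indicator. The sketch below reproduces that pattern against a stripped-down, hypothetical version of the tables (only the columns the query touches). The values are the Bihar and Kerala rows from the README's example output; the state IDs `BR` and `KL`, and the mapping of the README's "Anaemia" column to `anaemia_women`, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state_id TEXT PRIMARY KEY, state_name TEXT NOT NULL);
CREATE TABLE indicator_values (
    indicator_id TEXT NOT NULL,
    state_id     TEXT REFERENCES states(state_id),
    value        REAL
);
""")
conn.executemany("INSERT INTO states VALUES (?, ?)",
                 [("BR", "Bihar"), ("KL", "Kerala")])
# Long format: one row per (indicator, state).
conn.executemany("INSERT INTO indicator_values VALUES (?, ?, ?)", [
    ("imr", "BR", 38.0), ("stunting", "BR", 42.9), ("anaemia_women", "BR", 63.5),
    ("imr", "KL", 6.0), ("stunting", "KL", 23.4), ("anaemia_women", "KL", 36.3),
])

# Wide format: each CASE keeps one indicator's value and yields NULL for the
# others; MAX ignores NULLs, so it returns the single matching value per state.
scorecard = conn.execute("""
SELECT s.state_name,
       MAX(CASE WHEN iv.indicator_id = 'imr' THEN iv.value END) AS imr,
       MAX(CASE WHEN iv.indicator_id = 'stunting' THEN iv.value END) AS stunting_pct,
       MAX(CASE WHEN iv.indicator_id = 'anaemia_women' THEN iv.value END) AS anaemia_pct
FROM indicator_values iv
JOIN states s ON iv.state_id = s.state_id
GROUP BY s.state_name
ORDER BY imr DESC
""").fetchall()

for row in scorecard:
    print(row)
```

A state with no row for an indicator gets NULL in that column, which is why some cells in the README's example scorecard are blank.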

queries/02_policy_analysis.sql

Lines changed: 77 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,77 @@
1+
-- ============================================================
2+
-- Policy & Scheme Analysis Queries
3+
-- Budget tracking, coverage, and effectiveness
4+
-- ============================================================
5+
6+
-- 1. Scheme budget overview
7+
-- All schemes with their latest budget allocation and spending
8+
SELECT
9+
sc.scheme_name,
10+
sc.ministry,
11+
sec.sector_name,
12+
sb.financial_year,
13+
sb.budget_allocated AS allocated_cr,
14+
sb.budget_spent AS spent_cr,
15+
ROUND(sb.budget_spent * 100.0 / NULLIF(sb.budget_allocated, 0), 1) AS utilization_pct
16+
FROM schemes sc
17+
JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
18+
LEFT JOIN sectors sec ON sc.sector_id = sec.sector_id
19+
WHERE sb.financial_year = '2023-24'
20+
ORDER BY sb.budget_allocated DESC;
21+
22+
23+
-- 2. Top schemes by budget size
24+
SELECT
25+
sc.scheme_name,
26+
sc.level,
27+
SUM(sb.budget_allocated) AS total_allocated_cr,
28+
SUM(sb.budget_spent) AS total_spent_cr,
29+
ROUND(SUM(sb.budget_spent) * 100.0 / NULLIF(SUM(sb.budget_allocated), 0), 1) AS overall_utilization_pct -- ratio of totals, not a mean of yearly rates
30+
FROM schemes sc
31+
JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
32+
GROUP BY sc.scheme_id, sc.scheme_name, sc.level
33+
ORDER BY total_allocated_cr DESC;
34+
35+
36+
-- 3. Sector-wise government spending
37+
SELECT
38+
sec.sector_name,
39+
COUNT(DISTINCT sc.scheme_id) AS num_schemes,
40+
SUM(sb.budget_allocated) AS total_allocated_cr,
41+
SUM(sb.budget_spent) AS total_spent_cr
42+
FROM sectors sec
43+
JOIN schemes sc ON sec.sector_id = sc.sector_id
44+
JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
45+
WHERE sb.financial_year = '2023-24'
46+
GROUP BY sec.sector_name
47+
ORDER BY total_allocated_cr DESC;
48+
49+
50+
-- 4. Under-utilised schemes (budget allocated but poorly spent)
51+
SELECT
52+
sc.scheme_name,
53+
sb.financial_year,
54+
sb.budget_allocated AS allocated_cr,
55+
sb.budget_spent AS spent_cr,
56+
ROUND(sb.budget_spent * 100.0 / NULLIF(sb.budget_allocated, 0), 1) AS utilization_pct
57+
FROM schemes sc
58+
JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
59+
WHERE sb.budget_spent IS NOT NULL
60+
AND sb.budget_allocated > 0
61+
AND (sb.budget_spent * 100.0 / sb.budget_allocated) < 80
62+
ORDER BY utilization_pct ASC;
63+
64+
65+
-- 5. Schemes targeting women and gender equity
66+
SELECT
67+
sc.scheme_name,
68+
sc.description,
69+
sc.beneficiary_type,
70+
sb.budget_allocated AS latest_budget_cr
71+
FROM schemes sc
72+
LEFT JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
73+
AND sb.financial_year = '2023-24'
74+
WHERE sc.sector_id LIKE 'gender%'
75+
OR sc.beneficiary_type LIKE '%Women%'
76+
OR sc.beneficiary_type LIKE '%Girl%'
77+
ORDER BY sb.budget_allocated DESC;
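
The utilization arithmetic used in queries 1, 2, and 4 (`spent * 100.0 / NULLIF(allocated, 0)`, rounded to one decimal) can be checked with a small sketch. The tables here are hypothetical and hold only the columns involved; the scheme IDs are invented, and the amounts are the four rows from the README's example output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schemes (scheme_id TEXT PRIMARY KEY, scheme_name TEXT NOT NULL);
CREATE TABLE scheme_budgets (
    scheme_id        TEXT REFERENCES schemes(scheme_id),
    financial_year   TEXT,
    budget_allocated REAL,   -- crores INR
    budget_spent     REAL    -- crores INR
);
""")
rows = [
    ("jjm", "Jal Jeevan Mission", 70000, 62000),
    ("mgnrega", "MGNREGA", 60000, 82000),
    ("nhm", "National Health Mission", 36785, 33200),
    ("ncap", "NCAP (Clean Air)", 460, 350),
]
for scheme_id, name, allocated, spent in rows:
    conn.execute("INSERT INTO schemes VALUES (?, ?)", (scheme_id, name))
    conn.execute("INSERT INTO scheme_budgets VALUES (?, '2023-24', ?, ?)",
                 (scheme_id, allocated, spent))

# Multiplying by 100.0 forces floating-point division; NULLIF turns a zero
# allocation into NULL so the ratio is NULL instead of a division by zero.
utilization = conn.execute("""
SELECT sc.scheme_name,
       ROUND(sb.budget_spent * 100.0 / NULLIF(sb.budget_allocated, 0), 1) AS utilization_pct
FROM schemes sc
JOIN scheme_budgets sb ON sc.scheme_id = sb.scheme_id
ORDER BY utilization_pct ASC
""").fetchall()

# Same threshold as query 4: anything under 80% counts as under-utilised.
under_utilised = [name for name, pct in utilization if pct < 80]

print(utilization)
print(under_utilised)
```

SQLite already returns NULL for division by zero, so the `NULLIF` guard matters mostly if the schema is ported to PostgreSQL, where the same division raises an error. A figure above 100%, as with MGNREGA here, means spending exceeded the original allocation.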
