Current development priorities and planned work.
- Modular folder structure with core modules
- Data cleaning utilities (column names, dtypes, missing values)
- Input validation with Pydantic
- I/O helpers (chunked CSV, Stata/SPSS import, Excel export)
- Multicollinearity checks (VIF)
- Visualisation (annotated bar charts, district-level maps)
- Public health access index
- Two working Jupyter notebooks (gender summary, WEE time-use)
- Test suite (10 pytest files)
- Sample datasets
- EDA module: correlation matrices, group summaries, data profiling
- Modelling: logistic regression evaluation, OLS summary tables
- Social sector: climate risk flags, education outcomes, gender disaggregation
- More Jupyter notebooks with worked examples
- NLP utilities for text-based survey analysis
- Public API integration (NFHS, World Bank)
- Visualisation: categorical distributions, regression diagnostics, time series
- GitHub Pages documentation site
- Command-line interface for common tasks
- Expanded sample datasets (education, climate, WEE)
- Automated data quality reports
For suggestions, open an issue or start a discussion.