144 changes: 117 additions & 27 deletions docs/projects/stl_data_api/about.md
@@ -1,46 +1,136 @@
---
id: about
title: STL Data API
title: STL Metro Data API
custom_edit_url: null
---

# STL Metro Data API

## Overview

The STL Data API project aims to create a centralized, user-friendly platform that serves as a proxy for accessing and interacting with public data from various regional and municipal sources, particularly focused on the St. Louis region. The project addresses the challenges of inconsistent data formats, lack of standardization, and repetitive efforts in compiling datasets and reports by providing a RESTful API and a web portal for researchers, journalists, policy-makers, and data enthusiasts.
The **STL Metro Data API** is an open-source platform designed to centralize, standardize, and expose public data from across the St. Louis metropolitan region. Today, regional datasets are scattered across dozens of portals, published in inconsistent formats, and often difficult for residents, researchers, and developers to work with.

This project solves that problem.

We are building a **high-quality, scalable, event-driven data pipeline** and a **RESTful API layer** that makes St. Louis public data accessible, reliable, and easy to integrate into applications, reports, and research.

Imagine a single platform that brings together all regional datasets—transit information, zoning records, public health data, crime statistics, environmental datasets, and more—into one consistent, developer-friendly API.

Our mission is simple:

### **Join us in unlocking the power of open data!**

## What This Project Offers

- A **unified API** to access St. Louis public datasets
- Automatic **data ingestion, normalization, and event sourcing**
- A future **web portal** for browsing, visualizing, and exporting datasets
- Tools for researchers, journalists, policymakers, students, and civic developers
- A repeatable, open-source architecture for other regions to adopt

## New Technical Direction (2025)

The STL Metro Data API has evolved significantly. The project now follows a **CQRS + Event Sourcing microservices architecture**, optimized for scalability, openness, and durability.

### **Core Components**
- **write_service** – Handles ingestion, validation, transformation, and event creation
- **read_service** – Serves aggregated/query-optimized data to the public API
- **Kafka** – Backbone for event streaming and service decoupling
- **PostgreSQL** – Storage for events and read-models
- **Dockerized Microservices** – Self-contained, reproducible development environments
- **RESTful API Gateway** – Provides secure and predictable endpoints
- **Future Web Portal** – A user-friendly interface for dataset browsing & chart creation
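
The split between the write and read sides can be illustrated with a minimal, in-memory sketch. This is not the project's code: `Event`, `EventLog`, `ingest`, and `project` are hypothetical names, a plain list stands in for Kafka and the event store, a dictionary stands in for the read model, and the sample records are made up.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    """An immutable fact recorded by the write side (hypothetical shape)."""
    dataset: str
    record_id: str
    payload: dict[str, Any]


@dataclass
class EventLog:
    """Append-only list standing in for the Kafka topic and event store."""
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def ingest(log: EventLog, dataset: str, record_id: str, payload: dict[str, Any]) -> Event:
    """Write side: validate the input, then record it as an event."""
    if not record_id:
        raise ValueError("record_id is required")
    event = Event(dataset, record_id, dict(payload))
    log.append(event)
    return event


def project(log: EventLog) -> dict[str, dict[str, dict[str, Any]]]:
    """Read side: replay every event into a query-friendly view.

    Later events for the same record overwrite earlier field values.
    """
    view: dict[str, dict[str, dict[str, Any]]] = {}
    for event in log.events:
        record = view.setdefault(event.dataset, {}).setdefault(event.record_id, {})
        record.update(event.payload)
    return view


log = EventLog()
ingest(log, "transit", "stop-1", {"name": "Grand", "lines": 2})  # made-up sample data
ingest(log, "transit", "stop-1", {"lines": 3})                   # a later correction
read_model = project(log)
print(read_model["transit"]["stop-1"])
```

Because the read model is rebuilt purely from the log, it can be dropped and replayed at any time, which is the property that lets the two services evolve independently.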

## Project Information

- **Source Code:** [https://github.com/oss-slu/stl_metro_data_api](https://github.com/oss-slu/stl_metro_data_api) [<img src="/img/git-alt.svg" alt="git" width="25" height="25" />](https://github.com/oss-slu/stl_metro_data_api)

- **Client:** *Dr. Sandoval*, Sociology and Anthropology, Saint Louis University

- **IT Architect:** *Patrick Cuba*, IT Architect, Saint Louis University [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/cubap)

### Information
- **Current Tech Lead:** Prem Kiran Polepalli [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/premkiran2)

- **Source Code:** [https://github.com/oss-slu/stl_metro_dat_api](https://github.com/oss-slu/stl_metro_dat_api) [<img src="/img/git-alt.svg" alt="git" width="25" height="25" />](https://github.com/oss-slu/stl_metro_dat_api)
- **Client:** Dr. Sandoval, Sociology and Anthropology
- **Current Tech Lead:** Prem Kiran Polepalli [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/premkiran2)
- **Developers:**
- **Technologies Using:**
- TBD
- **Developers:**
- Briana Huelsman (capstone) [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/bhuelsman)
- Elizabeth Dreste (capstone) [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/LilLizDog)
- Erin Kelley (capstone) [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/ErinKelley17)
- John Doan (capstone) [<img src="/img/github.svg" alt="github" width="25" height="25" />](https://github.com/johndoans)

- **Type:** Web application
- **Technologies in Use:**
- Python (FastAPI / Flask for API services)
- Kafka
- Docker & Docker Compose
- PostgreSQL
- Rerum Users (Auth0/Okta)
- HTML/CSS/JS (web portal)
- GitHub Actions (CI/CD)

### User Guide
- **Type:**
Web application + API + distributed microservices

The STL Data API simplifies access to St. Louis public data via a RESTful API and web portal. Register at the Rerum Users platform (Auth0/Okta) to get an API key. Browse datasets (e.g., income, health) on the web portal (URL TBD), filter and download as CSV, or use the API. Create and export charts in the “Visualize” tab or share interactive links. Save queries in “My Queries” for reuse. Subscribe to dataset updates via “Subscriptions” for notifications. Coordinators can manage data sources and announcements in the admin interface. Access API documentation at the docs endpoint (TBD). For support, use the portal’s feedback form.

## Technical Information
## User Guide (High-Level)

### Technical Overview
The STL Metro Data API makes it easy to explore St. Louis public data through both an API and a user-friendly web portal.

### Development Priorities
Users will be able to:

- Register through **Rerum Users** to obtain an API key
- Browse available datasets through the web portal
- Filter, search, and export datasets (CSV/JSON)
- Create interactive charts in the “Visualize” view
- Save queries to “My Queries” for future use
- Subscribe to dataset updates (email/webhook)
- Access developer documentation at the `/docs` endpoint (TBD)

Admin users can manage data sources, announcements, and refresh schedules.
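
As a rough illustration of the API workflow above, the sketch below builds (but does not send) an authenticated dataset request. Everything specific here is an assumption, since the endpoints are still TBD: the base URL `https://api.example.org`, the `/datasets/<name>` path, the `format` query parameter, and the `X-API-Key` header are placeholders rather than the project's published interface.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder base URL; the real host has not been announced.
BASE_URL = "https://api.example.org"


def build_dataset_request(dataset: str, api_key: str, fmt: str = "json", **filters: str) -> Request:
    """Build a GET request for one dataset, with filters as query parameters."""
    if fmt not in {"json", "csv"}:
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"format": fmt, **filters})
    url = f"{BASE_URL}/datasets/{quote(dataset)}?{query}"
    # Hypothetical header name for the API key obtained at registration.
    return Request(url, headers={"X-API-Key": api_key})


request = build_dataset_request("transit", "demo-key", fmt="csv", year="2024")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call once a real endpoint exists.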

## Architecture Diagram

![Software Architecture](architecture.png)

## Development Priorities

### **1. Core System**
- Finish implementing **CQRS write_service / read_service** separation
- Build a robust **Kafka-based event ingestion pipeline**
- Enable schema validation + dataset normalization workflows
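
A schema validation and normalization step might look roughly like the following sketch. The `SCHEMA` mapping, its field names, and the `normalize` helper are hypothetical and chosen only for illustration; the project may well use a dedicated validation library instead.

```python
from datetime import date
from typing import Any, Callable

# Hypothetical schema: field name -> converter that raises ValueError on bad input.
SCHEMA: dict[str, Callable[[str], Any]] = {
    "stop_id": str,
    "riders": int,
    "observed_on": date.fromisoformat,
}


def normalize(raw: dict[str, str]) -> dict[str, Any]:
    """Trim and lower-case the keys, then coerce each schema field to its type."""
    cleaned = {key.strip().lower(): value.strip() for key, value in raw.items()}
    missing = sorted(set(SCHEMA) - set(cleaned))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Fields that are not in the schema are dropped.
    return {name: convert(cleaned[name]) for name, convert in SCHEMA.items()}


record = normalize({" Stop_ID ": "stop-1", "Riders": " 120 ", "observed_on": "2025-01-15"})
print(record)
```

Rejecting a record before it becomes an event keeps malformed data out of the log, so the read side never has to compensate for it.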

### **2. Public API**
- Create a RESTful read API with stable URLs and strong documentation
- Add CSV/JSON downloads and query parameters
- Configure rate-limiting and API key authentication
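
The CSV/JSON download item could be served by a small serializer along these lines. This is a sketch using only the standard library; `export_rows` and the sample rows are hypothetical, and sorting the columns alphabetically is just one way to keep the output stable.

```python
import csv
import io
import json
from typing import Any


def export_rows(rows: list[dict[str, Any]], fmt: str = "json") -> str:
    """Serialize query results as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(rows, sort_keys=True)
    if fmt == "csv":
        buffer = io.StringIO()
        # Use the union of all keys so that sparse rows still line up.
        fieldnames = sorted({key for row in rows for key in row})
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"stop_id": "stop-1", "riders": 120}, {"stop_id": "stop-2", "riders": 95}]
print(export_rows(rows, "csv"))
```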

### **3. Web Portal**
- Build lightweight, dependency-minimal pages using HTML/CSS/JS
- Implement dataset browsing and visualization tools via Chart.js

### **4. Infrastructure**
- Store all events in MongoDB (event store)
- Maintain query-optimized read models in PostgreSQL
- Add Docker Compose for local orchestration and onboarding

### **5. Administration**
- Coordinator interface for dataset management
- Dashboard for ingestion monitoring

### **6. Documentation**
- Comprehensive developer onboarding
- API documentation
- Security and threat model documentation
- Backlog hygiene and contributor guidelines

- Implement RESTful API with HTTPS endpoints for dataset access and aggregation.
- Develop static web portal using HTML/CSS/JavaScript for data browsing and visualization.
- Integrate Rerum Users (Auth0/Okta) for secure authentication and API key management.
- Set up RERUM repository for open data storage and cached aggregations.
- Configure MongoDB for private user data (profiles, saved queries).
- Create admin interface for coordinators to manage data sources and announcements.
- Enable dataset update subscriptions with email and webhook notifications.
- Support CSV and JSON data exports via API and web portal.
- Build visualization tools using Chart.js for interactive charts (bar, line, pie).
- Minimize dependencies to ensure durability and low maintenance.
- Provide comprehensive API and web portal documentation.

## Get Involved

If you would like to contribute to this project, please visit our [GitHub page](https://github.com/oss-slu/stl_metro_dat_api) to create your own issues or pull requests.
We welcome contributors of all experience levels!

To participate:

1. Visit the GitHub repository
2. Check our open issues and project board
3. Submit PRs, create new issues, or improve documentation
4. Join us in building the future of open data in St. Louis

**GitHub:** https://github.com/oss-slu/stl_metro_data_api
Binary file added docs/projects/stl_data_api/architecture.png