Tulsa For You and Me Project

📖 Table of Contents

📜 Disclaimer
🗂️ Business Context
❗ Business Problem
🗺️ Project Overview
🏆 Business Outcome
🏗️ Repository Structure
🎥 Video Presentation

📜 Disclaimer

This project is a lightweight, local proof of concept that demonstrates a functional data warehouse schema for my Tulsa For You and Me project. To speed up development and avoid cloud infrastructure overhead, the pipeline uses Python pandas for fast, in-memory transformations and loads directly into a local PostgreSQL warehouse. Data is validated with Great Expectations before loading, paired with custom, in-house referential integrity assertions at runtime.
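As a rough illustration of what a custom referential integrity assertion can look like in pandas, here is a minimal sketch. It is not the project's actual validation code: the function names, the table names (`dim_occupation`, `fact_job_zone`), the column name `onet_soc_code`, and the sample rows are all assumptions made up for this example.

```python
import pandas as pd


def find_orphan_keys(
    fact: pd.DataFrame, dim: pd.DataFrame, fact_key: str, dim_key: str
) -> pd.Series:
    """Return the distinct foreign-key values in `fact` with no match in `dim`."""
    keys = fact[fact_key].dropna()
    orphans = keys[~keys.isin(dim[dim_key])]
    return orphans.drop_duplicates().reset_index(drop=True)


def assert_referential_integrity(
    fact: pd.DataFrame, dim: pd.DataFrame, fact_key: str, dim_key: str
) -> None:
    """Raise ValueError if any foreign key in `fact` is missing from `dim`."""
    orphans = find_orphan_keys(fact, dim, fact_key, dim_key)
    if not orphans.empty:
        raise ValueError(
            f"{len(orphans)} orphaned value(s) in '{fact_key}': {orphans.tolist()}"
        )


# Hypothetical sample data: the last fact row points at a key the dimension lacks.
dim_occupation = pd.DataFrame({"onet_soc_code": ["11-1011.00", "15-1252.00"]})
fact_job_zone = pd.DataFrame(
    {
        "onet_soc_code": ["11-1011.00", "15-1252.00", "99-9999.00"],
        "job_zone": [5, 4, 1],
    }
)

orphans = find_orphan_keys(
    fact_job_zone, dim_occupation, "onet_soc_code", "onet_soc_code"
)
```

Calling `assert_referential_integrity` on the same two frames would raise a `ValueError` naming the orphaned key, which is one way to stop a load before bad rows reach the warehouse.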

📖 Table Of Contents

🗂️ Business Context

You’ve just joined the tech team at Tulsa For You and Me. The team is working to standardize job and wage data to support multiple workforce programs across Tulsa. Your first assignment is to prototype a simple data warehouse that can power future dashboards and analysis on Tulsa’s workforce and labor market trends.

📖 Table Of Contents

❗ Business Problem

The Tulsa For You and Me initiative is tasked with driving economic mobility and expanding workforce programs across the city of Tulsa. However, the organization faces a critical operational bottleneck. Its foundational labor market, occupation, and demographic data are heavily fragmented, unstandardized, and siloed across various public and municipal source files.

Because no data infrastructure currently exists, the technical team cannot support regional dashboards, track wage trends, or identify skill gaps. To solve this, a Data Warehouse Engineer must design and execute a complete, end-to-end data ecosystem from scratch, moving from identifying sources to final analytics.

📖 Table Of Contents

🗺️ Project Overview

This project is an end-to-end analytics solution, built from scratch, that centralizes highly fragmented public datasets. The pipeline standardizes disparate source formats into a clean, read-optimized warehouse ready for downstream enterprise reporting. Data moves through a structured, six-stage pipeline:

Extractions
Extraction Validations
Transformations
Transformation Validations
Warehouse Loading
Loading Validations
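The six stages above can be sketched as a chain of small functions, each validation running right after the step it checks. This is only an illustrative outline under stated assumptions, not the contents of `main.py`: every function name, the column names, the sample rows, and the in-memory dict standing in for the PostgreSQL warehouse are hypothetical.

```python
import pandas as pd


def extract() -> pd.DataFrame:
    # Stand-in for reading a source file; these rows are made up.
    return pd.DataFrame(
        {"Code": [" 11-1011.00", "15-1252.00 "], "Job Zone": ["5", "4"]}
    )


def validate_extraction(raw: pd.DataFrame) -> None:
    # Fail fast if the source is empty or expected columns are missing.
    missing = {"Code", "Job Zone"} - set(raw.columns)
    if missing or raw.empty:
        raise ValueError(f"Bad extraction: missing={missing}, rows={len(raw)}")


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Standardize column names, trim whitespace, and cast types.
    clean = raw.rename(columns={"Code": "onet_soc_code", "Job Zone": "job_zone"})
    clean["onet_soc_code"] = clean["onet_soc_code"].str.strip()
    clean["job_zone"] = clean["job_zone"].astype(int)
    return clean


def validate_transformation(clean: pd.DataFrame) -> None:
    # Keys must be unique and job zones must fall in the 1-5 range.
    if clean["onet_soc_code"].duplicated().any():
        raise ValueError("Duplicate occupation codes after transformation")
    if not clean["job_zone"].between(1, 5).all():
        raise ValueError("Job zone outside the 1-5 range")


def load(clean: pd.DataFrame, warehouse: dict) -> None:
    # A dict stands in for the warehouse here; a real loader would write to PostgreSQL.
    warehouse["dim_job_zone"] = clean.copy()


def validate_load(expected_rows: int, warehouse: dict) -> None:
    # Compare the row count that was sent with the row count that landed.
    actual_rows = len(warehouse["dim_job_zone"])
    if actual_rows != expected_rows:
        raise ValueError(f"Expected {expected_rows} rows, found {actual_rows}")


def run_pipeline() -> dict:
    warehouse: dict = {}
    raw = extract()
    validate_extraction(raw)
    clean = transform(raw)
    validate_transformation(clean)
    load(clean, warehouse)
    validate_load(len(clean), warehouse)
    return warehouse


warehouse = run_pipeline()
```

Pairing each stage with its own validation means a failure is reported at the step that caused it rather than surfacing later as a bad query result.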

📖 Table Of Contents

🏆 Business Outcome

By centralizing the data and resolving the fragmentation between the O*NET, GeoCorr, and Census datasets, this proof of concept shows how Tulsa For You and Me could move from an operational standstill to a data-driven workforce organization.

📖 Table Of Contents

🏗️ Repository Structure

.
├── artifacts <----------- Any outputs produced during runtime get saved here
│   ├── census.json
│   ├── clean_census.json
│   ├── clean_geocorr.csv
│   ├── clean_job_zones.xlsx
│   ├── clean_occupation_data.xlsx
│   ├── geocorr.csv
│   ├── job_zones.xlsx
│   └── occupation_data.xlsx
│
├── assets <----------- Stores external, non-code dependencies
│   └── readme
│       └── images
│           └── rainbow_bar.png
│
├── code <----------- Contains all of the pipeline's source code
│   ├── libs
│   │   ├── extractions <----------- Source code related to extractions
│   │   │   ├── census.py
│   │   │   ├── geocorr.py
│   │   │   └── onet.py
│   │   │
│   │   ├── loaders <----------- Source code related to loading
│   │   │   └── postgres.py
│   │   │
│   │   ├── transformations <----------- Source code related to transformations
│   │   │   ├── census.py
│   │   │   ├── geocorr.py
│   │   │   └── onet.py
│   │   │
│   │   ├── utilities <----------- Source code related to helper and common misc. funcs
│   │   │   ├── configs.py
│   │   │   ├── env.py
│   │   │   ├── extractions.py
│   │   │   ├── transformations.py
│   │   │   ├── file_system.py
│   │   │   ├── __init__.py
│   │   │   └── postgres_helper.py
│   │   │
│   │   └── validations <----------- Source code related to data validations
│   │       ├── census.py
│   │       ├── database.py
│   │       ├── geocorr.py
│   │       └── onet.py
│   │
│   ├── main.py <----------- *** The pipeline's entry point ***
│   │
│   └── setup <----------- Source code related to Great Expectations/our data validations
│       ├── expectations.py
│       └── gx_setup.py
│
├── configs <----------- Configurations directory
│   ├── general.toml
│   ├── gx
│   ├── gx.toml
│   └── .env <----------- A `.env` FILE MUST BE CREATED LOCALLY HERE TO STORE CREDENTIALS AS INSTRUCTED IN `./docs/3 - Extractions.md` FOR PROPER RUNTIME
│
├── docs <----------- Documentation covering the pipeline/program/source code
│   ├── 1 - Architecture.md
│   ├── 2 - Sources.md
│   ├── 3 - Extractions.md
│   ├── 4 - Extraction Validations.md
│   ├── 5 - Transformations.md
│   ├── 6 - Transformation Validations.md
│   ├── 7 - Warehouse Schema.md
│   ├── 8 - Warehouse Loading.md
│   ├── 9 - Loading Validations.md
│   └── warehouse_star_schema_erd.html
│
├── management <----------- Contains project management and lifecycle materials
│   └── DWE Candidate Technical Activity.pdf
│
├── sql_queries <----------- 3 SQL queries and their results
│   ├── query_1.sql
│   ├── query_1_results.png
│   ├── query_2.sql
│   ├── query_2_results.png
│   ├── query_3.sql
│   └── query_3_results.png
│
├── tests <----------- Active test suites were bypassed for this proof-of-concept but included to maintain my standard project directory layout.
│   └── .gitkeep
│
├── pyproject.toml
├── README.md
└── uv.lock

📖 Table Of Contents

🎥 Video Presentation

Click here to watch my presentation

📖 Table Of Contents
