anudeept501/db2-bq-cdc-poc


DB2 → Pub/Sub → BigQuery CDC POC (GCP + Python)

This repo contains a hands-on, CDC-style (change data capture) pipeline that replicates changes from DB2 running on a Google Compute Engine (GCE) VM to BigQuery, using Python and Google Cloud Pub/Sub. It also includes a batch CDC and reconciliation script.

It simulates a mainframe modernization scenario where DB2 changes are replicated to BigQuery in near real-time.
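As a rough illustration of what a reconciliation pass can look like, the sketch below compares source and target rows by key and by a per-row hash. It is not taken from the repo's script: the function names (`row_fingerprint`, `reconcile`), the column names, and the in-memory row lists are all hypothetical stand-ins for rows fetched from DB2 and BigQuery.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash the selected column values of one row (hypothetical helper)."""
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, columns):
    """Compare two row sets by key and report where they differ."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Assumed sample data: "db2_rows" plays the source, "bq_rows" the target.
db2_rows = [
    {"ID": 1, "NAME": "Ada"},
    {"ID": 2, "NAME": "Grace"},
    {"ID": 3, "NAME": "Linus"},
]
bq_rows = [
    {"ID": 1, "NAME": "Ada"},
    {"ID": 2, "NAME": "Grace H."},
    {"ID": 4, "NAME": "Ken"},
]

report = reconcile(db2_rows, bq_rows, key="ID", columns=["ID", "NAME"])
print(report)
```

On real tables this comparison would more likely be pushed down into SQL (counts and checksums per key range) rather than done in Python memory; the in-memory version only shows the logic.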


🏗 Architecture

Figure: POC architecture diagram

flowchart LR
    DB2[DB2 on GCE VM]
    PUBLISH[CDC Publisher]
    TOPIC[Pub/Sub Topic]
    SUB[Subscription]
    SUBSCRIBER[CDC Subscriber + MERGE]
    STAGE[Staging Table]
    MAIN[Main CDC Table]

    DB2 -->|SELECT ... WHERE UPDATED_TS > last_sync_ts| PUBLISH
    PUBLISH -->|JSON events| TOPIC
    TOPIC --> SUB
    SUB -->|pull messages| SUBSCRIBER
    SUBSCRIBER -->|insert_rows_json| STAGE
    SUBSCRIBER -->|MERGE & TRUNCATE| MAIN
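The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repo's actual code: `sqlite3` stands in for DB2 so the example runs anywhere, the `CUSTOMER` table and its `ID` and `NAME` columns are hypothetical (only `UPDATED_TS` and `last_sync_ts` come from the diagram), and the helpers `fetch_changes`, `to_event`, and `build_merge_sql` are invented names. The Pub/Sub publish and the BigQuery `insert_rows_json` and query calls are left out, so the sketch only produces the JSON payloads and the MERGE statement text.

```python
import json
import sqlite3


def fetch_changes(conn, last_sync_ts):
    """Publisher side: select rows changed after the watermark, oldest first."""
    cur = conn.execute(
        "SELECT ID, NAME, UPDATED_TS FROM CUSTOMER "
        "WHERE UPDATED_TS > ? ORDER BY UPDATED_TS",
        (last_sync_ts,),
    )
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def to_event(row):
    """Serialize one changed row as the bytes payload of a JSON event."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def build_merge_sql(main, stage, key, columns):
    """Subscriber side: build an upsert from the staging table into the main table."""
    non_key = [c for c in columns if c != key]
    set_clause = ", ".join(f"{c} = S.{c}" for c in non_key)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{main}` T USING `{stage}` S ON T.{key} = S.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )


# Stand-in source table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (ID INTEGER PRIMARY KEY, NAME TEXT, UPDATED_TS TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [
        (1, "Ada", "2024-01-01T10:00:00"),
        (2, "Grace", "2024-01-01T11:00:00"),
        (3, "Linus", "2024-01-01T12:00:00"),
    ],
)

# One polling cycle: read changes, build events, then advance the watermark.
last_sync_ts = "2024-01-01T10:30:00"
changes = fetch_changes(conn, last_sync_ts)
events = [to_event(r) for r in changes]
if changes:
    last_sync_ts = changes[-1]["UPDATED_TS"]

# Table names here are placeholders, not the repo's dataset or tables.
merge_sql = build_merge_sql(
    "my_dataset.main_cdc", "my_dataset.staging", "ID", ["ID", "NAME", "UPDATED_TS"]
)
print(len(events), "events; new watermark:", last_sync_ts)
print(merge_sql)
```

Two caveats about this kind of design. A timestamp watermark only sees rows whose `UPDATED_TS` changes, so hard deletes in the source are not captured. BigQuery's MERGE also fails when several staging rows match the same target row, so a real subscriber would probably need to deduplicate the staging data by key before merging.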
