This repo contains a hands-on, CDC-style (change data capture) pipeline from DB2 on Google Compute Engine (GCE) to BigQuery, built with Python and Google Cloud Pub/Sub, plus a batch CDC and reconciliation script.
It simulates a mainframe modernization scenario where DB2 changes are replicated to BigQuery in near real-time.
flowchart LR
DB2[DB2 on GCE VM]
PUBLISH[CDC Publisher]
TOPIC[Pub/Sub Topic]
SUB[Subscription]
SUBSCRIBER[CDC Subscriber + MERGE]
STAGE[Staging Table]
MAIN[Main CDC Table]
DB2 --> PUBLISH
PUBLISH --> TOPIC
TOPIC --> SUB
SUB --> SUBSCRIBER
SUBSCRIBER --> STAGE
SUBSCRIBER --> MAIN
DB2 -->|SELECT ... WHERE UPDATED_TS > last_sync_ts| PUBLISH
PUBLISH -->|JSON events| TOPIC
TOPIC --> SUB
SUB -->|pull messages| SUBSCRIBER
SUBSCRIBER -->|insert_rows_json| STAGE
SUBSCRIBER -->|MERGE & TRUNCATE| MAIN
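
The flow in the diagram can be sketched in plain Python. This is a minimal illustration, not the repo's actual code: an in-memory SQLite table stands in for DB2, and the `CUSTOMERS` table, its columns, and the BigQuery table names are hypothetical. It shows the watermark query (`UPDATED_TS > last_sync_ts`), the JSON event encoding, and the MERGE and TRUNCATE statements the subscriber would run after staging rows. Pub/Sub and BigQuery calls are left out so the sketch runs without credentials.

```python
import json
import sqlite3


def fetch_changes(conn, last_sync_ts):
    """Return rows changed after the watermark, oldest first."""
    cur = conn.execute(
        "SELECT ID, NAME, UPDATED_TS FROM CUSTOMERS "
        "WHERE UPDATED_TS > ? ORDER BY UPDATED_TS",
        (last_sync_ts,),
    )
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def to_event(row):
    """Encode one changed row as the bytes payload of a Pub/Sub message."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def build_merge_sql(main, stage, key, columns):
    """Build a BigQuery MERGE that upserts staged rows into the main table.

    Assumes the staging table holds at most one row per key; otherwise
    the staged rows would need deduplicating first.
    """
    updates = ", ".join(f"T.{c} = S.{c}" for c in columns if c != key)
    col_list = ", ".join(columns)
    values = ", ".join(f"S.{c}" for c in columns)
    return (
        f"MERGE `{main}` T USING `{stage}` S ON T.{key} = S.{key} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )


def build_truncate_sql(stage):
    """Build the statement that clears the staging table after the MERGE."""
    return f"TRUNCATE TABLE `{stage}`"


# Stand-in for the DB2 source table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, UPDATED_TS TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [
        (1, "Ada", "2024-01-01T00:00:00"),
        (2, "Grace", "2024-01-02T00:00:00"),
    ],
)

# Publisher side: poll for rows newer than the last watermark.
last_sync_ts = "2024-01-01T12:00:00"
changes = fetch_changes(conn, last_sync_ts)
events = [to_event(row) for row in changes]

# Advance the watermark only if something was read.
if changes:
    last_sync_ts = changes[-1]["UPDATED_TS"]

# Subscriber side: statements to run once the events are in staging.
columns = ["ID", "NAME", "UPDATED_TS"]
merge_sql = build_merge_sql(
    "my_project.cdc.customers", "my_project.cdc.customers_stage", "ID", columns
)
truncate_sql = build_truncate_sql("my_project.cdc.customers_stage")
```

In a real run, each item in `events` would be published to the topic, and the subscriber would write the decoded rows to the staging table with `insert_rows_json` before executing `merge_sql` and `truncate_sql`. Timestamps are compared as ISO 8601 strings here, which only sorts correctly when every value uses the same format.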