TODO:
DEBEZIUM:
❌ Set up monitoring on how much WAL your Postgres replication slots retain (see pg_replication_slots)
❌ If your transaction table is massive (billions of rows), configure snapshot.mode carefully
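A minimal sketch of the slot monitoring item, assuming PostgreSQL 10+ and that the query runs on the primary. The query uses the standard pg_wal_lsn_diff and pg_current_wal_lsn functions; the helper name slots_over_threshold, the threshold, and the sample slot names are hypothetical.

```python
# WAL each replication slot is holding back, in bytes (PostgreSQL 10+, primary only).
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


def slots_over_threshold(rows, max_bytes):
    """Return the names of slots retaining more than max_bytes of WAL.

    rows: iterable of (slot_name, active, retained_bytes) tuples, for example
    the result of running SLOT_LAG_SQL through any Postgres driver.
    retained_bytes can be None for a slot that has no restart_lsn yet.
    """
    return [
        name
        for name, _active, retained in rows
        if retained is not None and retained > max_bytes
    ]


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a real query result.
    sample = [("debezium_slot", True, 5 * 1024**3), ("idle_slot", False, 10)]
    print(slots_over_threshold(sample, max_bytes=1024**3))
```

An inactive slot with growing retained_bytes is the case to alert on, since Postgres keeps WAL for it until the consumer catches up or the slot is dropped.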
KAFKA:
❌ Handle schema evolution explicitly
❌ Handle deletes (ignored)
❌ Deduplicate records
❌ Enforce exactly-once semantics
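A rough sketch of the delete and dedup items, not a full exactly-once setup: it applies change events idempotently so that redelivered or stale events have no effect. The flattened event shape (key, op, lsn, after) is an assumption for illustration; op follows Debezium's c/u/r/d codes, and lsn is assumed to be a per-event increasing position such as the Postgres connector's source.lsn.

```python
def apply_events(events, state=None):
    """Apply change events idempotently and return {key: (lsn, row_or_None)}.

    Each event is a dict with hypothetical flattened fields:
      key   - primary key of the row
      op    - 'c' (create), 'u' (update), 'r' (snapshot read) or 'd' (delete)
      lsn   - increasing position of the change
      after - row contents after the change (None for deletes)
    """
    state = {} if state is None else state
    for ev in events:
        key, lsn = ev["key"], ev["lsn"]
        seen = state.get(key)
        if seen is not None and lsn <= seen[0]:
            continue  # duplicate or stale redelivery: already applied
        if ev["op"] == "d":
            # Keep a tombstone so older events for this key stay ignored.
            state[key] = (lsn, None)
        else:
            state[key] = (lsn, ev["after"])
    return state


def live_rows(state):
    """Rows that currently exist, with tombstones filtered out."""
    return {key: row for key, (_lsn, row) in state.items() if row is not None}


if __name__ == "__main__":
    events = [
        {"key": 1, "op": "c", "lsn": 10, "after": {"amount": 5}},
        {"key": 1, "op": "u", "lsn": 20, "after": {"amount": 7}},
        {"key": 1, "op": "u", "lsn": 20, "after": {"amount": 7}},  # redelivered
        {"key": 1, "op": "d", "lsn": 30, "after": None},
    ]
    print(live_rows(apply_events(events)))
```

Keeping the tombstone rather than dropping the key is a design choice: it prevents a late duplicate of an earlier insert from resurrecting a deleted row.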
- use Kafka Connect S3 Sink Connector
- use In-Memory Buffers
- flushing buffered data on sudden exit is not working
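One possible shape for the in-memory buffer with a flush on exit, as a sketch only. It assumes the process is stopped with SIGTERM or SIGINT: Python's atexit handlers do not run when an unhandled signal kills the process, so the handler below turns the signal into a normal exit. SIGKILL cannot be caught, so anything still buffered at that point is lost. FlushingBuffer, install_exit_flush, and the sink callable are hypothetical names.

```python
import atexit
import signal
import sys


class FlushingBuffer:
    """Collect records in memory and hand them to sink in batches."""

    def __init__(self, sink, max_records=1000):
        self._sink = sink          # callable that receives a list of records
        self._max = max_records
        self._records = []

    def add(self, record):
        self._records.append(record)
        if len(self._records) >= self._max:
            self.flush()

    def flush(self):
        if self._records:
            # Swap the list out first so a failing sink does not re-send it twice.
            batch, self._records = self._records, []
            self._sink(batch)


def install_exit_flush(buffer):
    """Flush on normal interpreter exit and on SIGTERM/SIGINT (main thread only)."""
    atexit.register(buffer.flush)

    def _on_signal(signum, frame):
        sys.exit(0)  # raises SystemExit, which lets atexit handlers run

    signal.signal(signal.SIGTERM, _on_signal)
    signal.signal(signal.SIGINT, _on_signal)


if __name__ == "__main__":
    buf = FlushingBuffer(sink=print, max_records=3)
    install_exit_flush(buf)
    for i in range(5):
        buf.add(i)
    # The last two records are printed by the atexit flush when the script ends.
```

For Kafka offsets, the safer order is to flush the batch first and commit offsets only after the sink call succeeds, so a crash replays records instead of dropping them.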
group_id
- Multiple consumers with the same group_id split the workload. If one dies, Kafka rebalances its partitions across the remaining consumers.
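A toy model of the rebalance behaviour described above. This is not Kafka's actual assignor (the real strategy is configurable through partition.assignment.strategy); it only illustrates that each partition belongs to exactly one group member and that a dead member's partitions get redistributed. The function name assign and the consumer ids are hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions round-robin over the sorted consumer ids."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = range(6)
    print(assign(partitions, ["a", "b", "c"]))
    # Consumer "c" dies: the same partitions are re-spread over the survivors.
    print(assign(partitions, ["a", "b"]))
```

With more consumers than partitions, the extra consumers end up with an empty list, which matches the usual advice to keep the group size at or below the partition count.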
If your use case needs:
Audit trails
Slowly Changing Dimensions (SCD)
Time travel
Parquet: columnar storage choice; not ideal for row-based OLTP reads
############################################