Used to generate surrogate keys (e.g., customer_hash). This ensures keys are system-agnostic and handles composites in links.
Hubs capture raw business keys without attributes. Satellites add changeable details with timestamps for auditing changes over time. Links model many-to-many relationships without redundancy. The design supports scalability, as new satellites can be added for evolving attributes.
Extract via pandas, transform with hashing and deduplication, load with SQL inserts (using OR IGNORE for hubs/links to avoid duplicates, and always insert for satellites to track history).
This is a minimal implementation. For production, use a full DB like PostgreSQL, add error handling, and implement incremental loads (e.g., via staging tables). Expand with more entities as needed.
pip install -r requirements.txt
python etl.py