This project provides a fully local, containerized ETL environment using modern data stack tools.
It includes:
- Airflow for orchestration
- dbt for transformations
- PostgreSQL as a warehouse simulator
- DuckDB (Python) for lightweight processing (see the sketch after this list)
- MinIO as an S3-compatible data lake
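To give a feel for how the pieces fit together: DuckDB can read CSV files straight out of the MinIO data lake over its S3 API. This is a minimal sketch, not part of the project code — the `minioadmin`/`minioadmin` credentials are an assumption (use the values from `docker-compose.yml`), and the object key is a hypothetical example of the bucket layout described later in this README:

```python
import duckdb

con = duckdb.connect()

# httpfs gives DuckDB S3 support, which MinIO implements.
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_endpoint = 'localhost:9000';")
con.execute("SET s3_access_key_id = 'minioadmin';")      # assumed; see docker-compose.yml
con.execute("SET s3_secret_access_key = 'minioadmin';")  # assumed; see docker-compose.yml
con.execute("SET s3_use_ssl = false;")
con.execute("SET s3_url_style = 'path';")  # MinIO serves path-style URLs

# Hypothetical object key following the leads/<YYYY>/<MM>/<DD>/ layout.
count = con.execute(
    "SELECT count(*) FROM read_csv_auto('s3://csv/leads/2024/01/01/leads.csv')"
).fetchone()[0]
print(count)
```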
Start the full stack with:
```bash
docker-compose up -d
```

This builds the Airflow image (including Python dependencies) and launches all services.
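The containers can take a little while to come up. If you want to wait for the web UI (exposed on port 9080, as noted below) from a script, here is an optional sketch, assuming the `requests` package is installed:

```python
import time

import requests

AIRFLOW_UI = "http://localhost:9080"  # port mapping from docker-compose.yml

# Poll the Airflow UI until it responds, giving the containers time to boot.
for attempt in range(30):
    try:
        response = requests.get(AIRFLOW_UI, timeout=5)
        if response.status_code < 500:
            print(f"Airflow UI is up (HTTP {response.status_code})")
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(5)
else:
    raise RuntimeError("Airflow UI did not come up within ~2.5 minutes")
```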
- Airflow UI available at: http://localhost:9080
- To retrieve the autogenerated login credentials:
```bash
docker exec <container-name> bash -c "cat /opt/airflow/simple_auth_manager_passwords.json.generated"
```

Use the displayed username/password to log in.
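If you prefer to grab the credentials programmatically, the same file can be read from Python. A minimal sketch, assuming the file is a flat JSON object mapping usernames to passwords; the container name here is hypothetical, so substitute the one shown by `docker ps`:

```python
import json
import subprocess

CONTAINER = "airflow-apiserver"  # hypothetical; use the name from `docker ps`
PASSWORDS_FILE = "/opt/airflow/simple_auth_manager_passwords.json.generated"

# Read the autogenerated credentials file out of the running container.
result = subprocess.run(
    ["docker", "exec", CONTAINER, "cat", PASSWORDS_FILE],
    capture_output=True, text=True, check=True,
)

# Assumed format: {"<username>": "<password>", ...}
credentials = json.loads(result.stdout)
for username, password in credentials.items():
    print(f"{username}: {password}")
```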
- MinIO console: http://localhost:9000
- Upload CSV files to the `csv` bucket. You can use either the UI or the AWS CLI.
Configure the CLI using the credentials in `docker-compose.yml`:

```bash
aws configure
```

Upload files to the expected folder structure:
```bash
aws --endpoint-url http://localhost:9000 s3 cp <file>.csv s3://csv/leads/<YYYY>/<MM>/<DD>/<file>.csv
aws --endpoint-url http://localhost:9000 s3 cp <file>.csv s3://csv/sales/<YYYY>/<MM>/<DD>/<file>.csv
```
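The same upload can also be scripted with boto3, since MinIO speaks the S3 API. A hedged sketch — the endpoint comes from the commands above, but the `minioadmin`/`minioadmin` credentials are an assumption (use whatever is set in `docker-compose.yml`), and `leads.csv` is a placeholder file name:

```python
from datetime import datetime, timezone

import boto3

# A plain boto3 client works against MinIO's S3-compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",      # assumed default; check docker-compose.yml
    aws_secret_access_key="minioadmin",  # assumed default; check docker-compose.yml
)

# Build the dated key layout shown above: <source>/<YYYY>/<MM>/<DD>/<file>.csv
today = datetime.now(timezone.utc)
key = f"leads/{today:%Y/%m/%d}/leads.csv"

s3.upload_file("leads.csv", "csv", key)
print(f"uploaded s3://csv/{key}")
```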
- Adminer UI: http://localhost:8080
- Database credentials are defined in the `postgres` service inside `docker-compose.yml`.
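To query the warehouse from Python rather than through Adminer, any standard PostgreSQL client works. A minimal sketch using `psycopg2` (installable as `psycopg2-binary`) — the port assumes the default 5432 mapping, and the database name, user, and password below are placeholders to be replaced with the values from the `postgres` service in `docker-compose.yml`:

```python
import psycopg2

# Connection values are placeholders; take the real ones from the
# postgres service in docker-compose.yml.
conn = psycopg2.connect(
    host="localhost",
    port=5432,            # assumed default port mapping
    dbname="warehouse",   # placeholder database name
    user="postgres",      # placeholder user
    password="postgres",  # placeholder password
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])

conn.close()
```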