Build a pipeline that extracts data from multiple sources (e.g., an API, CSV files, or a database), transforms it with dbt (e.g., cleaning, enrichment), and loads it into a data warehouse (e.g., PostgreSQL, Microsoft SQL Server, Azure Data Lake, MongoDB, etc.).
Pipeline Structure
- Extract: Use Airflow to pull raw data from a source (e.g., API or database) and load it into a raw/staging schema in your data warehouse.
- Transform: Use dbt to transform the staged data into analytics-ready models.
- Load: the dbt models are materialized directly in the warehouse, so the transform step effectively is the load into the final tables (see the DAG sketch after this list).
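A minimal sketch of how such a DAG could be wired together, assuming a recent Airflow 2.x install; the file name, API URL, warehouse connection string, table name, and dbt project path below are placeholders, not values from this repository:

```python
# dags/etl_pipeline.py -- illustrative only; adjust names, paths, and credentials to your project.
from datetime import datetime

import pandas as pd
import requests
from sqlalchemy import create_engine

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# Placeholder values -- replace with your API endpoint, warehouse URI, and dbt project location.
API_URL = "https://example.com/api/orders"
WAREHOUSE_URI = "postgresql+psycopg2://user:password@localhost:5432/warehouse"
DBT_PROJECT_DIR = "/opt/airflow/dbt"


def extract_to_staging() -> None:
    """Pull raw data from the source API and load it into the raw/staging schema."""
    records = requests.get(API_URL, timeout=30).json()
    df = pd.DataFrame(records)
    engine = create_engine(WAREHOUSE_URI)
    df.to_sql("orders_raw", engine, schema="staging", if_exists="replace", index=False)


with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_to_staging",
        python_callable=extract_to_staging,
    )

    # dbt materializes the analytics-ready models from the staging schema.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_PROJECT_DIR}",
    )

    extract >> transform  # extract first, then transform/load with dbt
```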
Steps for Airflow Configuration:
- Move into the "etl" folder.
- Make sure you already have a database URL where Airflow metadata (such as users, roles, and DAGs) will be stored.
- Create a .env file that holds AIRFLOW__DATABASE__SQL_ALCHEMY_CONN and the user details (a sketch follows this list).
- Make the file "init_airflow.sh" (which initializes the database and creates a new user with access to the Airflow web UI) executable by running: chmod +x init_airflow.sh
- To initialize the Airflow database, run: ./init_airflow.sh 0
- To create a new user for the Airflow web UI, run: ./init_airflow.sh 1
- To list all users, run: airflow users list
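The exact contents of the .env file depend on what init_airflow.sh reads; a sketch with placeholder values (AIRFLOW__DATABASE__SQL_ALCHEMY_CONN is the standard Airflow setting, while the user variable names below are assumptions, not names defined by this repository):

```
# .env -- placeholder values; replace with your own metadata database URL and admin user details.
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow_meta
AIRFLOW_ADMIN_USERNAME=admin
AIRFLOW_ADMIN_PASSWORD=change-me
AIRFLOW_ADMIN_EMAIL=admin@example.com
```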
Run Airflow:
- Run the Airflow web UI: airflow webserver (or airflow webserver --port 8080 to set the port explicitly)
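Note that the webserver only serves the UI; for DAGs to actually execute, the Airflow scheduler must also be running, typically in a separate terminal:

```bash
airflow webserver --port 8080
airflow scheduler
```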
Metabase is an open-source business intelligence application developed to make data visualization and analysis simple for users without a high level of technical expertise. In the "bi" folder, Metabase is used to generate interactive dashboards and reports that provide actionable insight and support data-driven decision-making throughout the company. Its intuitive interface makes it easy to share findings, query databases, and monitor key metrics in real time.
The "bi" folder contains steps and instructions for installing and running Metabase locally.