This project provides a robust, containerized Apache Airflow environment, integrated with Keycloak for secure authentication and PostgreSQL as its metadata database. Designed for easy deployment and management, this stack is ideal for orchestrating data pipelines and workflows with enterprise-grade authentication.
- Apache Airflow: A platform to programmatically author, schedule, and monitor workflows.
- Keycloak Integration: Secure user authentication and authorization via Keycloak Identity and Access Management.
- PostgreSQL Backend: Reliable and scalable database for Airflow metadata.
- Dockerized Environment: All components are containerized using Docker for consistency and isolation.
- Docker Compose Orchestration: Easily manage and run the entire stack with a single command.
- Automated Setup Scripts: Scripts for initial Airflow setup and Keycloak client creation.
- Apache Airflow
- Keycloak
- PostgreSQL
- Docker
- Docker Compose
- Bash Scripting
```
.
├── docker-compose.yml            # Defines the services for the Airflow stack
├── .env                          # Environment variables for the stack (e.g., database credentials)
├── README.md                     # Project documentation (this file!)
├── airflow                       # Airflow service configuration and DAGs
│   ├── Dockerfile                # Builds the custom Airflow image
│   ├── requirements.txt          # Python dependencies for Airflow and DAGs
│   ├── config/                   # Airflow configuration files
│   │   └── webserver_config.py   # Web server custom configuration (e.g., OAuth setup)
│   ├── dags/                     # Your Airflow Directed Acyclic Graphs (DAGs)
│   │   └── example_dag.py        # Example DAG to get you started
│   ├── logs/                     # Airflow runtime logs
│   └── plugins/                  # Custom Airflow plugins, operators, hooks
├── configs                       # Global configuration files for the stack
│   └── airflow.cfg               # Main Airflow configuration file
├── keycloak                      # Keycloak service configuration
│   ├── Dockerfile                # Builds the custom Keycloak image
│   └── realm-export.json         # Keycloak realm configuration for initial setup
├── postgres                      # PostgreSQL service configuration
│   └── init.sql                  # SQL script for initial database setup
└── scripts                       # Helper scripts for setup and management
    ├── init_airflow.sh           # Initializes Airflow (e.g., database, admin user)
    └── create_keycloak_client.sh # Automates Keycloak client creation for Airflow
```
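The `.env` file in the layout above holds the stack's credentials and secrets. As a hedged illustration of what it might contain, the variable names below are assumptions for a typical Airflow + PostgreSQL + Keycloak compose setup, not the exact keys this repository reads:

```shell
# Hypothetical .env sketch -- variable names are illustrative only.
POSTGRES_USER=airflow
POSTGRES_PASSWORD=change-me
POSTGRES_DB=airflow
# Airflow's metadata DB connection, pointing at the postgres service
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:change-me@postgres:5432/airflow
# Keycloak bootstrap admin and the OAuth client Airflow uses
KEYCLOAK_ADMIN=admin
KEYCLOAK_ADMIN_PASSWORD=change-me
KEYCLOAK_CLIENT_ID=airflow
KEYCLOAK_CLIENT_SECRET=change-me
```

Whatever the actual keys are, keep this file out of version control.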
Follow these steps to get your Airflow stack up and running:

- Clone the Repository:

  ```
  git clone https://github.com/AkashBhadana/Airflow-Stack.git
  cd Airflow-Stack
  ```

- Set Environment Variables: Create a `.env` file in the root directory (if not already present) and populate it with the necessary environment variables, such as database credentials and Keycloak client secrets. A `.env.example` is not currently provided, but including one is good practice.

- Build and Run the Stack:

  ```
  docker-compose up --build -d
  ```

  The `-d` flag runs the containers in detached mode.

- Access the Airflow UI: Once all services are up and running, open the Airflow UI in your web browser at http://localhost:8080.

- Access the Keycloak Admin Console: The Keycloak admin console is available at http://localhost:8081. (You may need to refer to the Keycloak setup documentation for the initial admin credentials.)
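After bringing the stack up, it can help to verify each service before logging in. A minimal check sequence might look like the following; the `airflow` service name is an assumption based on the directory layout above, and the health endpoint is the one Airflow 2.x exposes on its webserver:

```shell
# List the containers and their current state
docker-compose ps

# Query the Airflow webserver health endpoint (Airflow 2.x)
curl -s http://localhost:8080/health

# Tail the logs of a single service if something fails to start
docker-compose logs -f airflow
```

If a container keeps restarting, its logs are usually the fastest way to spot a missing or malformed `.env` value.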
- Airflow: Develop and deploy your data pipelines by adding DAG files to the `airflow/dags` directory. Manage and monitor them via the Airflow UI.
- Keycloak: Use the Keycloak admin console to manage the users, roles, and clients used to authenticate into Airflow.
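As a sketch of what a file dropped into `airflow/dags` can look like, here is a minimal DAG. The DAG and task names are illustrative, not taken from the repository's `example_dag.py`, and running it requires `apache-airflow` (2.x) installed, as it is inside the containers:

```python
# Hypothetical minimal DAG -- a file like this in airflow/dags/ appears
# in the Airflow UI after the scheduler's next parse of the folder.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    """Trivial task body; real DAGs call out to your pipeline code."""
    print("Hello from the Airflow stack!")


with DAG(
    dag_id="hello_stack",            # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",      # run once per day
    catchup=False,                   # skip backfilling past dates
) as dag:
    PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )
```

The scheduler rescans the `dags/` folder periodically, so no restart is needed when adding a new file.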
- Ensure Docker and Docker Compose are installed on your system.
- The first user is auto-created in Airflow upon their initial OAuth login via Keycloak.
- Remember to secure your `.env` file and any other sensitive configuration.
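The auto-registration behavior noted above comes from Airflow's Flask AppBuilder security settings, which this stack customizes in `airflow/config/webserver_config.py`. A hedged sketch of such a configuration is shown below; the realm name, client ID, and environment variable names are assumptions, and the endpoint paths assume a modern Keycloak without the legacy `/auth` URL prefix:

```python
# Hypothetical webserver_config.py sketch for Keycloak OAuth.
import os

from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True            # auto-create users on first login
AUTH_USER_REGISTRATION_ROLE = "Viewer"   # default role for new users

# Illustrative env vars -- align these with your .env file.
KEYCLOAK_BASE = os.environ.get("KEYCLOAK_BASE_URL", "http://keycloak:8080")
REALM = os.environ.get("KEYCLOAK_REALM", "airflow")

OAUTH_PROVIDERS = [
    {
        "name": "keycloak",
        "icon": "fa-key",
        "token_key": "access_token",
        "remote_app": {
            "client_id": os.environ.get("KEYCLOAK_CLIENT_ID", "airflow"),
            "client_secret": os.environ.get("KEYCLOAK_CLIENT_SECRET", ""),
            "api_base_url": f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/",
            "access_token_url": f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/token",
            "authorize_url": f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/auth",
            "client_kwargs": {"scope": "openid email profile"},
        },
    }
]
```

Mapping Keycloak roles to Airflow roles is a common next step, via `AUTH_ROLES_MAPPING` in the same file.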
Feel free to fork this repository, open issues, or submit pull requests to improve this Airflow stack.
Built with ❤️ for robust data orchestration.