Skip to content

guidok91/aviation-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

77 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aviation analytics with DuckDB, dbt and dlt using AirLabs data

workflow

Aviation analytics project with DuckDB, dbt and dlt using the data from AirLabs API.

Data Architecture

data architecture

Datasets are ingested as snapshots from the source API.
When loading to the curated tables, they are processed with a merge strategy (using a unique key per dataset).
This preserves the latest version of each record to keep it simple on the querying side.

Running instructions

Run make help to see available commands together with their description.

Spin up Docker containers

Build and spin up Docker containers needed for the app:

  • make docker-up

Ingest source data from AirLabs REST API to DuckDB using dlt

Get into the dlt container:

  • make docker-it-dlt

For this step we first need to generate an AirLabs API key (see how to on their website), and set the environment variable AIRLABS_API_KEY. Then run:

  • make dlt-ingest-source-data

Run dbt models to transform and curate the data

Exit the dlt container and get into the dbt one by running make docker-it-dbt. Then:

  • make dbt-deps
  • make dbt-run

Data exploration with the DuckDB UI

Once the models have been run and the data is ready, you can start exploring the data.

Run make duckdb-ui to lauch the DuckDB UI and access it via http://localhost:4213.

Example queries:

Countries with the highest number of airports

SELECT
    country_code,
    COUNT(*)
FROM
    curated.airports
GROUP BY
    country_code
ORDER BY
    count(*) DESC
LIMIT 10;

Current number of flights by status

SELECT
    flight_status,
    COUNT(*)
FROM
    curated.flight_positions
WHERE
    processed_timestamp = (SELECT MAX(processed_timestamp) FROM curated.flight_positions)
GROUP BY
    ALL
ORDER BY
    COUNT(*) DESC;

data exploration

Explore dbt project docs

dbt provides auto-generated documentation for the project which can be accessed on https://guidok91.github.io/aviation-analytics/.

Dependency management

Dependabot is configured to periodically upgrade repo dependencies. See dependabot.yml.

CI/CD

A Github Actions CI/CD pipeline is defined here and can be seen here.

Note that the AIRLABS_API_KEY is provided as a Github repository secret to be used in the CI/CD pipeline.

About

Aviation analytics with DuckDB, dbt and dlt using AirLabs data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors