EOSC-Data-Commons/toolmeta-harvester

🚧 Work in Progress
This project is currently under active development.
Features may change, and the API may not be stable yet.
Contributions and feedback are welcome!

Roadmap

🚧 Phase 1: Foundation, Galaxy-focused (Current)

  • Project scaffolding and initial architecture
  • Interface to Galaxy ToolShed API
  • Interface to WorkflowHub API
  • Parsing Galaxy workflows and enriching them with ToolShed data
  • Data models for Galaxy tools and workflows
  • Generalized data model for artifacts and contracts
  • Initial data harvesting and storage from WorkflowHub
  • Deployment to Warehouse
  • Basic documentation (README, setup, usage)
  • Basic tests and CI pipeline
  • Create initial embedding pipeline
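
As a rough illustration of the "parse Galaxy workflows and enrich with ToolShed data" item, the sketch below pulls the ToolShed repository reference out of each tool step of a Galaxy workflow (`.ga`) document. It is not taken from this repository: the function name `extract_toolshed_refs` and the sample workflow (including its placeholder changeset revision) are made up for illustration, and the field names follow the commonly seen `.ga` export layout, which is assumed here rather than confirmed by the project.

```python
import json


def extract_toolshed_refs(workflow: dict) -> list[dict]:
    """Collect ToolShed repository references from the tool steps of a workflow.

    Hypothetical helper: assumes the usual .ga layout, where "steps" maps
    string indices to step objects and ToolShed-installed tools carry a
    "tool_shed_repository" object.
    """
    refs = []
    steps = workflow.get("steps", {})
    # Step keys are string integers ("0", "1", ...), so sort them numerically.
    for step_id, step in sorted(steps.items(), key=lambda kv: int(kv[0])):
        repo = step.get("tool_shed_repository")
        if step.get("type") != "tool" or not repo:
            continue  # skip inputs and tools that do not come from a ToolShed
        refs.append(
            {
                "step": int(step_id),
                "tool_id": step.get("tool_id"),
                "name": repo.get("name"),
                "owner": repo.get("owner"),
                "tool_shed": repo.get("tool_shed"),
                "changeset_revision": repo.get("changeset_revision"),
            }
        )
    return refs


# Illustrative sample only; the changeset revision is a placeholder.
SAMPLE_WORKFLOW = json.loads(
    """
    {
      "a_galaxy_workflow": "true",
      "name": "Example workflow",
      "steps": {
        "0": {"type": "data_input", "tool_id": null},
        "1": {
          "type": "tool",
          "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.74+galaxy0",
          "tool_shed_repository": {
            "name": "fastqc",
            "owner": "devteam",
            "tool_shed": "toolshed.g2.bx.psu.edu",
            "changeset_revision": "000000000000"
          }
        }
      }
    }
    """
)

if __name__ == "__main__":
    for ref in extract_toolshed_refs(SAMPLE_WORKFLOW):
        print(ref)
```

Each returned reference (owner, name, revision) is the kind of key an enrichment step could then use to look up additional metadata in the ToolShed.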

Installation and Usage

Prerequisites

  • Python 3.12+
  • Docker
  • uv

Credentials

Set up config/.secrets.toml with a GitHub API token.

Setup

make install

Boots a Postgres Docker container and installs dependencies.

make run

Runs the default pipeline, which harvests data from WorkflowHub and stores it in the database.
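
To give a feel for the harvest-and-store flow, here is a minimal, self-contained sketch of the transform and load steps. It is not the project's pipeline: the in-memory `sqlite3` database stands in for the Postgres container, the table schema and the names `transform`, `load`, and `run_pipeline` are hypothetical, and the sample payload only assumes a JSON:API-style listing (a `data` array of items with `id`, `attributes`, and `links`). The extract step (the HTTP request to WorkflowHub) is deliberately left out.

```python
import sqlite3

# Assumed payload shape, modelled on a JSON:API-style listing.
SAMPLE_LISTING = {
    "data": [
        {
            "id": "1",
            "type": "workflows",
            "attributes": {"title": "Example workflow A"},
            "links": {"self": "/workflows/1"},
        },
        {
            "id": "2",
            "type": "workflows",
            "attributes": {"title": "Example workflow B"},
            "links": {"self": "/workflows/2"},
        },
    ]
}


def transform(listing: dict) -> list[tuple[int, str, str]]:
    """Flatten listing items into (id, title, path) rows."""
    return [
        (int(item["id"]), item["attributes"]["title"], item["links"]["self"])
        for item in listing.get("data", [])
        if item.get("type") == "workflows"
    ]


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, str]]) -> int:
    """Upsert rows and return the total number of stored workflows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, path TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps repeated runs idempotent: same id, same row.
    conn.executemany(
        "INSERT OR REPLACE INTO workflows (id, title, path) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM workflows").fetchone()[0]


def run_pipeline(listing: dict, conn: sqlite3.Connection) -> int:
    """Transform a harvested listing and load it into the database."""
    return load(conn, transform(listing))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(SAMPLE_LISTING, connection))
```

Keeping the load step idempotent means the pipeline can be rerun after a partial failure without creating duplicate rows.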

About

ETL pipelines for harvesting tool metadata and populating a database.
