diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9449ece..5edfd1a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,12 +1,12 @@
 # Contributing to the registry
 
-This guide is for **anyone** adding or changing annotation entries—including people who prefer not to install developer tools. You can contribute **only through GitHub in a browser**, or **validate on your machine** using our ready-made container.
+This guide is for **anyone** adding or changing annotation entries in Annotrieve. You can contribute directly **through GitHub in a browser** and trigger the automated validation, or **validate on your machine** using our ready-made container and then open a PR.
 
 ---
 
 ## What you are adding
 
-Each **project** is a folder with two files:
+Each contributor adds their **project** as a folder with two files:
 
 | File | Purpose |
 |------|--------|
@@ -20,7 +20,7 @@ Rules that matter for everyone:
 
 ---
 
-## Path 1 — Contribute with a fork (works in the browser)
+## Contribute with a fork (works in the browser)
 
 This is the usual way: you do **not** need direct write access to this repository.
 
@@ -39,26 +39,26 @@ You can push more commits to the same PR; checks run again each time.
 
 **Please keep each PR to edits in a single `annotations.tsv` file** (one TSV changed per PR). That keeps review and CI predictable.
 
+When your PR is ready and all checks pass, we will review the changes and merge them. After merging, your annotation entries will be available in the next update of Annotrieve (usually within a week).
+
 ---
 
-## What the PR check does (high level)
+### What the PR check does
 
 When you open or update a pull request, a workflow runs in a **pre-built environment** (a Docker image we publish to GitHub Container Registry). In simple terms it:
 
 1. **Compares** your branch to the branch you are merging into, so only **new or changed rows** in `annotations.tsv` are fully re-checked (older rows are not re-downloaded unless the file changed).
 2. Checks **`manifest.yaml`** for every project folder that your PR touches.
 3. For each **new** TSV row, checks that:
-   - the accession looks like a real NCBI assembly and **exists in NCBI** (using the official NCBI `datasets` tool in bulk, not one request per row);
-   - the **URL works** and the downloaded data looks like **GFF3** with the fields Annotrieve expects;
-   - the file can go through the same **tabix-style** steps Annotrieve uses (so we know it is indexable in practice).
-
-If something fails, the PR will show as failed until the data is fixed—but you always get the summary and line-level hints to guide you.
+   - the accession looks like a real NCBI assembly and **exists in NCBI**, using the NCBI `datasets` tool;
+   - the **URL is reachable** and the downloaded data is in **GFF3** format;
+   - the annotation file can be sorted and tabix-indexed as in the main **Annotrieve pipeline**.
 
-Maintainers: the checker image is built by [`.github/workflows/publish-ci-validator.yml`](.github/workflows/publish-ci-validator.yml). The PR workflow pulls the image named in [`.github/workflows/validate-pr.yml`](.github/workflows/validate-pr.yml); keep those in sync if you rename the package or registry.
+If something fails, the PR will show as failed until the data is fixed, but you always get a summary and line-level hints to guide your fixes.
 
 ---
 
-## Path 2 — Check your TSV locally (Docker, recommended for a “full” dry run)
+### Check your TSV locally (dry run)
 
 If you have **[Docker](https://docs.docker.com/get-docker/)** installed, you can run **the same** validator we use in CI **without** installing Python, tabix, or the NCBI CLI on your laptop.
 
@@ -100,7 +100,7 @@ If the image is **private**, log in once with `docker login ghcr.io` using a Git
 
 ---
 
-## Path 3 — Check locally without Docker (advanced)
+### Check locally without Docker (advanced)
 
 Install **Python 3.11+**, **tabix/bgzip** (htslib), and the **[NCBI datasets CLI](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/)**, then:
 
@@ -117,7 +117,7 @@ Use `--datasets-binary /path/to/datasets` if `datasets` is not on your `PATH`. R
 
 ---
 
-## Optional tuning (for developers)
+### Optional tuning (for developers)
 
 These environment variables only affect the validator when set (defaults are fine for most contributors):
 
diff --git a/README.md b/README.md
index b0af0a0..5a9f249 100644
--- a/README.md
+++ b/README.md
@@ -1,33 +1,17 @@
 # Annotrieve community annotations registry
 
-This repository is the **community registry** for genome annotation entries that power **Annotrieve**.
+This repository is the **community registry** for genome annotation entries that power [Annotrieve](https://genome.crg.es/annotrieve/).
 
 ## What this repo is for
 
-Contributors add **small project folders**. Each folder contains:
+Contributors add **project folders**. Each folder contains:
 
 - A **`manifest.yaml`** file — who produced the annotation and how (provider, pipeline, version).
 - An **`annotations.tsv`** file — one row per assembly: NCBI accession (`GCA_…` / `GCF_…`) and a stable **HTTPS link** to a **GFF3** file (plain or gzipped).
 
 Together, these files describe “this assembly, this annotation file,” in a form that can be checked automatically.
 
-## How it fits in the larger system
-
-```text
-You (this repo)            Downstream                         App
-─────────────────          ───────────────────────────────    ───────────
-manifest.yaml     ──┐
-annotations.tsv   ──┼──►   genome-annotation-tracker    ──►   community TSV
-(project folders)   │      (merges + normalizes rows)   ──►   Annotrieve
-                    └      github.com/guigolab/
-                           genome-annotation-tracker
-```
-
-After your changes are **merged here**, the **[Genome Annotation Tracker](https://github.com/guigolab/genome-annotation-tracker)** reads this registry, turns each project’s manifest + TSV into **formatted rows** in a shared **community annotation table**, and that table is what **[Annotrieve](https://genome.crg.eu/annotrieve)** uses.
-
-So: **this repo = curated source of truth**; the tracker = **batch merger / formatter**; Annotrieve = **what researchers use in the browser**.
-
-## Repository layout
+### These are the files you add or edit when you contribute:
 
 ```text
 /
@@ -39,6 +23,19 @@ So: **this repo = curated source of truth**; the tracker = **batch merger / form
 - Manifest rules: [`schema/manifest.schema.json`](schema/manifest.schema.json)
 - Copy-paste starter: [`examples/sample_project/`](examples/sample_project/)
 
-## Contributing
+See **[`CONTRIBUTING.md`](CONTRIBUTING.md)** for a step-by-step flow (fork → edit → pull request).
+
+## How it fits in the larger system
+
+After your changes are **merged here**, the **[Genome Annotation Tracker](https://github.com/guigolab/genome-annotation-tracker)** reads this registry, turns each project’s manifest + TSV into **formatted rows**, and adds them to the shared **community annotation table**. The annotations in that table are published on **[Annotrieve](https://genome.crg.eu/annotrieve)**.
+
+```text
+You (this repo)            Downstream                         App
+─────────────────          ───────────────────────────────    ───────────
+manifest.yaml     ──┐
+annotations.tsv   ──┼──►   genome-annotation-tracker    ──►   community TSV
+(project folders)   │      (merges + normalizes rows)   ──►   Annotrieve
+                    └      github.com/guigolab/
+                           genome-annotation-tracker
+```
 
-See **[`CONTRIBUTING.md`](CONTRIBUTING.md)** for a step-by-step flow (fork → edit → pull request), what the automated checks do in plain language, and how to run the same checks **on your computer** (including using the **published Docker image** so you do not have to install Python or NCBI tools yourself).
diff --git a/fab/annotations.tsv b/fab/annotations.tsv
deleted file mode 100644
index 5f719f7..0000000
--- a/fab/annotations.tsv
+++ /dev/null
@@ -1,2 +0,0 @@
-assembly_accession	access_url
-GCF_022695815.1	https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/022/695/815/GCF_022695815.1_EurotioJF033F_1/GCF_022695815.1_EurotioJF033F_1_genomic.gff.gz
diff --git a/fab/manifest.yaml b/fab/manifest.yaml
deleted file mode 100644
index 70774f5..0000000
--- a/fab/manifest.yaml
+++ /dev/null
@@ -1,9 +0,0 @@
-# Copy this folder to / and replace values before opening a PR.
-provider_name: "Fab Lab"
-pipeline_method: "bho"
-pipeline_version: "0.1.0"
-project_display_name: ""
-description: "a test"
-contact_email: "pi123@example.org"
-license: "CC-BY-4.0"
-homepage_url: "https://genome.crg.es/annotrieve/annotations/"
diff --git a/examples/sample_project/annotations.tsv b/sample_project/annotations.tsv
similarity index 100%
rename from examples/sample_project/annotations.tsv
rename to sample_project/annotations.tsv
diff --git a/examples/sample_project/manifest.yaml b/sample_project/manifest.yaml
similarity index 100%
rename from examples/sample_project/manifest.yaml
rename to sample_project/manifest.yaml
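
The row-level checks described in the CONTRIBUTING.md changes above (accession shape, HTTPS link) can be approximated offline before opening a PR. The following is a minimal sketch, not the registry's actual validator. The function name `check_rows` and the accession regex are assumptions made for illustration, and the column names are taken from the `fab/annotations.tsv` header shown in this diff. It does not contact NCBI, download files, or run tabix.

```python
import csv
import io
import re
from urllib.parse import urlparse

# Assumed accession shape: GCA_/GCF_ + 9 digits + "." + version number.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")
# Header as it appears in fab/annotations.tsv in this diff.
EXPECTED_HEADER = ["assembly_accession", "access_url"]


def check_rows(tsv_text):
    """Return (line_number, message) pairs for rows that look wrong.

    Hypothetical helper: purely syntactic checks, no network access.
    """
    problems = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append((1, f"unexpected header: {header!r}"))
        return problems
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append((line_no, f"expected 2 columns, got {len(row)}"))
            continue
        accession, url = (field.strip() for field in row)
        if not ACCESSION_RE.match(accession):
            problems.append((line_no, f"accession does not match pattern: {accession!r}"))
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append((line_no, f"URL is not an HTTPS link: {url!r}"))
    return problems
```

A typical use would be `check_rows(open("annotations.tsv", encoding="utf-8").read())` and printing any returned pairs. Passing this sketch says nothing about whether the accession exists in NCBI or whether the file is valid GFF3; those checks still happen in the PR workflow.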