This repository is the community registry for genome annotation entries that power Annotrieve.
Contributors add project folders. Each folder contains:
- A manifest.yaml file: who produced the annotation and how (provider, pipeline, version).
- An annotations.tsv file: one row per assembly, giving the NCBI accession (GCA_…/GCF_…) and a stable HTTPS link to a GFF3 file (plain or gzipped).
Together, these files describe “this assembly, this annotation file,” in a form that can be checked automatically.
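To illustrate what "checked automatically" could mean for a row, here is a minimal sketch in Python. It is not the repository's actual validator. The column order (accession first, URL second) and the accepted file suffixes are assumptions made for this example; the authoritative header is in schema/annotations.tsv.header.

```python
import csv
import io
import re

# NCBI assembly accessions look like GCA_000001405.29 or GCF_000001405.40.
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")
# Accepted GFF3 suffixes are an assumption for this sketch.
GFF3_SUFFIXES = (".gff", ".gff3", ".gff.gz", ".gff3.gz")


def check_row(accession: str, url: str) -> list[str]:
    """Return a list of problems found in one (accession, url) pair."""
    problems = []
    if not ACCESSION_RE.fullmatch(accession):
        problems.append(f"bad accession: {accession!r}")
    if not url.startswith("https://"):
        problems.append(f"URL is not HTTPS: {url!r}")
    # Ignore any query string when looking at the file suffix.
    path = url.split("?", 1)[0].lower()
    if not path.endswith(GFF3_SUFFIXES):
        problems.append(f"URL does not look like GFF3: {url!r}")
    return problems


def check_tsv(text: str) -> dict[int, list[str]]:
    """Check every data row; keys are 1-based line numbers in the file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the header line
    report = {}
    for line_no, row in enumerate(reader, start=2):
        if len(row) < 2:
            report[line_no] = ["expected at least two tab-separated columns"]
            continue
        # Hypothetical column order: accession, then URL.
        problems = check_row(row[0].strip(), row[1].strip())
        if problems:
            report[line_no] = problems
    return report
```

Calling check_tsv on the contents of an annotations.tsv returns an empty dict when every row passes these checks, and otherwise maps each failing line number to its problems.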
Annotation URLs: Pull-request validation downloads each linked file (up to 500 MiB per URL). Files above that size fail. Gzip-compressed GFF3 (.gff.gz) is strongly recommended — smaller downloads, faster checks, and less chance of exceeding the limit.
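The size limit can be pictured as a capped streaming read. The sketch below is one possible way to enforce such a limit, not the validation code this repository runs; the names read_capped and TooLargeError are hypothetical.

```python
import io
from typing import BinaryIO

MAX_BYTES = 500 * 1024 * 1024  # 500 MiB, the per-URL limit stated above


class TooLargeError(Exception):
    """Raised when a stream holds more bytes than the limit allows."""


def read_capped(stream: BinaryIO, limit: int = MAX_BYTES,
                chunk_size: int = 1024 * 1024) -> int:
    """Consume a stream chunk by chunk and return its total size.

    Stops early with TooLargeError as soon as the running total
    exceeds the limit, so an oversized file is never read in full.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > limit:
            raise TooLargeError(f"file is larger than {limit} bytes")
```

Any object with a read(n) method works here, for example the response returned by urllib.request.urlopen(url). Because gzip output is typically much smaller than plain GFF3 text, a .gff.gz link is far less likely to trip a check like this.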
<project_name>/
  manifest.yaml      # Required metadata (see schema)
  annotations.tsv    # Header + one row per assembly (tab-separated)
- Exact TSV header: schema/annotations.tsv.header
- Manifest rules: schema/manifest.schema.json
- Copy-paste starter: examples/sample_project/
See CONTRIBUTING.md for a step-by-step flow (fork → edit → pull request).
After your changes are merged here, the Genome Annotation Tracker reads this registry, turns each project’s manifest + TSV into formatted rows, and adds them to the shared community annotation table. Those rows are published on Annotrieve in periodic imports.
You (this repo)       Downstream                         App
─────────────────     ──────────────────────────────     ───────────
manifest.yaml     ──► genome-annotation-tracker      ──► Annotrieve
annotations.tsv       (community TSV)