Test to check validation works properly #16
Registry validation
Inline annotations on each failing row; open Files changed for details.
| @@ -0,0 +1,8 @@ | |||
| assembly_accession access_url | |||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate assembly_accession
GCA_048338725.1 appears more than once; keep at most one row per assembly.
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz | ||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz |
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate assembly_accession
GCA_048338725.1 appears more than once; keep at most one row per assembly.
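The duplicate check reported above can be sketched roughly as follows. This is a minimal illustration, not the validator's actual code: the function name `find_duplicates`, the `SAMPLE` data, and the shortened `files.example` URLs are all hypothetical. It groups rows by one column and reports the file lines on which a value repeats.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the fixture TSV (URLs shortened for illustration).
SAMPLE = (
    "assembly_accession\taccess_url\n"
    "GCA_048338725.1\thttps://files.example/a.gff3.gz\n"
    "GCA_044906185.1\thttps://files.example/b.gff3.gz\n"
    "NOT_A_VALID_ACCESSION\thttps://files.example/a.gff3.gz\n"
    "GCA_963930625.1\thttps://files.example/c.gff3.gz\n"
    "GCA_048338725.1\thttps://files.example/d.gff3.gz\n"
)


def find_duplicates(tsv_text, column):
    """Map each repeated value in `column` to the 1-based file lines it occurs on."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines_by_value = defaultdict(list)
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        lines_by_value[row[column]].append(line_no)
    return {value: lines for value, lines in lines_by_value.items() if len(lines) > 1}


if __name__ == "__main__":
    print(find_duplicates(SAMPLE, "assembly_accession"))
    print(find_duplicates(SAMPLE, "access_url"))
```

Because the column name is a parameter, the same helper would also cover a duplicate `access_url` check; reporting every line of a repeated value matches the way each offending row gets its own inline annotation here.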
| @@ -0,0 +1,8 @@ | |||
| assembly_accession access_url | |||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url
https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.
| @@ -0,0 +1,8 @@ | |||
| assembly_accession access_url | |||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url
https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.
| assembly_accession access_url | ||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz |
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url
https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.
| @@ -0,0 +1,8 @@ | |||
| assembly_accession access_url | |||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | |||
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate annotation file (MD5)
- checksum 09a3ba67712292763cd684a8f5119fc4 appears more than once; each row must refer to a distinct annotation file.
| NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz | ||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_040869115.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz |
🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate annotation file (MD5)
- checksum 09a3ba67712292763cd684a8f5119fc4 appears more than once; each row must refer to a distinct annotation file.
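A content-level duplicate check like the one reported above could look roughly like this. It is a sketch under stated assumptions: the helper names `md5_hex` and `duplicate_checksums` and the row labels are hypothetical, and the file bytes are assumed to have been downloaded already. It hashes each file's bytes and groups rows that share a digest, which catches the same file listed under different accessions.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def duplicate_checksums(files):
    """Group row labels by MD5 digest and keep digests shared by 2+ rows.

    `files` maps a row label (hypothetical, e.g. "line 3") to that row's
    downloaded bytes.
    """
    rows_by_digest = defaultdict(list)
    for label, data in files.items():
        rows_by_digest[md5_hex(data)].append(label)
    return {digest: rows for digest, rows in rows_by_digest.items() if len(rows) > 1}


if __name__ == "__main__":
    downloaded = {
        "line 3": b"##gff-version 3\nchr1\t.\tgene\t1\t10\n",
        "line 5": b"##gff-version 3\nchr2\t.\tgene\t5\t50\n",
        "line 7": b"##gff-version 3\nchr1\t.\tgene\t1\t10\n",
    }
    for digest, rows in duplicate_checksums(downloaded).items():
        print(f"checksum {digest} appears more than once: {rows}")
```

MD5 is used here only as a content fingerprint for spotting identical files, not as a security measure.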
| assembly_accession access_url | ||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz |
🚫 [annotrieve-validator] reported by reviewdog 🐶
NOT_A_VALID_ACCESSION
- assembly_accession format invalid (need GCA_/GCF_…): 'NOT_A_VALID_ACCESSION'
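The format rule behind this message might be sketched as below. The exact pattern the validator uses is not shown in this PR, so the regular expression is an assumption: it follows the usual NCBI assembly accession shape of a `GCA_` or `GCF_` prefix, nine digits, a dot, and a version number. The names `ACCESSION_RE` and `is_valid_accession` are hypothetical.

```python
import re

# Assumed pattern: GCA_/GCF_ prefix, nine digits, a dot, then a version number.
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def is_valid_accession(value: str) -> bool:
    """Return True when the whole string looks like an assembly accession."""
    return ACCESSION_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("GCA_048338725.1", "NOT_A_VALID_ACCESSION"):
        status = "ok" if is_valid_accession(candidate) else "format invalid"
        print(f"{candidate}: {status}")
```

Using `fullmatch` rather than `search` means a valid accession embedded in a longer string is still rejected.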
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz |
🚫 [annotrieve-validator] reported by reviewdog 🐶
GCA_963930625.1
- download failed: HTTPSConnectionPool(host='example.invalid', port=443): Max retries exceeded with url: /nonexistent.gff3.gz (Caused by NameResolutionError("HTTPSConnection(host='example.invalid', port=443): Failed to resolve 'example.invalid' ([Errno -2] Name or service not known)"))
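A reachability check that turns a failed fetch into a row-level message could be sketched like this. The error text above suggests the validator fetches over HTTPS with a urllib3-based client; the sketch below swaps in the standard library's `urllib.request` to stay self-contained. The function name `check_download` and its injectable `opener` parameter are hypothetical and exist only so the example can be exercised without network access.

```python
import io
import urllib.error
import urllib.request


def check_download(url, opener=urllib.request.urlopen, timeout=30):
    """Return None if the URL can be opened and read, else an error message."""
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)  # touch the body so connection errors surface here
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError covers DNS and connection failures; ValueError covers malformed URLs.
        return f"download failed: {exc}"
    return None


if __name__ == "__main__":
    def failing_opener(url, timeout):
        # Simulates the DNS failure seen for example.invalid.
        raise urllib.error.URLError("Name or service not known")

    def working_opener(url, timeout):
        # Simulates a successful response with an in-memory body.
        return io.BytesIO(b"##gff-version 3\n")

    print(check_download("https://example.invalid/nonexistent.gff3.gz", opener=failing_opener))
    print(check_download("https://files.example/ok.gff3.gz", opener=working_opener))
```

Returning a message instead of raising lets the caller keep validating the remaining rows and attach one annotation per failure.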
| GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz | ||
| GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_040869115.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz | ||
| GCA_000820885.1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/820/885/GCA_000820885.1_KLDO_01/GCA_000820885.1_KLDO_01_genomic.gff.gz |
test_project/annotations.tsv: intentional test rows
Fixture TSV for exercising registry PR validation.
assembly_accession:
- GCA_048338725.1, GCA_044906185.1 (… ensembl_annotations.tsv)
- NOT_A_VALID_ACCESSION (… GCA_/GCF_)
- GCA_963930625.1 (… example.invalid)
- GCA_048338725.1 (again)
- GCA_040869115.1 (… access_url)
- GCA_000820885.1
Tabix-failure row (row 7):
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/820/885/GCA_000820885.1_KLDO_01/GCA_000820885.1_KLDO_01_genomic.gff.gz