Test to check validation works properly #16

Merged: emiliorighi merged 1 commit into guigolab:master from emiliorighi:master on May 16, 2026

Conversation

@emiliorighi
Collaborator

test_project/annotations.tsv — intentional test rows

Fixture TSV for exercising registry PR validation.

| Row | assembly_accession | Purpose |
| --- | --- | --- |
| 1–2 | GCA_048338725.1, GCA_044906185.1 | Valid Ensembl pig annotations (from ensembl_annotations.tsv) |
| 3 | NOT_A_VALID_ACCESSION | Invalid accession format (not GCA_/GCF_) |
| 4 | GCA_963930625.1 | Valid accession, unreachable URL (example.invalid) |
| 5 | GCA_048338725.1 (again) | Duplicate accession (row 1 repeated with a different URL) |
| 6 | GCA_040869115.1 | Duplicate URL (reuses row 2’s access_url) |
| 7 | GCA_000820885.1 | NCBI genomic GFF that fails tabix indexing |

Tabix-failure row (row 7): https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/820/885/GCA_000820885.1_KLDO_01/GCA_000820885.1_KLDO_01_genomic.gff.gz

@github-actions
Contributor

Registry validation

  • New rows: 4 valid, 3 invalid
  • Manifest issues: 0
  • Duplicate assemblies (TSV files): 1
  • Duplicate URLs (TSV files): 1
  • Duplicate annotation files (MD5, TSV files): 1
  • Duplicate files vs registry index: 0
  • TSV header/parse errors: 0

Inline annotations on each failing row — open Files changed for details.
Updated on every push.

@@ -0,0 +1,8 @@
assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate assembly_accession

  • GCA_048338725.1 appears more than once; keep at most one row per assembly.

GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz
NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate assembly_accession

  • GCA_048338725.1 appears more than once; keep at most one row per assembly.
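
For reference, a duplicate check of this kind can be sketched as below. This is a minimal illustration, not the validator's actual code: the function name `find_duplicates` and the shortened sample URLs are hypothetical, and only the column names come from the fixture header. The same helper can be pointed at `access_url` to find repeated URLs.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(tsv_text: str, column: str) -> dict[str, list[int]]:
    """Map each repeated value in `column` to the 1-based data rows where it occurs."""
    rows_by_value: dict[str, list[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        rows_by_value[row[column].strip()].append(row_number)
    # Keep only the values that appear on more than one row.
    return {value: rows for value, rows in rows_by_value.items() if len(rows) > 1}


# Hypothetical three-row sample modelled on the fixture (URLs shortened).
sample = (
    "assembly_accession\taccess_url\n"
    "GCA_048338725.1\thttps://example.org/a.gff3.gz\n"
    "GCA_044906185.1\thttps://example.org/b.gff3.gz\n"
    "GCA_048338725.1\thttps://example.org/c.gff3.gz\n"
)
print(find_duplicates(sample, "assembly_accession"))
# prints {'GCA_048338725.1': [1, 3]}
```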

@@ -0,0 +1,8 @@
assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url

  • https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.

@@ -0,0 +1,8 @@
assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url

  • https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.

assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz
NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate access_url

  • https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz appears more than once; each row must use a distinct URL.

@@ -0,0 +1,8 @@
assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate annotation file (MD5)

  • checksum 09a3ba67712292763cd684a8f5119fc4 appears more than once; each row must refer to a distinct annotation file.

NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_040869115.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
Duplicate annotation file (MD5)

  • checksum 09a3ba67712292763cd684a8f5119fc4 appears more than once; each row must refer to a distinct annotation file.
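
A checksum-based duplicate check could look roughly like the sketch below. It assumes the file contents are already in memory as bytes. The helper names `md5_of_bytes` and `duplicate_checksums` are hypothetical, and how the validator actually downloads and hashes files is not shown in this PR.

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(payload: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(payload).hexdigest()


def duplicate_checksums(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Group row labels by MD5 digest and keep digests shared by several rows."""
    labels_by_checksum: dict[str, list[str]] = defaultdict(list)
    for label, payload in files.items():
        labels_by_checksum[md5_of_bytes(payload)].append(label)
    return {
        checksum: labels
        for checksum, labels in labels_by_checksum.items()
        if len(labels) > 1
    }


# Hypothetical payloads: two rows resolve to identical content, one does not.
files = {"row A": b"same bytes", "row B": b"other bytes", "row C": b"same bytes"}
for checksum, labels in duplicate_checksums(files).items():
    print(checksum, labels)
```

Comparing checksums catches the case where two different URLs serve the same annotation file, which a URL comparison alone would miss.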

assembly_accession access_url
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz
NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
NOT_A_VALID_ACCESSION

  • assembly_accession format invalid (need GCA_/GCF_…): 'NOT_A_VALID_ACCESSION'
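
The format rule can be illustrated with a regular expression. The pattern below is an assumption based on the usual NCBI assembly accession shape (GCA_ or GCF_, nine digits, a dot, and a version number). The validator's real pattern is not shown in this PR, and `is_valid_accession` is a hypothetical name.

```python
import re

# Assumed shape: GCA_ or GCF_, nine digits, a dot, then a numeric version.
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def is_valid_accession(value: str) -> bool:
    """Return True when the whole value matches the assumed accession shape."""
    return ACCESSION_RE.fullmatch(value.strip()) is not None


for candidate in ("GCA_048338725.1", "NOT_A_VALID_ACCESSION"):
    print(candidate, is_valid_accession(candidate))
```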

GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_044906185.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz
NOT_A_VALID_ACCESSION https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_048338725.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
GCA_963930625.1

  • download failed: HTTPSConnectionPool(host='example.invalid', port=443): Max retries exceeded with url: /nonexistent.gff3.gz (Caused by NameResolutionError("HTTPSConnection(host='example.invalid', port=443): Failed to resolve 'example.invalid' ([Errno -2] Name or service not known)"))
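
The error text above suggests the validator fetches each URL over HTTPS and reports connection failures per row. A minimal sketch of that pattern using only the standard library follows. It is not the validator's code: `check_download` and its `opener` parameter are hypothetical, and the `opener` hook exists only so the function can be exercised without network access.

```python
import urllib.request
from typing import Callable


def check_download(
    url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> tuple[bool, str]:
    """Try to open `url` and read one byte; report failure as a message."""
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)
        return True, "ok"
    except OSError as exc:
        # urllib.error.URLError (DNS failures, refused connections, HTTP errors)
        # is a subclass of OSError, so this covers the unreachable-host case.
        return False, f"download failed: {exc}"
```

Calling `check_download("https://example.invalid/nonexistent.gff3.gz")` is expected to return a failure tuple, since the `.invalid` top-level domain is reserved and should never resolve.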

GCA_963930625.1 https://example.invalid/nonexistent.gff3.gz
GCA_048338725.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_046128825.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_040869115.1 https://ftp.ebi.ac.uk/pub/ensemblorganisms/Sus_scrofa/GCA_044906185.1/ensembl/geneset/2025_08/genes.gff3.gz
GCA_000820885.1 https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/820/885/GCA_000820885.1_KLDO_01/GCA_000820885.1_KLDO_01_genomic.gff.gz

🚫 [annotrieve-validator] reported by reviewdog 🐶
GCA_000820885.1

  • tabix pipeline: [E::hts_idx_push] Invalid record on sequence test #4: end 276483 < begin 276510
    tbx_index_build failed: /tmp/arv_h_6kpfbw0v/pipe.gff.gz
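
The tabix error points at a record whose end coordinate is smaller than its start. Such records can be located before indexing with a small scan like the one below. This is a sketch only: `find_inverted_records` is a hypothetical helper, the sample lines are invented, and it assumes the standard tab-separated GFF layout with start in column 4 and end in column 5.

```python
from typing import Iterable, Iterator


def find_inverted_records(
    gff_lines: Iterable[str],
) -> Iterator[tuple[int, str, int, int]]:
    """Yield (line_number, seqid, start, end) for features where end < start."""
    for line_number, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # non-numeric coordinates are a separate problem
        if end < start:
            yield line_number, fields[0], start, end


# Invented sample: the third line has end (400) before start (500).
lines = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tgene\t500\t400\t.\t+\t.\tID=g2\n",
]
print(list(find_inverted_records(lines)))
# prints [(3, 'chr1', 500, 400)]
```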

@emiliorighi merged commit 410c06a into guigolab:master on May 16, 2026
1 check failed