Home
Audit ProteomeXchange (PRIDE) study metadata from the command line.
git clone https://github.com/LangeLab/PXAudit.git
cd PXAudit
uv sync
uv run pxaudit check PXD000001

That one command fetches the dataset's metadata and file list, classifies every file, scores the dataset on a FAIR ladder, and writes everything to a local SQLite database.
- Check a single dataset: `pxaudit check PXD000001` scores one accession and prints a summary.
- Audit a whole list: `pxaudit bulk-audit --input ids.txt` runs through dozens or hundreds of accessions with a progress bar, then exports the results.
- Inspect file inventories: `pxaudit manifest PXD000001` lists every file in a dataset with its category, size, and checksum.
- Track over time: every audit writes to the same SQLite database, so you can query tier distributions, spot trends, and flag datasets that need re-scoring after a logic update.
- Work offline: API responses are cached locally. If the network goes down, PXAudit falls back to the cached data with a warning.
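
The offline fallback described in the last bullet can be pictured with a small sketch. This is not PXAudit's actual code: the function name `fetch_with_cache`, the one-JSON-file-per-key cache layout, and the `fetch` callable are assumptions made for illustration only.

```python
import json
import warnings
from pathlib import Path
from typing import Callable


def fetch_with_cache(key: str, fetch: Callable[[], dict], cache_dir: Path) -> dict:
    """Return fresh JSON when the fetch succeeds, else the cached copy with a warning.

    `fetch` is any zero-argument callable that performs the network request
    (hypothetical; PXAudit's real client is not shown here).
    """
    cache_file = cache_dir / f"{key}.json"  # assumed layout: one file per response
    try:
        data = fetch()
    except OSError as exc:
        # URLError, ConnectionError and TimeoutError are all OSError subclasses.
        if not cache_file.exists():
            raise  # nothing cached, so the failure has to surface
        warnings.warn(f"network unavailable ({exc}); using cached data for {key}")
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Success: refresh the cache so a later offline run can reuse this response.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Passing the request in as a callable keeps the sketch independent of any particular HTTP library; the same wrapper would serve both the /projects and the /files response for an accession.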
PRIDE API --> local cache --> file classifier --> tier engine --> SQLite DB

PXAudit hits two PRIDE REST endpoints per accession (/projects and /files), caches the raw JSON, classifies every filename into one of nine FileClass types, then runs a deterministic Boolean checklist to assign two scores:
- Tier: a 7-level FAIR ladder from None through Diamond
- Quant Tier: a secondary axis from No Quant through Quant-Complete
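
To make the "classify, then run a Boolean checklist" idea concrete, here is a minimal sketch. Everything specific in it is hypothetical: the four `FileClass` members stand in for the real nine, the suffix table and the three checks are invented for illustration, and the result is a plain integer level rather than PXAudit's named tiers.

```python
from enum import Enum
from pathlib import PurePosixPath


class FileClass(Enum):
    # Hypothetical subset; the real FileClass enum has nine members.
    RAW = "raw"
    PEAK = "peak"
    RESULT = "result"
    OTHER = "other"


# Hypothetical suffix table, for illustration only.
SUFFIX_TO_CLASS = {
    ".raw": FileClass.RAW,
    ".mzml": FileClass.PEAK,
    ".mgf": FileClass.PEAK,
    ".mzid": FileClass.RESULT,
    ".mztab": FileClass.RESULT,
}


def classify(filename: str) -> FileClass:
    """Map a filename to a FileClass by its (case-insensitive) suffix."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # look through gzip compression
    return SUFFIX_TO_CLASS.get(PurePosixPath(name).suffix, FileClass.OTHER)


# Ordered checklist: each rung must pass before the next one is considered.
# These three checks are assumptions, not PXAudit's actual criteria.
CHECKS = [
    ("has_title", lambda meta, classes: bool(meta.get("title"))),
    ("has_raw", lambda meta, classes: FileClass.RAW in classes),
    ("has_results", lambda meta, classes: FileClass.RESULT in classes),
]


def ladder_level(meta: dict, filenames: list[str]) -> int:
    """Count how many consecutive checks pass, starting from the first."""
    classes = {classify(f) for f in filenames}
    level = 0
    for _name, check in CHECKS:
        if not check(meta, classes):
            break
        level += 1
    return level
```

Because each check is a pure function of the metadata and the set of file classes, the same inputs always produce the same level, which is the sense in which a checklist like this is deterministic.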
Results are upserted into three SQLite tables (study, study_files, audit) so nothing gets lost when you re-audit an accession.
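
As a rough picture of what upserting and querying might look like, the sketch below uses Python's built-in `sqlite3` module. Only the table name `audit` comes from the description above; the column names, the one-row-per-accession primary key, and the helper functions are assumptions, and the real schema may differ.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create a simplified, hypothetical audit table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "accession TEXT PRIMARY KEY, "
        "tier TEXT NOT NULL, "
        "quant_tier TEXT NOT NULL)"
    )
    return conn


def upsert_audit(conn: sqlite3.Connection, accession: str, tier: str, quant_tier: str) -> None:
    """Insert a score, or overwrite the existing row for the same accession."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit (accession, tier, quant_tier) VALUES (?, ?, ?) "
            "ON CONFLICT(accession) DO UPDATE SET "
            "tier = excluded.tier, quant_tier = excluded.quant_tier",
            (accession, tier, quant_tier),
        )


def tier_distribution(conn: sqlite3.Connection) -> dict[str, int]:
    """Count audited accessions per tier."""
    rows = conn.execute("SELECT tier, COUNT(*) FROM audit GROUP BY tier")
    return dict(rows.fetchall())
```

Re-running `upsert_audit` for an accession that is already present updates its row instead of raising a uniqueness error or adding a duplicate, so a query such as `tier_distribution` counts each accession once.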
Current release: v0.3.0 (beta). Active development at github.com/LangeLab/PXAudit.
Documentation for PXAudit v0.3.0. Pages can be synced to the GitHub Wiki.