STAN — Standardized proteomic Throughput ANalyzer

Know your instrument.

Author: Brett Stanley Phinney, UC Davis Proteomics Core
License: STAN Academic License (free for academic/non-profit use; see LICENSE)

📖 New to STAN? Start with the User Guide for installation, daily use, dashboard tour, and troubleshooting.

STAN watches the directory where your instrument writes raw files, runs a standardized DIA-NN or Sage search on every HeLa QC injection, scores the result against your instrument's historical cohort, and drops a HOLD flag if a run fails — before your sample queue continues. A local web dashboard tracks everything. Community benchmark submission is opt-in.

What it is

Proteomics core facilities need continuous, automated QC. Existing options are vendor-locked (Bruker ProteoScape), expensive, or dependent on bespoke scripting. STAN is an open-source alternative that runs on the instrument workstation, reads Bruker .d directories and Thermo .raw files natively, and calls DIA-NN or Sage as subprocesses, with no proprietary middleware.

The approach is watcher-based: a daemon monitors your acquisition directory, waits for each file to finish writing (vendor-specific stability detection), identifies DIA vs. DDA acquisition mode from the raw file itself, dispatches the appropriate search engine, extracts a fixed set of QC metrics, and gates the result against configurable thresholds. Everything lands in a local SQLite database and is visible on a single-page dashboard. SLURM is supported for labs that want centralized HPC compute.
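The "wait for each file to finish writing" step can be pictured as a polling loop. The following is a minimal sketch, not STAN's actual watcher: it assumes that "stable" means total size and newest modification time have not changed for `stable_secs`, and the function names `snapshot` and `wait_until_stable` are hypothetical. Real vendor-specific detection may use other signals.

```python
import time
from pathlib import Path


def snapshot(path: Path) -> tuple[int, float]:
    """Return (total bytes, newest mtime) for a file or a directory tree."""
    if path.is_dir():
        files = [p for p in path.rglob("*") if p.is_file()]
    else:
        files = [path]
    total = sum(f.stat().st_size for f in files)
    newest = max((f.stat().st_mtime for f in files), default=0.0)
    return total, newest


def wait_until_stable(path: Path, stable_secs: float, poll_secs: float = 1.0,
                      timeout_secs: float = 3600.0) -> bool:
    """Block until the snapshot is unchanged for stable_secs; False on timeout."""
    deadline = time.monotonic() + timeout_secs
    last = snapshot(path)
    unchanged_since = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - unchanged_since >= stable_secs:
            return True
        time.sleep(poll_secs)
        current = snapshot(path)
        if current != last:
            # Something was written: restart the stability clock.
            last = current
            unchanged_since = time.monotonic()
    return False
```

Treating a Bruker `.d` directory as one unit (summing every file inside it) avoids dispatching a search while individual files in the directory are still growing.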

The community benchmark uses a frozen FASTA + spectral library so that precursor and PSM counts are comparable across labs. Submission is aggregate metrics only — no raw files, no sample metadata leave your building. A public leaderboard is hosted at community.stan-proteomics.org, faceted by instrument family, gradient length (SPD), and injection amount.

Quick install — pick your mode

If you...	Use mode	Estimated time	Install doc
Have one instrument PC and want auto-QC on it	A — Local (Windows)	5 min	Below
Have a beefy Windows box that should process raws from multiple instruments	B — WSL2	20 min	`docs/INSTALL_MODE_B_WSL.md`
Have HPC access and want centralized compute	C — SLURM	1–3 hours	`docs/INSTALL_MODE_C_HPC.md`

Mode A — instrument PC (Windows, recommended)

Right-click install-stan.bat → Save As to download it, then double-click the saved file. The script installs Python if needed, clones STAN from GitHub, and auto-installs DIA-NN and Sage from their official release pages.

To update: run update-stan.bat. It self-updates from GitHub and restarts the watcher.

After install, run stan setup (a 6-question wizard) then stan watch to start the daemon.

Mode A — Mac / Linux / advanced

git clone https://github.com/bsphinney/stan.git
cd stan
pip install -e ".[dev]"

Install DIA-NN and Sage separately and ensure they are on your PATH. Then:

stan init      # creates ~/.stan/ with config templates
stan setup     # 6-question wizard
stan watch     # start the watcher daemon
stan dashboard # serve dashboard at http://localhost:8421
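Before running the commands above on Mac or Linux, it can help to confirm that both search engines are actually discoverable on your PATH. This is a small sketch using only the standard library; the executable names in `SEARCH_ENGINES` are assumptions and vary by platform and release, so adjust them to match your installs.

```python
import shutil


def find_missing_tools(candidates: dict[str, list[str]]) -> list[str]:
    """Return the tools for which no candidate executable name is on PATH."""
    missing = []
    for tool, names in candidates.items():
        if not any(shutil.which(name) for name in names):
            missing.append(tool)
    return missing


# Executable names are assumptions, not taken from the STAN docs.
SEARCH_ENGINES = {
    "DIA-NN": ["diann", "diann-linux", "DiaNN.exe"],
    "Sage": ["sage", "sage.exe"],
}

if __name__ == "__main__":
    missing = find_missing_tools(SEARCH_ENGINES)
    print("missing:", ", ".join(missing) if missing else "none")
```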

The dashboard

After running stan dashboard, open http://localhost:8421. The dashboard has nine tabs:

This Week's QCs — gauge, weekly table, or metric-matrix view of recent HeLa runs. IPS badge front and center.
QC History — every run, sortable and filterable. Click a row for the full modal: metric breakdown, gate verdicts, PEG lollipop chart, diaPASEF drift cloud (Bruker), 4DFF Ion Cloud (Bruker, optional).
Trends — longitudinal sparklines for IPS, precursor/PSM count, peptide count, iRT deviation, TIC area, column age. Maintenance events render as vertical markers.
Sample Health — non-QC Bruker .d acquisitions monitored for TIC dropout and injection failures.
Fleet — all instruments on the shared drive in one view; send remote commands.
Config — live view of instruments.yml and thresholds.yml.
Community — your benchmark standing within your cohort, submission log, TIC overlay vs. community runs.
Arcade — retro mini-games with global community leaderboard (opt-in).
Museum — interactive historical QC archive: 999 BSA injections from 2005–2022 across every instrument era the UC Davis Proteomics Core has operated, searched with Sage v0.14.7. Timeline, trend chart, coverage maps, and "Then vs Now" comparison panel. See docs/MUSEUM_DEPLOY.md to deploy the standalone page to the community HF Space.

Architecture

Raw data dir (watched by watcher daemon)
    │  file stable for stable_secs
    ▼
detector.py → reads .d/analysis.tdf or .raw metadata → DIA or DDA?
    │
    ├─ DIA → diann.py → SLURM job → report.parquet
    └─ DDA → sage.py  → SLURM job → results.sage.parquet
                                │
                        extractor.py + chromatography.py
                                │
                        evaluator.py → PASS / WARN / FAIL
                            │                │
                    SQLite (Hive)      queue.py (HOLD flag)
                            │
                    dashboard (FastAPI + React, port 8421)
                            │
                    community/submit.py → HF Dataset
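The evaluator step in the diagram (PASS / WARN / FAIL, with a HOLD flag on failure) can be illustrated as follows. This is a sketch under stated assumptions rather than the logic in `evaluator.py`: the `Gate` class, the floor-style thresholds, and the example numbers are all hypothetical, and it only models "higher is better" metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Hypothetical floor gate: below fail_below is FAIL, below warn_below is WARN."""
    metric: str
    warn_below: float
    fail_below: float


def evaluate(metrics: dict[str, float], gates: list[Gate]) -> tuple[str, list[str]]:
    """Return the worst verdict across all gates plus one reason per tripped gate."""
    order = {"PASS": 0, "WARN": 1, "FAIL": 2}
    verdict, reasons = "PASS", []
    for gate in gates:
        value = metrics.get(gate.metric)
        if value is None:
            result = "FAIL"
            reasons.append(f"{gate.metric}: metric missing")
        elif value < gate.fail_below:
            result = "FAIL"
            reasons.append(f"{gate.metric}: {value:g} is below {gate.fail_below:g}")
        elif value < gate.warn_below:
            result = "WARN"
            reasons.append(f"{gate.metric}: {value:g} is below {gate.warn_below:g}")
        else:
            result = "PASS"
        if order[result] > order[verdict]:
            verdict = result
    return verdict, reasons


def should_hold(verdict: str) -> bool:
    """In this sketch, only a FAIL verdict holds the sample queue."""
    return verdict == "FAIL"
```

For example, with `gates = [Gate("precursors", warn_below=50000, fail_below=30000)]` (illustrative numbers only), a run with 40,000 precursors evaluates to WARN and does not hold the queue, while a run with 10,000 evaluates to FAIL and does.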

Key design decisions

Precursor count (DIA) and PSM count (DDA) are the primary metrics — not protein count. Protein count is confounded by FASTA choice and inference settings and is shown only as a contextual secondary. This is what makes cross-lab comparison valid.
Community benchmark is the cross-lab surface — submissions are compared only within (instrument family, SPD bucket, injection amount bucket). Opt-in; default off.
Privacy — raw files never leave your lab. Only aggregate run-level metrics are submitted. Serial numbers are stored server-side but never exposed in the API or downloads.
SPD-first cohort bucketing — cohorts are keyed on samples-per-day, not gradient minutes, because SPD directly encodes throughput intent. The layered SPD resolution chain reads Bruker method XML first, then TDF metadata, then gradient frame span, then filename tokens.
All three modes share the same stan.db schema — a run processed locally on an instrument PC looks identical in the database to one processed via SLURM on a cluster.
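The layered SPD resolution chain described above is a first-hit-wins fallback. The sketch below shows only that pattern. The resolver labels and the filename convention (tokens such as `60SPD`) are assumptions; the method XML, TDF, and frame-span resolvers would be supplied by the caller and are not implemented here.

```python
import re
from typing import Callable, Optional

Resolver = Callable[[str], Optional[int]]


def spd_from_filename(run_name: str) -> Optional[int]:
    """Parse a token such as '60SPD' or '100spd' from a run name (assumed convention)."""
    match = re.search(r"(?<![A-Za-z0-9])(\d{1,3})\s*spd(?![A-Za-z0-9])",
                      run_name, re.IGNORECASE)
    return int(match.group(1)) if match else None


def resolve_spd(run_name: str,
                resolvers: list[tuple[str, Resolver]]) -> tuple[Optional[int], Optional[str]]:
    """Try each resolver in priority order; return the first hit and its source label."""
    for label, resolver in resolvers:
        spd = resolver(run_name)
        if spd is not None:
            return spd, label
    return None, None
```

Returning the source label alongside the value makes it easy to see later whether a cohort assignment came from authoritative method metadata or from a filename guess.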

Supported instruments

Vendor	Models	Raw format	Modes
Bruker	timsTOF Ultra 2, Ultra, HT, Pro 2, SCP	`.d` directory	diaPASEF, ddaPASEF
Thermo	Astral, Exploris 480/240, Orbitrap Fusion Lumos, Eclipse	`.raw` file	DIA, DDA
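For Bruker data, the status table below notes that acquisition mode is read from `analysis.tdf` via `Frames.MsmsType`. Because `analysis.tdf` is a SQLite file, that lookup can be sketched with the standard library. The numeric codes used here (8 for ddaPASEF, 9 for diaPASEF) reflect the commonly reported TDF convention and are an assumption to verify against your own files; `detect_bruker_mode` is a hypothetical name, not STAN's `detector.py`.

```python
import sqlite3
from pathlib import Path

# Assumed MsmsType codes: 0 = MS1, 8 = ddaPASEF, 9 = diaPASEF.
DDA_PASEF, DIA_PASEF = 8, 9


def detect_bruker_mode(d_dir: Path) -> str:
    """Return 'DIA', 'DDA', or 'UNKNOWN' from the distinct MsmsType values in Frames."""
    tdf = (Path(d_dir) / "analysis.tdf").resolve()
    # Open read-only so the acquisition file is never modified.
    conn = sqlite3.connect(f"{tdf.as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute("SELECT DISTINCT MsmsType FROM Frames").fetchall()
    finally:
        conn.close()
    types = {row[0] for row in rows}
    if DIA_PASEF in types:
        return "DIA"
    if DDA_PASEF in types:
        return "DDA"
    return "UNKNOWN"
```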

Key metrics

Metric	Modes	What it tells you
IPS (0–100)	DIA + DDA	Cohort-calibrated composite of precursor/PSM + peptide + protein depth. The single number to check first.
Precursor count @ 1% FDR	DIA	Primary DIA metric.
PSM count @ 1% FDR	DDA	Primary DDA metric.
Peptide count	both	Secondary depth metric.
Protein count	both	Contextual. Never used for ranking.
Missed cleavage rate	both	Digestion quality. Healthy: < 0.15.
Median CV (precursor)	DIA, replicates	Quantitative reproducibility. Healthy timsTOF Ultra: 4–9%.
iRT max deviation	DIA	Retention-time drift from the empirical cIRT panel.
Points across peak	both	Median MS2 scans per elution peak. Quantitation quality.
PEG contamination score	Bruker	MS1 scan for the polyethylene-glycol ladder.
diaPASEF window drift	Bruker	Detects MS2 windows walking off their 1/K0 calibration.

Full definitions, reference ranges, and formulas: docs/user_guide.md and docs/ips_metric.md.
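As an illustration of the Median CV (precursor) row, the sketch below computes a per-precursor coefficient of variation across replicate injections and takes the median. It assumes the sample standard deviation and skips precursors seen in fewer than two replicates; the authoritative definition is the one in docs/user_guide.md, and `median_cv_percent` is a hypothetical name.

```python
from statistics import mean, median, stdev


def median_cv_percent(intensities: dict[str, list[float]],
                      min_replicates: int = 2) -> float:
    """Median of per-precursor CVs (sample std / mean), in percent."""
    cvs = []
    for values in intensities.values():
        observed = [v for v in values if v > 0]  # drop missing or zero quantities
        if len(observed) < min_replicates:
            continue
        cvs.append(100.0 * stdev(observed) / mean(observed))
    if not cvs:
        raise ValueError("no precursor was quantified in enough replicates")
    return median(cvs)
```

With three precursors quantified as `[100, 100, 100]`, `[90, 100, 110]`, and `[80, 100, 120]`, the individual CVs are 0%, 10%, and 20%, so the median CV is 10%.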

Community benchmark


Public dashboard	community.stan-proteomics.org · HF Space
Public dataset	huggingface.co/datasets/brettsp/stan-benchmark · CC BY 4.0

Run stan setup and answer yes to the benchmark question; STAN then claims an anonymous pseudonym and stores an auth token. Subsequent QC runs are submitted automatically via the HF Space relay, so no HF token is required on your end.

Three tracks: Track A (DDA, PSM primary), Track B (DIA, precursor primary), Track C (both within 24 h from the same instrument — unlocks a six-axis radar fingerprint).
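Cross-lab comparability depends on every lab searching against identical frozen assets, which is why the status table lists MD5 verification for the community FASTA. A generic checksum check looks like the sketch below. The function names are hypothetical, and the expected checksum would come from wherever STAN publishes it; none is given here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large FASTA or library files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_asset(path: Path, expected_md5: str) -> bool:
    """True when the file's MD5 matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```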

Implementation Status

What ships today vs. what's still planned.

Component	Status	Notes
CLI (56 commands)	Done	Full list in `docs/user_guide.md`.
Watcher daemon	Done	File-stability detection, hot-reloaded config, recursive monitoring, startup catch-up sweep.
Acquisition mode detection	Done	Bruker via `analysis.tdf.Frames.MsmsType`; Thermo via ThermoRawFileParser metadata + filename token fallback.
Local DIA-NN execution	Done	Default. Subprocess on the instrument PC, community-standard params.
Local Sage execution	Done	Default. Bruker `.d` native, Thermo `.raw` via ThermoRawFileParser → mzML.
SLURM HPC execution (optional)	Done	`execution_mode: slurm` per instrument. SSH + `sbatch` via system `ssh` (with `ControlMaster`); no `paramiko` dependency.
Metric extraction (DIA + DDA)	Done	Polars-based, from `report.parquet` and `results.sage.parquet`.
IPS scoring	Done	3-component depth composite (precursors / peptides / proteins), 0–100, percentile-mapped against an `(instrument family, SPD bucket)` cohort. See `docs/ips_metric.md`.
QC gating + HOLD flag	Done	Hard gates with plain-English diagnosis.
Column health	Done	TIC AUC + peak RT trend analysis.
SQLite database	Done	All metrics, gate results, sample-health verdicts, maintenance events, PEG/drift breakdowns, 4DFF features-by-charge.
FastAPI dashboard backend	Done	All routes wired (runs, trends, instruments, thresholds, fleet, community, PEG, drift, 4DFF, sample-health, hide). Swagger at `/docs`.
Single-file React dashboard	Done	`stan/dashboard/public/index.html`, React + Babel via CDN. 9 tabs.
Historical QC Museum	Done	`stan/dashboard/public/museum.html` — 999 BSA injections 2005–2022, Sage-searched; timeline, trend chart (log-scale), BSA coverage maps, Then vs Now table. Deploy guide: `docs/MUSEUM_DEPLOY.md`.
Setup wizard	Done	6 questions, dedupes `instruments.yml`, offers baseline at the end.
Baseline builder	Done	Recursive discovery, auto-detect gradient/LC, pre-flight DIA-NN/Sage tests, resume on interrupt, scheduling (now / tonight / weekend).
Windows installer + updater	Done	`install-stan.bat`, `update-stan.bat`. Self-update from GitHub.
Community submission	Done	Hard gates, soft flags, asset MD5 verification, no HF token needed (relay).
Community auth token	Done	`stan setup` claims a pseudonym via email; relay enforces `X-STAN-Auth` on PATCH.
Community FASTA	Done	UniProt human + universal contaminants, MD5-verified, auto-downloaded on first need.
Community speclibs	Partial	Astral + timsTOF HeLa empirical/predicted libs in progress.
Cohort scoring + percentiles	Done	Computed nightly within `(family, SPD, amount)` cohorts.
HF Space community dashboard	Done	Live at `community.stan-proteomics.org`.
Arcade → community leaderboard	Done	`stan/community/arcade_submit.py`; relay endpoints in `stan/community/scripts/relay_arcade.py`. Opt-in via `arcade_submit: true` in `community.yml`.
Bruker `.d` XML method-tree parser	Done	Reads `<N>.m/submethods.xml`, `hystar.method`, `SampleInfo.xml` for authoritative SPD + Evosep detection.
`validate_spd_from_metadata()`	Done	XML → MethodName → `Frames.Time` span fallback chain.
`detect_lc_system()`	Done	Evosep vs custom from `.d` method tree + TrayType; powers the LC filter on the community TIC overlay.
Real acquisition-date preservation	Done	Bruker `analysis.tdf.AcquisitionDateTime` / Thermo `fisher_py` CreationDate, not insertion time.
DIA-NN filename `--` sanitizer	Done	Junction/symlink workaround for the DIA-NN argv-parsing bug.
Today TIC overlay	Done	`/api/today/tic-overview` powers the at-a-glance pump-and-spray view.
PEG contamination panel	Done	`stan backfill-peg`, scoring, lollipop chart in the run modal.
diaPASEF window drift	Done	`stan backfill-window-drift`, drift cloud scatter in the run modal.
4DFF Ion Cloud	Done	`stan install-4dff`, `run-4dff`, `backfill-features`. Plotly per-charge view, SVG fallback.
cIRT panel + trends	Done	`stan backfill-cirt`, `derive-cirt-panel`, Trends tab visualisation.
Maintenance log UI	Done	Trends-tab form. Events render as vertical markers on every trend chart.
Hide / restore a run	Done	`POST /api/runs/{id}/hide`. UI button on the QC History row.
Sample Health (rawmeat)	Done	Bruker `.d` non-QC files monitored; verdict (pass/warn/fail) stored in `sample_health` table. Thermo support TBD.
Fleet sync (SMB / HF Space / none)	Done	`~/.stan/fleet.yml`, configured by `stan/fleet_setup.py`.
Fleet command queue	Done	12 whitelisted actions (`ping`, `status`, `tail_log`, `export_db_snapshot`, `watcher_debug`, `qc_filter_report`, `apply_config`, `update_stan`, `restart_watcher`, `cleanup_excluded`, `fix_instrument_names`, `v1_prep`).
Email reports	Done	Daily 07:00 + optional Monday weekly. Resend API.
Slack alerts	Done	Webhook in `community.yml`. `stan test-alert` to verify.
Error telemetry (opt-in)	Done	Anonymous reports to the relay; local log at `~/.stan/error_log.json`.
Front-page view selector	Done	Gauges / Weekly table / Metric matrix on This Week's QCs.
Test fixtures (real DIA-NN / Sage output)	Planned	`tests/fixtures/` is mostly empty.
Outlier detection (amount / SPD mismatch)	Planned	Flag submissions whose metrics don't match the declared cohort.
Community downtime / reliability leaderboard	Planned	MTBF / availability / recovery-time per instrument model.
PyPI release	Planned	`pip install stan-proteomics` not yet published.
Auto-start `stan watch` as a Windows service	Planned	Today the operator launches it manually after install.
Mobile PWA	Planned	Responsive CSS + service worker + push notifications on FAIL.
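The dashboard routes listed above can also be scripted. As one example, the table documents `POST /api/runs/{id}/hide`. The sketch below only builds that request with the standard library; the integer run id, the empty request body, and the helper name `build_hide_request` are assumptions, and the response format is not described here, so check the Swagger page at `/docs` before relying on it.

```python
import urllib.request


def build_hide_request(run_id: int,
                       base_url: str = "http://localhost:8421") -> urllib.request.Request:
    """Build (but do not send) a POST to the documented hide endpoint."""
    url = f"{base_url.rstrip('/')}/api/runs/{run_id}/hide"
    # An empty body is an assumption; the endpoint may accept or require JSON.
    return urllib.request.Request(url, data=b"", method="POST")
```

With the dashboard running, the request could be sent with `urllib.request.urlopen(build_hide_request(42))`.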

Roadmap / TODO

The shortlist of things actively being worked on or queued. (Bug fixes and shipped features have been moved out of this list — see Implementation Status above.)

High priority

Medium priority

Documentation index

Doc	Contents
`STAN_MASTER_SPEC.md`	Authoritative design doc. Read before changing core behavior.
`docs/INSTALL_MODE_B_WSL.md`	Mode B — WSL2 install and configuration.
`docs/INSTALL_MODE_C_HPC.md`	Mode C — SLURM/HPC install and configuration.
`docs/user_guide.md`	Day-to-day manual: all CLI commands, dashboard tour, config reference, troubleshooting.
`docs/ips_metric.md`	IPS formula, cohort references, why protein count is excluded.
`docs/external_tools.md`	DIA-NN, Sage, ThermoRawFileParser: CLI flags, version pins, container paths, gotchas.
`docs/HPC_PATHS.md`	Hive HPC reference paths for SLURM integration.
`docs/GOTCHAS_DELIMP.md`	50+ hard-learned lessons: DIA-NN edge cases, SLURM quirks, raw-file parsing traps.
`docs/INSTALL_REGRESSION_CHECKLIST.md`	Mode A install regression checklist: pre-flight, during-install warnings, 10-question post-install verification, known failure modes.
`CLAUDE.md`	Context for AI coding agents working on this codebase.

Search engines

STAN does not bundle DIA-NN, Sage, or ThermoRawFileParser. Each is called as a subprocess and must be installed separately. The Windows installer fetches DIA-NN and Sage automatically.

Tool	Used for	License
DIA-NN	All DIA searches (Bruker `.d` and Thermo `.raw` natively, no conversion)	Free for academic research; commercial use requires a paid license from Aptila Biotech or Thermo.
Sage	All DDA searches. Bruker `.d` native. Thermo `.raw` requires mzML conversion first.	MIT
ThermoRawFileParser	Thermo DDA only — `.raw` → indexed mzML. Auto-downloaded on first use; cached at `~/.stan/tools/`.	Apache 2.0
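To make the Thermo DDA conversion step concrete, here is a sketch of how a `.raw` to indexed mzML command line could be assembled for a subprocess call. The `-i`, `-o`, and `-f=2` (indexed mzML) options are ThermoRawFileParser's documented flags. The executable name, the helper `thermo_to_mzml_cmd`, and the paths are assumptions, and STAN's own invocation (see docs/external_tools.md) may differ.

```python
from pathlib import Path


def thermo_to_mzml_cmd(raw_file: Path, out_dir: Path,
                       parser: str = "ThermoRawFileParser") -> list[str]:
    """Build an argv list that converts one .raw file to indexed mzML in out_dir."""
    return [
        parser,
        f"-i={raw_file}",   # input .raw file
        f"-o={out_dir}",    # output directory
        "-f=2",             # 2 = indexed mzML
    ]
```

Passing the result to `subprocess.run(cmd, check=True)` as a list, without a shell, sidesteps quoting problems when raw file names contain spaces.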

Citing STAN

No paper yet. Until one lands, please cite:

Phinney BS. STAN: Standardized proteomic Throughput ANalyzer. UC Davis Proteomics Core (2026). https://github.com/bsphinney/stan

Also cite the search engine(s) STAN runs on your data:

Demichev V, et al. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods. 2020;17:41–44. https://doi.org/10.1038/s41592-019-0638-x

Lazear MR. Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale. J. Proteome Research. 2023;22(11):3652–3659. https://doi.org/10.1021/acs.jproteome.3c00486

Contributing

Open an issue for design discussion first, then submit a PR. Run ruff check stan/ and pytest tests/ -v before submitting. Prefer real DIA-NN or Sage output snippets in tests/fixtures/ over synthetic data.

License

Code: STAN Academic License — free for academic, non-profit, educational, and personal research use. Commercial use (CROs, pharma, biotech) requires a separate agreement. Contact bsphinney@ucdavis.edu.

Community dataset: CC BY 4.0.

Links


GitHub	https://github.com/bsphinney/stan
Community dashboard	https://community.stan-proteomics.org · https://huggingface.co/spaces/brettsp/stan
Community dataset	https://huggingface.co/datasets/brettsp/stan-benchmark
DE-LIMP (sister project)	https://github.com/bsphinney/DE-LIMP
