feat: clinical infrastructure for blood biomarker clocks #203

Open
marcbal77 wants to merge 1 commit into bio-learn:master from marcbal77:feature/clinical-infrastructure

@marcbal77
Member

Summary

  • Add clinical data layer to GeoData (5th layer alongside dnam, rna, protein_alamar, protein_olink)
  • Add GeoData.from_clinical_matrix(df, source_units=, units=) factory for loading clinical blood test data
  • Add biomarker registry (biolearn.clinical.registry) with canonical names, units, valid ranges, and unit conversions for 16 biomarkers
  • Add required_features() method to all 12 model classes, returning {"layer": str, "features": list, "metadata": list}
  • Add load_nhanes_as_geodata() bridge function in biolearn.load
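
To make the registry idea concrete, here is a minimal, self-contained sketch of how one entry might bundle a canonical unit, a valid range, and conversion factors. It is not the code in biolearn.clinical.registry: `BiomarkerSpec`, `SKETCH_REGISTRY`, `to_canonical`, and `in_valid_range` are hypothetical names, and the valid ranges are placeholder values chosen only for illustration. The conversion factors are the standard ones (albumin 1 g/dL = 10 g/L; creatinine 1 mg/dL ≈ 88.4 µmol/L).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BiomarkerSpec:
    """Hypothetical registry entry; not the actual BIOMARKER_REGISTRY schema."""

    canonical_unit: str
    valid_range: tuple  # (low, high) in canonical units; placeholder values
    # Multiplicative factors that map a source unit onto the canonical unit.
    factors: dict = field(default_factory=dict)


SKETCH_REGISTRY = {
    "albumin": BiomarkerSpec("g/dL", (1.0, 7.0), {"g/L": 0.1}),
    "creatinine": BiomarkerSpec("mg/dL", (0.1, 20.0), {"umol/L": 1 / 88.4}),
}


def to_canonical(name, value, source_unit):
    """Convert value from source_unit to the biomarker's canonical unit."""
    spec = SKETCH_REGISTRY[name]
    if source_unit == spec.canonical_unit:
        return value
    if source_unit not in spec.factors:
        raise ValueError(f"No conversion from {source_unit!r} for {name!r}")
    return value * spec.factors[source_unit]


def in_valid_range(name, value):
    """Check a canonical-unit value against the entry's valid range."""
    low, high = SKETCH_REGISTRY[name].valid_range
    return low <= value <= high


# Convert an albumin reading reported in g/L, then range-check the result.
albumin_g_dl = to_canonical("albumin", 45.0, "g/L")
print(albumin_g_dl, in_valid_range("albumin", albumin_g_dl))
```

Keeping the factors next to the canonical unit means a loader only needs the source unit per biomarker to normalize a whole table, which is presumably the role of the `source_units=` argument.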

This is PR 1 of the clinical clocks initiative. It ships only infrastructure (no clock implementations), and all 69 existing clocks continue to work unchanged. It lays the foundation for PhenoAge Clinical, KDM, Bortz Blood Age, and other clinical aging clocks in subsequent PRs.

Partial fix for #194.

1.0 API surfaces for review

These interfaces will become stable at 1.0.0. Please review carefully:

  • GeoData.from_clinical_matrix(df, source_units=, units=) signature
  • model.required_features() return format {"layer": str, "features": list, "metadata": list}
  • GeoData.clinical attribute (features-as-rows, samples-as-columns)
  • BIOMARKER_REGISTRY structure
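
A small sketch of the proposed return format may help when reviewing it. The example does not import biolearn: `ToyClinicalModel`, `check_required_features`, `missing_features`, and the feature names are hypothetical, and the set of layer names is taken from the summary in this description.

```python
KNOWN_LAYERS = {"dnam", "rna", "protein_alamar", "protein_olink", "clinical"}
EXPECTED_TYPES = {"layer": str, "features": list, "metadata": list}


def check_required_features(spec):
    """Raise unless spec has exactly the keys layer, features, metadata with the expected types."""
    if set(spec) != set(EXPECTED_TYPES):
        raise ValueError(f"Unexpected keys: {sorted(spec)}")
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(spec[key], expected):
            raise TypeError(f"{key!r} must be {expected.__name__}")
    if spec["layer"] not in KNOWN_LAYERS:
        raise ValueError(f"Unknown layer: {spec['layer']!r}")
    return spec


class ToyClinicalModel:
    """Hypothetical model; real biolearn model classes are not shown here."""

    def required_features(self):
        return {
            "layer": "clinical",
            "features": ["albumin", "creatinine"],
            "metadata": ["age"],
        }


def missing_features(model, available):
    """List required features absent from the available feature names."""
    spec = check_required_features(model.required_features())
    present = set(available)
    return [f for f in spec["features"] if f not in present]


# The toy model needs creatinine, which the available features lack.
print(missing_features(ToyClinicalModel(), ["albumin", "glucose"]))  # prints ['creatinine']
```

Because the dictionary names the layer explicitly, a caller could run a check like this against the row index of the matching layer before scoring, without knowing which kind of model it holds.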

Test plan

  • 11 tests for GeoData clinical layer (init, copy, save/load roundtrip, from_clinical_matrix, unit conversion, validation)
  • 16 tests for biomarker registry and unit conversions
  • Parametrized required_features() interface test across all 69 models
  • Consistency test: required_features() matches methylation_sites() for dnam models
  • Full existing test suite passes (323 passed, 10 skipped)
  • make format clean

Comment thread biolearn/load.py
return df


def load_nhanes_as_geodata(year):
Member

Should add a data library entry for this.

Add GeoData clinical layer, biomarker registry with unit conversions,
required_features() interface on all models, and NHANES-to-GeoData bridge.
Foundation for clinical aging clocks (PhenoAge, KDM, Bortz, etc.)

Addresses bio-learn#194
@marcbal77 force-pushed the feature/clinical-infrastructure branch from a8247fc to e5b1674 on April 23, 2026 05:58
2 participants