
Add genomic summary helper functions for genotype data #1256

Open
cleberfc23 wants to merge 1 commit into malariagen:master from cleberfc23:add-genomic-summary-utils



@cleberfc23

This PR introduces two helper functions for summarizing genotype data:

  1. `compute_missing_rate`: calculates the proportion of genotype calls that are missing (encoded as `-1`)
  2. `compute_informative_sites`: counts the variant positions that have at least one non-missing allele call
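The description does not include the implementation itself, so the following is only a minimal NumPy sketch of how the two helpers could behave. It assumes a genotype array of shape `(n_variants, n_samples, ploidy)` with missing alleles encoded as `-1` (the scikit-allel-style convention). The optional `axis` argument and the rule that a call counts as missing when *any* of its alleles is negative are assumptions made for this sketch, not necessarily what the PR implements.

```python
import numpy as np


def compute_missing_rate(gt, axis=None):
    """Fraction of genotype calls that are missing (sketch).

    Assumes gt has shape (n_variants, n_samples, ploidy) and that missing
    alleles are encoded as -1. ASSUMPTION: a call is counted as missing if
    any of its alleles is negative.
    axis=None -> one overall rate; axis=0 -> per-sample rates;
    axis=1 -> per-variant rates.
    """
    gt = np.asarray(gt)
    # Collapse the ploidy dimension: True where the call has a missing allele.
    missing_calls = (gt < 0).any(axis=-1)  # shape (n_variants, n_samples)
    return missing_calls.mean(axis=axis)


def compute_informative_sites(gt):
    """Number of variant positions with at least one non-missing allele call (sketch)."""
    gt = np.asarray(gt)
    # Reduce over every axis except the first (variants).
    has_data = (gt >= 0).any(axis=tuple(range(1, gt.ndim)))
    return int(has_data.sum())


# Toy example: 3 variants, 2 diploid samples.
gt = np.array(
    [
        [[0, 1], [-1, -1]],    # variant 0: sample 1 missing
        [[-1, -1], [-1, -1]],  # variant 1: all calls missing
        [[1, 1], [0, 0]],      # variant 2: fully called
    ],
    dtype=np.int8,
)

print(compute_missing_rate(gt))          # 3 of 6 calls missing -> 0.5
print(compute_missing_rate(gt, axis=0))  # per-sample missing rates
print(compute_informative_sites(gt))     # variants 0 and 2 -> 2
```

Per-sample rates (`axis=0`) are the kind of value that could serve as a simple feature for downstream ML preprocessing. Per-variant rates (`axis=1`) are more typical for site-level filtering.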

These utilities are useful for exploratory analysis and machine learning preprocessing workflows involving genomic data.

Basic tests have been added to check that both functions behave as expected.

This contribution is motivated by the need to simplify feature extraction from Zarr-based genotype datasets, especially for downstream ML tasks such as taxonomic classification.

