Fix/replace asserts with runtime validation #1260

Open
khushthecoder wants to merge 1 commit into malariagen:master from khushthecoder:fix/replace-asserts-with-runtime-validation

Replace assert Statements with Proper Runtime Validation

Fix #1259

Summary

This PR replaces 42 assert statements with explicit if/raise checks across 13 production modules, so that input validation and internal state checks stay enforced when Python runs with the -O (optimize) flag. That flag silently strips assert statements, which can cause confusing downstream errors or silent data corruption.


Context

Python's assert statement is designed for debugging, not production validation. When Python is invoked with -O or -OO, assert statements are removed at compile time, so the checks never run. This means:

  • Parameter validation disappears (e.g., assert contig in self.contigs)
  • Shape/dimension checks vanish (e.g., assert ha.ndim == hb.ndim == 2)
  • Internal state guards evaporate (e.g., assert self._default_site_mask is not None)

In a data-science library like malariagen-data-python, where downstream code relies on array shapes and contig membership being correct, a skipped check can let bad input flow into results without raising any error.
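The stripping behaviour can be reproduced without restarting the interpreter, because the built-in compile() accepts an optimize argument that mirrors the -O levels. The following is a minimal sketch; check_contig and load_checker are hypothetical names used only for illustration and are not part of malariagen_data.

```python
# Hypothetical source text; the assert mirrors the style of check this PR replaces.
SOURCE = '''
def check_contig(contig, contigs):
    assert contig in contigs, "unknown contig"
    return contig
'''


def load_checker(optimize):
    """Compile SOURCE at the given optimization level and return check_contig."""
    namespace = {}
    code = compile(SOURCE, "<demo>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check_contig"]


def rejects_bad_contig(checker):
    """Return True if the checker raises AssertionError for an unknown contig."""
    try:
        checker("bogus", ("2L", "2R"))
    except AssertionError:
        return True
    return False


with_asserts = load_checker(optimize=0)     # comparable to plain `python`
without_asserts = load_checker(optimize=1)  # comparable to `python -O`

print(rejects_bad_contig(with_asserts))     # the assert fires
print(rejects_bad_contig(without_asserts))  # the assert was compiled out
```

At optimize=1 the function accepts the unknown contig and returns it unchanged, which is the failure mode described above.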


Changes

Exception Mapping Strategy

Each assert was replaced with a semantically appropriate exception:

| Category | Exception | Example |
| --- | --- | --- |
| Invalid parameter values | ValueError | contig not in self.contigs, metric not in ("hamming", "jaccard") |
| Wrong array shape/dimensions | ValueError | ha.ndim != 2, ha.shape[0] != hb.shape[0] |
| Insufficient cohort size | ValueError | n_samples < min_cohort_size |
| Internal state unexpectedly None | RuntimeError | self._default_site_mask is None |
| Internal consistency violations | RuntimeError | nobs_mode != "fixed", jackknife resampling mismatch |

All replacement exceptions include descriptive error messages with actual vs. expected values (e.g., f"Contig {contig!r} not found. Available contigs: {self.contigs}"), significantly improving debuggability.
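As an illustration of the mapping, the sketch below pairs a caller-facing shape check (ValueError) with an internal-state guard (RuntimeError). The function names check_haplotype_arrays and require_site_mask are assumptions made for this example, not the helpers added by the PR, and SimpleNamespace objects stand in for arrays so that the snippet runs without third-party packages.

```python
from types import SimpleNamespace


def check_haplotype_arrays(ha, hb):
    """Validate two 2-D arrays that must share their first dimension.

    Works with any object exposing `ndim` and `shape`, e.g. a NumPy array.
    """
    # Wrong dimensionality comes from the caller's input -> ValueError.
    if ha.ndim != 2 or hb.ndim != 2:
        raise ValueError(
            f"Expected 2-dimensional arrays, got ha.ndim={ha.ndim} "
            f"and hb.ndim={hb.ndim}."
        )
    if ha.shape[0] != hb.shape[0]:
        raise ValueError(
            f"Arrays differ in first dimension: ha.shape[0]={ha.shape[0]}, "
            f"hb.shape[0]={hb.shape[0]}."
        )


def require_site_mask(default_site_mask):
    """Return the configured site mask, or fail loudly if it is missing."""
    # Missing internal configuration is not the caller's fault -> RuntimeError.
    if default_site_mask is None:
        raise RuntimeError(
            "Expected a default site mask to be configured, got None."
        )
    return default_site_mask


# Stand-ins for arrays with mismatched first dimensions.
ha = SimpleNamespace(ndim=2, shape=(100, 10))
hb = SimpleNamespace(ndim=2, shape=(90, 12))

try:
    check_haplotype_arrays(ha, hb)
except ValueError as error:
    message = str(error)
    print(message)
```

Putting both the actual values and the expectation in the message means a failure can usually be diagnosed from the traceback alone.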

Files Modified (diff stat: 14 files, +211 / -54; the 13 production modules are listed below)

| File | Asserts Replaced | Key Checks |
| --- | --- | --- |
| malariagen_data/util.py | 16 | Array ndim, shape, dtype, hash64, region parsing |
| malariagen_data/anoph/snp_data.py | 7 | Contig membership (×5), site mask config, variant dimension |
| malariagen_data/anoph/hap_data.py | 3 | Phasing analysis config, contig membership (×2) |
| malariagen_data/anoph/hap_frq.py | 2 | Cohort size validation (×2) |
| malariagen_data/anoph/snp_frq.py | 2 | Cohort size, nobs_mode invariant |
| malariagen_data/anoph/genome_features.py | 2 | Contig membership, figure height |
| malariagen_data/anoph/h1x.py | 2 | Array ndim and shape compatibility |
| malariagen_data/anoph/aim_data.py | 2 | AIM palette config, palette length |
| malariagen_data/anoph/sample_metadata.py | 2 | AIM metadata columns/dtype config |
| malariagen_data/anoph/cnv_frq.py | 1 | nobs_mode invariant |
| malariagen_data/anoph/genome_sequence.py | 1 | Contig membership |
| malariagen_data/anopheles.py | 1 | Jackknife resampling invariant |
| malariagen_data/mjn.py | 1 | Metric parameter validation |

Intentionally Preserved assert (3 instances)

Three assert statements inside @numba.njit-decorated functions in util.py are intentionally kept as-is:

# _square_to_condensed (line 1410) — @numba.njit
assert i != j, "no diagonal elements in condensed matrix"
 
# _apply_allele_mapping (lines 1649-1650) — @numba.njit
assert mapping.shape[0] == n_sites
assert mapping.shape[1] == n_alleles

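Reason: exception support inside @numba.njit functions is restricted, so these checks are left unchanged in this PR rather than rewritten alongside the pure-Python ones. Two caveats for reviewers. First, Numba's nopython mode does support a limited form of raise (an exception class, or an instance constructed with simple arguments), so assert is not the only validation mechanism available in JIT-compiled code. Second, Numba compiles functions from their Python bytecode, so these asserts should not be assumed to survive -O; that needs to be verified, and converting them to explicit raises may be worth a follow-up.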


Example: Before vs. After

Before (silently stripped with -O):

assert contig in self.contigs

After (always enforced, with descriptive message):

if contig not in self.contigs:
    raise ValueError(
        f"Contig {contig!r} not found. "
        f"Available contigs: {self.contigs}"
    )

Verification

# Confirm only the 3 numba-protected asserts remain:
$ grep -rn '^\s*assert ' malariagen_data/ --include='*.py' | grep -v __pycache__
malariagen_data/util.py:1410:    assert i != j, "no diagonal elements in condensed matrix"
malariagen_data/util.py:1649:    assert mapping.shape[0] == n_sites
malariagen_data/util.py:1650:    assert mapping.shape[1] == n_alleles
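The grep above matches on text, so it can miss an assert that does not start its line and it says nothing about which function each hit belongs to. As a complementary check, the sketch below uses the standard-library ast module to list assert statements together with the enclosing function and its decorators. The helper find_asserts and the SAMPLE source are hypothetical and written for this illustration only; ast.unparse requires Python 3.9 or later.

```python
import ast


def find_asserts(source):
    """Return (line, enclosing function name, decorator strings) per assert."""
    tree = ast.parse(source)
    results = []

    def visit(node, enclosing):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Descend with this function as the new enclosing scope.
                visit(child, child)
                continue
            if isinstance(child, ast.Assert):
                name = enclosing.name if enclosing else None
                decorators = (
                    [ast.unparse(d) for d in enclosing.decorator_list]
                    if enclosing
                    else []
                )
                results.append((child.lineno, name, decorators))
            visit(child, enclosing)

    visit(tree, None)
    return results


# Hypothetical module text; it is parsed only, so numba need not be installed.
SAMPLE = '''
import numba

@numba.njit
def square_to_condensed(i, j, n):
    assert i != j, "no diagonal elements in condensed matrix"
    return i + j

def check(contig, contigs):
    if contig not in contigs:
        raise ValueError("assert appears here only inside a string")
    return contig
'''

for lineno, func_name, decorators in find_asserts(SAMPLE):
    print(lineno, func_name, decorators)
```

Running the helper over each file under malariagen_data/ and filtering on the decorator list would confirm that every remaining assert sits inside an njit-decorated function.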

Behavioral Impact

  • Same checks, different exception: every condition is unchanged, but failures now raise ValueError or RuntimeError instead of AssertionError
  • Better error messages: users see descriptive messages with the offending values instead of a bare AssertionError
  • -O safe: validation is enforced regardless of Python optimization level
  • No new dependencies: uses only built-in Python exceptions
  • Backward compatible for valid input: no API changes, and code that was valid before remains valid; only callers that explicitly catch AssertionError from these checks would need updating

…uction code

Resolves malariagen#1259.

Replace 40+ assert statements across 13 production modules with explicit
if/raise checks using ValueError, TypeError, or RuntimeError with
descriptive error messages. This ensures validation is enforced even when
Python runs with -O (optimize) flag, which silently strips assert
statements.

Exception mapping:
- Invalid parameter values (metric, contig) → ValueError
- Wrong array shape/ndim/dtype → ValueError
- Internal state unexpectedly None → RuntimeError
- Internal consistency violations → RuntimeError

Three assert statements inside @numba.njit functions in util.py are
intentionally preserved, as exception support inside numba
JIT-compiled code is restricted.

Files modified:
- malariagen_data/util.py
- malariagen_data/mjn.py
- malariagen_data/anopheles.py
- malariagen_data/anoph/snp_data.py
- malariagen_data/anoph/hap_data.py
- malariagen_data/anoph/hap_frq.py
- malariagen_data/anoph/snp_frq.py
- malariagen_data/anoph/cnv_frq.py
- malariagen_data/anoph/genome_features.py
- malariagen_data/anoph/genome_sequence.py
- malariagen_data/anoph/h1x.py
- malariagen_data/anoph/sample_metadata.py
- malariagen_data/anoph/aim_data.py