Replace `assert` Statements with Proper Runtime Validation in Production Code

## Issue Description
 
The codebase uses Python `assert` statements for input validation and invariant checks in production code (non-test files). This is problematic because `assert` statements are silently stripped when Python runs with `-O` (optimize flag), meaning all validation checks disappear in optimized builds — leading to confusing downstream errors or silent data corruption.
 
There are roughly 50 `assert` statements across 13 production modules (the counts below, some of them approximate, total about 46):
 
| File | Count | Example |
|------|-------|---------|
| `util.py` | ~20 | `assert indexer.ndim == 1` |
| `anoph/snp_data.py` | ~6 | `assert contig in self.contigs` |
| `mjn.py` | 1 | `assert metric in ["hamming", "jaccard"]` |
| `anoph/hap_data.py` | 3 | `assert contig in self.contigs` |
| `anoph/hap_frq.py` | 2 | `assert n_samples >= min_cohort_size` |
| `anoph/snp_frq.py` | 3 | `assert nobs_mode == "fixed"` |
| `anoph/cnv_frq.py` | 1 | `assert nobs_mode == "fixed"` |
| `anoph/genome_features.py` | 2 | `assert contig in self.contigs` |
| `anoph/genome_sequence.py` | 1 | `assert contig in self.contigs` |
| `anoph/h1x.py` | 2 | `assert ha.ndim == hb.ndim == 2` |
| `anoph/sample_metadata.py` | 2 | `assert self._aim_metadata_columns is not None` |
| `anoph/aim_data.py` | 2 | `assert self._aim_palettes is not None` |
| `anopheles.py` | 1 | `assert np.count_nonzero(loc_j) == n_sites_j` |
 
### Before / After Example
 
```python
# ❌ Current — silently skipped with python -O
assert metric in ["hamming", "jaccard"]
```
 
```python
# ✅ Proposed — always enforced, clear error message
if metric not in ("hamming", "jaccard"):
    raise ValueError(f"metric must be 'hamming' or 'jaccard', got {metric!r}")
```
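
The difference can be reproduced without restarting the interpreter. The following is a minimal sketch that uses the built-in `compile()` with its `optimize` argument to mimic running with and without `-O`. The helper name `outcome` and the sample value `"cosine"` are illustrative only and are not part of the codebase.

```python
ASSERT_CHECK = 'assert metric in ["hamming", "jaccard"]'

EXPLICIT_CHECK = """
if metric not in ("hamming", "jaccard"):
    raise ValueError(f"metric must be 'hamming' or 'jaccard', got {metric!r}")
"""


def outcome(source: str, optimize: int) -> str:
    """Run `source` with an invalid metric and report what happened.

    optimize=0 mimics a normal interpreter; optimize=1 mimics `python -O`.
    """
    namespace = {"metric": "cosine"}  # deliberately invalid input
    code = compile(source, "<example>", "exec", optimize=optimize)
    try:
        exec(code, namespace)
    except (AssertionError, ValueError) as exc:
        return type(exc).__name__
    return "no error"


for label, source in [("assert", ASSERT_CHECK), ("explicit", EXPLICIT_CHECK)]:
    for level in (0, 1):
        print(f"{label:8} optimize={level}: {outcome(source, level)}")
```

In this sketch, only the `assert` version compiled at `optimize=1` lets the invalid value through; the explicit check raises `ValueError` at both levels.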
 
---
 
## Why This Issue Is Important
 
**Reliability.** Anyone running Python with `-O` or with `PYTHONOPTIMIZE` set (as some Docker images, production deployments, and CI pipelines do) silently loses every one of these checks. An invalid metric like `"cosine"` would slip through `mjn.py` and produce garbage results instead of a clear error.

**Debuggability.** Most of these `assert` statements carry no error message, so when one does fire the user sees a bare `AssertionError` with no context. Proper exceptions provide actionable information (e.g., `"Expected contig '2L' but got 'chr2L'"`).

**Python Best Practice.** The Python language reference specifies that no code is emitted for an `assert` statement when optimization is requested at compile time, so `assert` cannot be relied on for data validation. Linters flag the pattern for this reason (Ruff S101, Bandit B101).

**Data Integrity.** In a genomics library, silent shape mismatches or wrong contigs could produce scientifically incorrect results without any error, which is far worse than a crash.
 
---
 
## My Approach to Solve
 
### Step 1: Categorize each `assert` by type
 
| Category | Replacement Exception | Example |
|----------|-----------------------|---------|
| Input validation | `ValueError` / `TypeError` | `assert metric in [...]` → `raise ValueError(...)` |
| State invariants | `RuntimeError` | `assert self._aim_palettes is not None` → `raise RuntimeError(...)` |
| Shape/dimension checks | `ValueError` | `assert arr.ndim == 1` → `raise ValueError(f"Expected 1-D array, got {arr.ndim}-D")` |
| Internal consistency | `RuntimeError` | `assert nobs_mode == "fixed"` → `raise RuntimeError(...)` |
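
One way to keep the replacements consistent would be a small set of shared helpers, one per category. The sketch below is hypothetical: the names `check_choice`, `check_ndim`, and `require_state` do not exist in the codebase, and `SimpleNamespace(ndim=2)` merely stands in for a NumPy or Dask array.

```python
from types import SimpleNamespace
from typing import Any, Collection


def check_choice(name: str, value: Any, allowed: Collection[Any]) -> None:
    """Input validation: reject a parameter value outside the allowed set."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")


def check_ndim(name: str, arr: Any, expected: int) -> None:
    """Shape check: reject an array with the wrong number of dimensions."""
    if arr.ndim != expected:
        raise ValueError(f"Expected {name} to be {expected}-D, got {arr.ndim}-D")


def require_state(value: Any, message: str) -> Any:
    """State invariant: fail loudly if required state is missing."""
    if value is None:
        raise RuntimeError(message)
    return value


# Example usage with stand-in objects.
check_choice("metric", "hamming", ("hamming", "jaccard"))  # passes silently
try:
    check_ndim("indexer", SimpleNamespace(ndim=2), expected=1)
except ValueError as exc:
    print(exc)
```

Whether shared helpers or inline `if ... raise` checks fit the codebase better is a judgment call for the maintainers; inline checks allow more specific messages, while helpers reduce repetition.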
 
### Step 2: Replace in priority order
 
1. `util.py` first — most used, core functions (`_dask_compress_dataset`, `_da_compress`, allele mapping)
2. `anoph/snp_data.py` and `anoph/hap_data.py` — user-facing data access, contig validation
3. `mjn.py` — public-facing parameter validation
4. Remaining `anoph/` modules — `hap_frq.py`, `snp_frq.py`, `cnv_frq.py`, `genome_features.py`, `genome_sequence.py`, `h1x.py`, `aim_data.py`, `sample_metadata.py`
5. `anopheles.py` — one occurrence
 
### Step 3: Add descriptive error messages
 
Every replacement will include a clear, actionable error message explaining what was expected vs. what was received:
 
```python
# Shape validation
if indexer.ndim != 1:
    raise ValueError(
        f"Expected indexer to be 1-dimensional, got {indexer.ndim} dimensions"
    )
 
# Contig validation
if contig not in self.contigs:
    raise ValueError(
        f"Contig {contig!r} not found. Available contigs: {self.contigs}"
    )
 
# State invariant
if self._default_phasing_analysis is None:
    raise RuntimeError(
        "No default phasing analysis configured. "
        "Please specify the 'analysis' parameter explicitly."
    )
```
 
### Step 4: Verify
 
- Run the existing test suite to confirm no regressions
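- Run the test suite under the `-O` flag as well (for example `python -O -m pytest`, assuming pytest is the runner) to confirm the new checks still fire in optimized mode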
- Confirm no `assert` statements remain in production code (only in `tests/`)
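
The last check can be automated. Below is a minimal sketch, using only the standard library `ast` and `pathlib` modules, that lists the remaining `assert` statements outside test directories. The function names `find_asserts` and `scan_package` are hypothetical, and treating any directory named `tests` as test code is an assumption about the repository layout.

```python
import ast
from pathlib import Path


def find_asserts(source: str) -> list[int]:
    """Return the line numbers of all `assert` statements in `source`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno for node in ast.walk(tree) if isinstance(node, ast.Assert)
    )


def scan_package(
    root: Path, skip_dirs: tuple[str, ...] = ("tests",)
) -> dict[str, list[int]]:
    """Map each non-test .py file under `root` to its assert line numbers."""
    hits: dict[str, list[int]] = {}
    for path in sorted(root.rglob("*.py")):
        # Skip files that live inside any directory listed in skip_dirs.
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        lines = find_asserts(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits
```

Calling `scan_package(Path("malariagen_data"))` from the repository root should return an empty dictionary once the migration is complete. Enabling Ruff's S101 rule in CI would likely serve the same purpose.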
 
---
 
## Files to Modify
 
- `malariagen_data/util.py`
- `malariagen_data/mjn.py`
- `malariagen_data/anopheles.py`
- `malariagen_data/anoph/snp_data.py`
- `malariagen_data/anoph/hap_data.py`
- `malariagen_data/anoph/hap_frq.py`
- `malariagen_data/anoph/snp_frq.py`
- `malariagen_data/anoph/cnv_frq.py`
- `malariagen_data/anoph/genome_features.py`
- `malariagen_data/anoph/genome_sequence.py`
- `malariagen_data/anoph/h1x.py`
- `malariagen_data/anoph/sample_metadata.py`
- `malariagen_data/anoph/aim_data.py`
 

Replace `assert` Statements with Proper Runtime Validation in Production Code #1259
