Skip to content

Add input validation for dictionary -form region parameters in _parse_single_region and add mini-test to it#1230

Open
Batmend-bit wants to merge 4 commits intomalariagen:masterfrom
Batmend-bit:improve-region-validation
Open

Add input validation for dictionary -form region parameters in _parse_single_region and add mini-test to it#1230
Batmend-bit wants to merge 4 commits intomalariagen:masterfrom
Batmend-bit:improve-region-validation

Conversation

@Batmend-bit
Copy link
Copy Markdown

@Batmend-bit Batmend-bit commented Mar 25, 2026

Summary

This PR adds input validation for the 'Mapping' code path in '_parse_single_region'. In the previous code, there was no problem only if the inputs were perfectly safe. However, there was no input validation; malformed dictionary inputs can be silently accepted and would cause issues later. This change makes the function fail fast with a clear, actionable error message at the point of entry.

Problem

The following malformed inputs can be silently accepted:

  1. Missing contig: {"start":100, "end":200}
  2. Invalid contig name: {"contig": "Invalid_name", "start':100, "end":200}
  3. Negative coordinate or end > start: {"contig": "2L", "start':-100, "end":200} or {"contig": "2L", "start':100, "end": 20}
  4. Ending coordinate is greater than the length of the genome sequence.
    Therefore, I thought that input validation for the 'Mapping' is useful.

Solution

Added the following validation in the 'isinstance(region, Mapping)' branch:

  1. 'contig' is required - raises ValueError if missing
  2. 'contig' must be valid - checked against '_valid_contigs(resource)' as same as how string regions are validated in '_handle_region_coords'
  3. Coordinates must be positive integers - checked as same as in '_handle_region_coords'
  4. 'start' must not exceed 'end' - consistent with the existing check in
    _handle_region_coords
  5. 'end' must be less than the length of the sequence
    Additionally added a docstring to clarify expected input structure and validation behavior.

Effect

  1. Fully backward compatible for valid inputs.
  2. No existing tests are broken.
  3. Just added validation for inputs that were already broken to prevent silent acceptance of malformed inputs.

Testing

New tests were added in 'tests/test_util.py' with pytest by covering:

  • Valid whole-contig dict region
  • Valid interval dict region (only end, only start coordinates are accepted)
  • 8 invalid dict cases each raising ValueError
  • Invalid string region cases
  • Invalid type cases raising TypeError

All tests passed when I ran it with poetry: poetry run pytest tests/test_util.py -v

@Batmend-bit Batmend-bit changed the title Add input validation for dictionary -form region parameters in _parse_signle_region and add mini-test to it Add input validation for dictionary -form region parameters in _parse_single_region and add mini-test to it Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant