Improve validation for genomic region parsing in _handle_region_coords #1182

@vinitjain2005

Description:
The function _handle_region_coords currently uses a regex pattern to parse genomic region strings (e.g., "chr1:1000-2000"). However, the validation is not strict enough: malformed inputs may be accepted, and some invalid inputs may fail silently instead of raising an error.

Problems Identified:

  • Weak regex validation for region format
  • Does not handle invalid formats like "chr1-1000-2000" or "chr1:abc-xyz"
  • Error messages are not very descriptive
  • No test coverage for edge cases
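
To illustrate the kind of failure described above, here is a minimal sketch. The pattern and the function name `lenient_parse` are hypothetical stand-ins written for this example; they are not the actual regex or code used by _handle_region_coords. The sketch only shows how a permissive pattern can accept non-numeric coordinates and return nothing for a string with the wrong separator.

```python
import re

# Hypothetical lenient pattern for illustration only; this is NOT the
# project's actual regex. Each group accepts any characters.
LENIENT_PATTERN = re.compile(r"(.+):(.+)-(.+)")


def lenient_parse(region):
    """Return (contig, start, end) as strings, or None if nothing matched."""
    match = LENIENT_PATTERN.match(region)
    if match is None:
        # Silent failure: the caller gets None and no explanation.
        return None
    return match.groups()


# Non-numeric coordinates are accepted without complaint.
print(lenient_parse("chr1:abc-xyz"))
# A wrong separator produces None rather than a descriptive error.
print(lenient_parse("chr1-1000-2000"))
```

With a pattern like this, the bad coordinates would only surface later, for example when a downstream step tries to convert "abc" to an integer.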

Expected Behavior:

  • Strict validation of region format
  • Clear and user-friendly error messages
  • Proper handling of invalid inputs
  • Unit tests for valid and invalid cases

Proposed Solution:

  • Improve regex pattern for stricter parsing
  • Add validation checks for numeric values
  • Raise meaningful exceptions with clear messages
  • Add unit tests for valid region strings, invalid formats, and edge cases
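
The proposed changes could look roughly like the sketch below. It assumes regions always have the form "contig:start-end" with plain decimal coordinates and that start must not exceed end; the names `parse_region`, `Region`, and `REGION_PATTERN` are hypothetical and do not come from the existing code base. The real function may need to support other forms (for example a bare contig name), which this sketch does not cover.

```python
import re
from typing import NamedTuple

# Hypothetical strict pattern: a contig name without whitespace or colons,
# a colon, then two runs of ASCII digits separated by a hyphen.
REGION_PATTERN = re.compile(r"(?P<contig>[^\s:]+):(?P<start>[0-9]+)-(?P<end>[0-9]+)")


class Region(NamedTuple):
    contig: str
    start: int
    end: int


def parse_region(region: str) -> Region:
    """Parse 'contig:start-end' strictly, raising ValueError on bad input."""
    if not isinstance(region, str):
        raise TypeError(f"region must be a string, got {type(region).__name__}")
    # fullmatch rejects leading or trailing characters that match() would ignore.
    match = REGION_PATTERN.fullmatch(region.strip())
    if match is None:
        raise ValueError(
            f"Invalid region {region!r}: expected 'contig:start-end' "
            "with integer coordinates, e.g. 'chr1:1000-2000'"
        )
    start = int(match.group("start"))
    end = int(match.group("end"))
    # Assumption: a region whose start exceeds its end is treated as an error.
    if start > end:
        raise ValueError(
            f"Invalid region {region!r}: start ({start}) is greater than end ({end})"
        )
    return Region(match.group("contig"), start, end)


print(parse_region("chr1:1000-2000"))
for bad in ["chr1-1000-2000", "chr1:abc-xyz", "chr1:2000-1000"]:
    try:
        parse_region(bad)
    except ValueError as exc:
        print(exc)
```

The same valid and invalid strings could serve as the first cases in the unit tests, one parametrized test for accepted regions and one for inputs expected to raise ValueError.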

Additional Context:
Improving this function will make region handling more robust and prevent downstream errors in genomic data processing.

Github: @vinitjain2005
