Description:
The function _handle_region_coords currently uses a regex pattern to parse genomic region strings (e.g., "chr1:1000-2000"). However, the validation is not strict enough and may allow malformed inputs or fail silently in some cases.
Problems Identified:
- Weak regex validation for region format
- Does not handle invalid formats such as "chr1-1000-2000" or "chr1:abc-xyz"
- Error messages are not very descriptive
- No test coverage for edge cases
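
To illustrate the first two problems, here is a minimal sketch of how a permissive pattern can misbehave. The pattern and the helper name loose_parse are hypothetical and are not taken from the actual _handle_region_coords implementation; they only show the class of failure described above.

```python
import re

# Hypothetical permissive pattern, NOT the project's actual regex.
# It accepts any non-digit run as a separator and is applied with search(),
# so it is not anchored to the whole string.
LOOSE_PATTERN = re.compile(r"(\w+)\D+(\d+)\D+(\d+)")


def loose_parse(text):
    """Return the captured groups, or None when nothing matches."""
    match = LOOSE_PATTERN.search(text)
    return match.groups() if match else None


for candidate in ["chr1:1000-2000", "chr1-1000-2000", "chr1:abc-xyz"]:
    # A malformed separator is accepted, and a non-numeric range yields None
    # instead of an error, which a careless caller may never check.
    print(candidate, "->", loose_parse(candidate))
```

With a pattern like this, "chr1-1000-2000" is accepted as if it were well formed, while "chr1:abc-xyz" produces None with no explanation of what went wrong.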
Expected Behavior:
- Strict validation of region format
- Clear and user-friendly error messages
- Proper handling of invalid inputs
- Unit tests for valid and invalid cases
Proposed Solution:
- Improve regex pattern for stricter parsing
- Add validation checks for numeric values
- Raise meaningful exceptions with clear messages
- Add unit tests for valid region strings, invalid formats, and edge cases
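
The proposed steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function name parse_region, the exact accepted format (contig:start-end with plain non-negative integers, no thousands separators, no colons in the contig name), the start <= end rule, and the choice of ValueError are all assumptions made here.

```python
import re

# Assumed format: <contig>:<start>-<end>
# The contig may contain anything except whitespace and ':' (an assumption);
# coordinates are ASCII digits only.
_REGION_RE = re.compile(r"(?P<contig>[^\s:]+):(?P<start>[0-9]+)-(?P<end>[0-9]+)")


def parse_region(region):
    """Parse 'contig:start-end' into (contig, start, end) or raise an error."""
    if not isinstance(region, str):
        raise TypeError(f"Region must be a string, got {type(region).__name__}")

    # fullmatch() rejects leading or trailing junk that search() would ignore.
    match = _REGION_RE.fullmatch(region.strip())
    if match is None:
        raise ValueError(
            f"Invalid region {region!r}: expected 'contig:start-end' with "
            "integer coordinates, e.g. 'chr1:1000-2000'"
        )

    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(
            f"Invalid region {region!r}: start ({start}) is greater than end ({end})"
        )
    return match["contig"], start, end


if __name__ == "__main__":
    for candidate in ["chr1:1000-2000", "chr1-1000-2000", "chr1:abc-xyz", "chr1:2000-1000"]:
        try:
            print(candidate, "->", parse_region(candidate))
        except ValueError as err:
            print(candidate, "-> error:", err)
```

The same list of valid and invalid strings can serve as a starting point for the unit tests: each invalid case should raise, and each valid case should return the expected tuple. Whether start > end should be rejected, and whether contig names containing colons must be supported, are decisions for the maintainers.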
Additional Context:
Improving this function will make region handling more robust and prevent downstream errors in genomic data processing.
GitHub: @vinitjain2005