src/bioetl/domain/normalization.py: 17 changes (16 additions, 1 deletion)
```diff
@@ -115,10 +115,25 @@ def parse_date_field(value: str | None, fmt: str = "%Y-%m-%d") -> date | None:
     """
     if value is None:
         return None
 
+    try:
+        value = value.strip()
+    except AttributeError:
+        return None
+
+    # Fast path for standard ISO format (YYYY-MM-DD) which is the most common case
+    if fmt == "%Y-%m-%d" and len(value) == 10 and value[4] == "-" and value[7] == "-":
+        try:
+            from datetime import date
+
+            return date(int(value[0:4]), int(value[5:7]), int(value[8:10]))
```

P2: Reject non-digit ISO segments in fast-path parser

The new fast path calls `int(...)` on fixed slices, which is more permissive than `strptime`: it accepts malformed `%Y-%m-%d` values such as `2024- 3-15` or `2024-+3-15` that previously returned `None`. Because these strings pass the length and hyphen checks, they never reach the `strptime` fallback and are normalized into valid dates, silently masking dirty source data instead of treating it as a parse error.

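The reviewer's point can be reproduced in isolation. The sketch below copies the PR's fast-path condition into a standalone helper and compares it with plain `strptime`, which is what the function did before this change. `fast_path` and `strptime_path` are illustrative names only, not part of `bioetl`, and `fast_path` returns `None` where the real code would fall back to `strptime`.

```python
from __future__ import annotations

from datetime import date, datetime


def fast_path(value: str) -> date | None:
    # Same shape check and int() conversion as the PR's fast path for fmt="%Y-%m-%d".
    # None here only means "the fast path did not produce a date".
    if len(value) == 10 and value[4] == "-" and value[7] == "-":
        try:
            return date(int(value[0:4]), int(value[5:7]), int(value[8:10]))
        except ValueError:
            return None
    return None


def strptime_path(value: str) -> date | None:
    # Pre-PR behavior: strptime alone decides.
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None


for raw in ["2024-03-15", "2024- 3-15", "2024-+3-15"]:
    # int() tolerates surrounding whitespace and a leading sign; strptime's %m does not.
    print(f"{raw!r}: fast_path={fast_path(raw)}, strptime={strptime_path(raw)}")

# '2024-03-15': fast_path=2024-03-15, strptime=2024-03-15
# '2024- 3-15': fast_path=2024-03-15, strptime=None
# '2024-+3-15': fast_path=2024-03-15, strptime=None
```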

```diff
+        except ValueError:
+            pass  # Fall back to strptime for complex validation (e.g., leap years)
+
     from datetime import datetime
 
     try:
-        return datetime.strptime(value.strip(), fmt).date()
+        return datetime.strptime(value, fmt).date()
     except (ValueError, AttributeError):
         return None
```
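One way to address the review comment is to keep the fast path but take it only when all three segments are ASCII digits, letting everything else fall through to `strptime`. The sketch below is a standalone rewrite of `parse_date_field` under that assumption. `_is_ascii_digits` is a hypothetical helper, and the imports are hoisted to module level for brevity rather than kept local as in the PR.

```python
from __future__ import annotations

from datetime import date, datetime


def _is_ascii_digits(s: str) -> bool:
    # str.isdigit() alone also accepts non-ASCII digits (e.g. "²"), so require ASCII as well.
    return s.isascii() and s.isdigit()


def parse_date_field(value: str | None, fmt: str = "%Y-%m-%d") -> date | None:
    """Sketch: the PR's fast path, taken only for strictly numeric YYYY-MM-DD segments."""
    if value is None:
        return None
    try:
        value = value.strip()
    except AttributeError:
        return None

    if (
        fmt == "%Y-%m-%d"
        and len(value) == 10
        and value[4] == "-"
        and value[7] == "-"
        and _is_ascii_digits(value[0:4] + value[5:7] + value[8:10])
    ):
        try:
            return date(int(value[0:4]), int(value[5:7]), int(value[8:10]))
        except ValueError:
            pass  # e.g. month 13 or Feb 29 in a non-leap year: let strptime give the verdict

    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None
```

For `%Y-%m-%d`, an input that passes the digit check but is rejected by `date()` (for example `2023-02-29`) is also rejected by `strptime`, so falling back should keep the accept/reject behavior in line with the pre-PR code, while clean input still skips the `strptime` call.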
