unit_conversion crashes with TypeError when source value is non-numeric string #175
Bug Report
Summary
convert_units() in linkml_map/functions/unit_conversion.py crashes with TypeError: can't multiply sequence by non-int of type 'float' when a source column value is a non-numeric string (e.g., 'A' used as a missing/coded value in dbGaP data).
Environment
- linkml-map version: 0.4.0
- Python: 3.12
- pint/ucumvert: latest (via dm-bip Docker image)
Reproduction
YAML transform spec (simplified):

    value_decimal:
      populated_from: phv00203154
      unit_conversion:
        source_unit: "[lb_av]"
        target_unit: "kg"

Source data: TSV column phv00203154 (ARIC weight in lbs) contains the value 'A' for some rows (likely a missing/not-collected code).
Stack Trace

    linkml_map/functions/unit_conversion.py:95 in convert_units
        quantity = magnitude * from_unit_q      # magnitude = 'A' (string)
        return quantity.to(to_unit).magnitude   # pint tries 'A' * 0.4535... -> TypeError
    pint/facets/plain/registry.py:1097 in _convert
        value = value * factor                  # value='A', factor=0.4535923700000001
    TypeError: can't multiply sequence by non-int of type 'float'
Key locals from the actual stack trace:

    magnitude = 'A'
    from_unit = '1 pound'
    to_unit = '1 kilogram'
    factor = 0.4535923700000001
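The last frame is ordinary Python behavior, not something specific to pint. Once the string reaches pint's `_convert()`, it is multiplied by a float, and Python rejects that for any sequence. The failure reproduces without pint, using the locals from the trace:

```python
# Values taken from the stack-trace locals above
magnitude = "A"              # raw TSV cell value that reached convert_units()
factor = 0.4535923700000001  # lb -> kg factor that pint applied in _convert()

try:
    magnitude * factor  # str * float: the same operation pint performs internally
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: can't multiply sequence by non-int of type 'float'
```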
Root Cause
In unit_conversion.py, line 93:

    quantity = magnitude * from_unit_q

The magnitude parameter arrives as a raw string from the source data (the TSV column value). The function does not:
- Attempt to coerce the string to a numeric type
- Check if the value is numeric before constructing the pint Quantity
- Gracefully handle non-numeric values (skip/return None)
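The following is a minimal sketch of the missing coercion step. It assumes a hypothetical helper named `coerce_magnitude()` that `convert_units()` would call before multiplying by `from_unit_q`. The helper name, the choice to return `None`, and the rejection of NaN/infinity are illustrative assumptions, not existing linkml-map behavior:

```python
import math
from typing import Optional, Union


def coerce_magnitude(value: Union[str, int, float, None]) -> Optional[float]:
    """Hypothetical helper: read a raw source cell as a finite float, or return None.

    Intended to run before the pint Quantity is built, so that coded
    values such as 'A' or 'M' (and empty cells) never reach pint.
    """
    if isinstance(value, bool):
        # bool is a subclass of int; treating True/False as 1.0/0.0 is rarely intended
        return None
    try:
        number = float(value)  # float() already tolerates surrounding whitespace
    except (TypeError, ValueError, OverflowError):
        return None  # None, '', 'A', 'M', ... are not numbers
    # float() also accepts 'nan' and 'inf'; reject them so they don't propagate silently
    return number if math.isfinite(number) else None


for raw in ["150", " 182.5 ", "A", "", "nan"]:
    print(repr(raw), "->", coerce_magnitude(raw))
```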
Note: The expr: evaluation path in linkml-map/dm-bip coerces string values to numeric before simpleeval evaluates them, so expr: "{phv} * 0.453592" works for numeric rows. unit_conversion, however, bypasses that coercion path entirely.
Expected Behavior
Non-numeric values should be handled gracefully. Options:
- Coerce to float first and skip/return None if coercion fails (preferred; matches the expr: path behavior)
- Raise a more descriptive error that identifies the problematic value and source column
- Log a warning and return None for non-numeric values
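The three options above differ only in what happens when coercion fails. The sketch below shows all three behind a hypothetical `on_error` switch on a hypothetical `convert_or_handle()` wrapper; neither exists in linkml-map. The `convert` callable stands in for the pint conversion so that the example runs without pint:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

POLICIES = {"none", "raise", "warn"}


def convert_or_handle(
    raw_value,
    convert: Callable[[float], float],
    column: str = "<unknown>",
    on_error: str = "none",
) -> Optional[float]:
    """Hypothetical wrapper: coerce raw_value to float, then apply `convert`.

    on_error selects what happens when coercion fails:
      "none"  -> return None silently
      "raise" -> raise ValueError naming the value and the source column
      "warn"  -> log a warning and return None
    """
    if on_error not in POLICIES:
        raise ValueError(f"on_error must be one of {sorted(POLICIES)}, got {on_error!r}")
    try:
        magnitude = float(raw_value)
    except (TypeError, ValueError):
        if on_error == "raise":
            raise ValueError(
                f"cannot convert non-numeric value {raw_value!r} in column {column!r}"
            ) from None
        if on_error == "warn":
            logger.warning("skipping non-numeric value %r in column %r", raw_value, column)
        return None
    return convert(magnitude)


def lb_to_kg(lb: float) -> float:
    """Stand-in for the pint conversion: 0.45359237 kg per international pound."""
    return lb * 0.45359237


print(convert_or_handle("150", lb_to_kg, column="phv00203154"))
print(convert_or_handle("A", lb_to_kg, column="phv00203154"))  # None
```

Keeping the policy in a single argument lets a transform spec choose strictness per column. Whether linkml-map should expose such a knob at all is an open design question.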
Workaround
Currently, the only workaround is to replace unit_conversion with an explicit expr: that uses the pipeline's upstream string-to-numeric coercion:

    # Instead of:
    value_decimal:
      populated_from: phv00203154
      unit_conversion:
        source_unit: "[lb_av]"
        target_unit: "kg"

    # Use:
    value_decimal:
      expr: "{phv00203154} * 0.453592"

This works because the expr: evaluation path coerces strings to numeric before simpleeval processes them. However, this loses the semantic unit metadata and the precise pint conversion factor.
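The precision lost by the rounded constant is small in absolute terms. The following worked comparison uses a hypothetical 200 lb reading. It compares the exact definition of the international pound (0.45359237 kg, matching the pint factor in the trace) against the 0.453592 used in the workaround:

```python
EXACT_LB_TO_KG = 0.45359237   # exact definition of the international avoirdupois pound
ROUNDED_LB_TO_KG = 0.453592   # constant used in the expr: workaround

weight_lb = 200.0  # hypothetical reading, for illustration only
exact_kg = weight_lb * EXACT_LB_TO_KG
rounded_kg = weight_lb * ROUNDED_LB_TO_KG
relative_error = (EXACT_LB_TO_KG - ROUNDED_LB_TO_KG) / EXACT_LB_TO_KG

print(f"exact:   {exact_kg:.6f} kg")    # exact:   90.718474 kg
print(f"rounded: {rounded_kg:.6f} kg")  # rounded: 90.718400 kg
print(f"relative error: {relative_error:.2e}")
```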
Impact
This blocks all YAML transform files that use unit_conversion where the source column contains any non-numeric coded values (common in dbGaP data where 'A', 'M', etc. represent missing/not-applicable).
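Until non-numeric values are handled, a pre-flight check can list which coded values a column contains before a transform runs. The sketch below uses only the standard library; the function name and the sample rows are hypothetical:

```python
import csv
import io
from collections import Counter
from typing import TextIO


def non_numeric_codes(tsv: TextIO, column: str) -> Counter:
    """Count the distinct non-numeric, non-empty values in one TSV column."""
    counts: Counter = Counter()
    for row in csv.DictReader(tsv, delimiter="\t"):
        value = row.get(column)
        if value is None or not value.strip():
            continue  # empty cells are skipped here; only coded values are reported
        try:
            float(value)
        except ValueError:
            counts[value] += 1
    return counts


# Hypothetical sample data standing in for the source TSV
sample = io.StringIO("subject_id\tphv00203154\n1\t150\n2\tA\n3\t182.5\n4\tA\n")
print(non_numeric_codes(sample, "phv00203154"))  # Counter({'A': 2})
```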
Discovered during ARIC cohort pipeline run in dm-bip against bdy_wgt.yaml (body weight in lbs to kg conversion).