
unit_conversion crashes with TypeError when source value is non-numeric string #175

@csiege

Bug Report

Summary

convert_units() in linkml_map/functions/unit_conversion.py crashes with TypeError: can't multiply sequence by non-int of type 'float' when a source column value is a non-numeric string (e.g., 'A' used as a missing/coded value in dbGaP data).

Environment

  • linkml-map version: 0.4.0
  • Python: 3.12
  • pint/ucumvert: latest (via dm-bip Docker image)

Reproduction

YAML transform spec (simplified):

value_decimal:
  populated_from: phv00203154
  unit_conversion:
    source_unit: "[lb_av]"
    target_unit: "kg"

Source data: TSV column phv00203154 (ARIC weight in lbs) contains the value 'A' for some rows (likely a missing/not-collected code).
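To find which rows are affected before running the transform, the column can be scanned for cells that do not parse as numbers. The following is a minimal sketch using only the standard library; the sample rows, the `subject_id` column, and the `non_numeric_values` helper are made up for illustration and are not taken from the ARIC file.

```python
import csv
import io

# Made-up sample rows for illustration only; the real TSV has more columns.
sample_tsv = "subject_id\tphv00203154\n1\t172\n2\tA\n3\t145.5\n"


def non_numeric_values(tsv_text, column):
    """Return (line number, value) pairs whose cell cannot be parsed as a float."""
    bad = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        value = row[column]
        try:
            float(value)
        except ValueError:
            bad.append((line_number, value))
    return bad


print(non_numeric_values(sample_tsv, "phv00203154"))  # [(3, 'A')]
```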

Stack Trace

linkml_map/functions/unit_conversion.py:95 in convert_units
    quantity = magnitude * from_unit_q     # magnitude = 'A' (string)
    return quantity.to(to_unit).magnitude  # pint tries 'A' * 0.4535... -> TypeError

pint/facets/plain/registry.py:1097 in _convert
    value = value * factor   # value='A', factor=0.4535923700000001
    -> TypeError: can't multiply sequence by non-int of type 'float'

Key locals from the actual stack trace:

magnitude = 'A'
from_unit = '1 pound'
to_unit   = '1 kilogram'
factor = 0.4535923700000001
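
The final failing operation is an ordinary Python multiplication of a string by a float, so the error message can be reproduced without pint installed. A minimal sketch using the values from the locals above:

```python
# Values copied from the stack-trace locals above.
magnitude = "A"              # raw TSV cell, still a string
factor = 0.4535923700000001  # pound-to-kilogram factor reported in the trace

try:
    magnitude * factor
except TypeError as exc:
    message = str(exc)
    print(message)

# In plain Python a numeric-looking string fails the same way,
# because str * float is never defined.
try:
    "150" * factor
    numeric_string_fails = False
except TypeError:
    numeric_string_fails = True
```

This only demonstrates the plain-Python behavior; whether pint handles numeric-looking strings differently earlier in its code path is not shown by the trace.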

Root Cause

In unit_conversion.py line 93:

quantity = magnitude * from_unit_q

The magnitude parameter arrives as a raw string from the source data (TSV column value). The function does not:

  1. Attempt to coerce the string to a numeric type
  2. Check if the value is numeric before constructing the pint Quantity
  3. Gracefully handle non-numeric values (skip/return None)

Note: The expr: evaluation path in linkml-map/dm-bip coerces string values to numeric before simpleeval handles them, so expr: "{phv00203154} * 0.453592" works for rows whose value is numeric. unit_conversion bypasses that coercion path entirely.

Expected Behavior

Non-numeric values should be handled gracefully. Options:

  1. Coerce to float first and skip/return None if coercion fails (preferred -- matches expr: path behavior)
  2. Raise a more descriptive error that identifies the problematic value and source column
  3. Log a warning and return None for non-numeric values
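
The three options above could be combined in one small guard that runs before the pint Quantity is built. This is a sketch under stated assumptions, not the linkml-map implementation: `coerce_magnitude`, its `column` and `strict` parameters, and `pounds_to_kg` are hypothetical names, and a plain multiplication by the factor from the trace stands in for the pint conversion so the example runs without pint. Treating NaN and infinity as non-numeric is also an assumption.

```python
import logging
import math
from typing import Any, Optional

logger = logging.getLogger("unit_conversion_sketch")


def coerce_magnitude(value: Any, column: Optional[str] = None,
                     strict: bool = False) -> Optional[float]:
    """Return value as a finite float, or None if it is not numeric.

    Hypothetical helper. With strict=True a descriptive ValueError is
    raised instead of returning None (option 2 above).
    """
    if isinstance(value, bool):
        number = None  # do not treat True/False as 1.0/0.0
    else:
        try:
            number = float(value)
        except (TypeError, ValueError):
            number = None
    if number is not None and math.isfinite(number):
        return number
    message = f"Non-numeric value {value!r} in column {column!r}; cannot convert units"
    if strict:
        raise ValueError(message)
    logger.warning(message)  # option 3: warn, then skip the value
    return None


LB_TO_KG = 0.4535923700000001  # factor from the trace; stand-in for pint


def pounds_to_kg(value: Any, column: Optional[str] = None) -> Optional[float]:
    """Convert pounds to kilograms, returning None for non-numeric input."""
    number = coerce_magnitude(value, column=column)
    return None if number is None else number * LB_TO_KG


print(pounds_to_kg("150", column="phv00203154"))
print(pounds_to_kg("A", column="phv00203154"))  # None
```

Returning None by default mirrors the preferred option 1, while `strict=True` gives callers who want to fail fast an error that names both the value and the column.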

Workaround

Currently, the only workaround is to replace unit_conversion with an explicit expr: that uses the pipeline's upstream string-to-numeric coercion:

# Instead of:
value_decimal:
  populated_from: phv00203154
  unit_conversion:
    source_unit: "[lb_av]"
    target_unit: "kg"

# Use:
value_decimal:
  expr: "{phv00203154} * 0.453592"

This works because the expr: evaluation path coerces strings to numeric before simpleeval processes them. However, this loses the semantic unit metadata and the precise pint conversion factor.
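
To put the precision loss in perspective, the rounded factor in the workaround can be compared with the factor pint reported in the trace. A quick sketch; the 150 lb weight is an arbitrary example value, not a row from the data:

```python
pint_factor = 0.4535923700000001  # factor from the stack trace
rounded_factor = 0.453592         # factor used in the expr: workaround

weight_lb = 150.0  # arbitrary example weight

# Absolute difference in kilograms for the example weight.
diff_kg = weight_lb * (pint_factor - rounded_factor)

# Relative difference between the two factors.
relative_error = (pint_factor - rounded_factor) / pint_factor

print(f"difference for {weight_lb} lb: {diff_kg:.2e} kg")
print(f"relative difference: {relative_error:.2e}")
```

The relative difference is under one part per million, so the numeric impact on body weights is small; the loss of unit metadata is the more significant cost of the workaround.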

Impact

This blocks every YAML transform file that uses unit_conversion on a source column containing non-numeric coded values (common in dbGaP data, where 'A', 'M', etc. represent missing/not-applicable).

Discovered during an ARIC cohort pipeline run in dm-bip against bdy_wgt.yaml (body weight conversion from lbs to kg).

Labels: bug
