[aw][test audit] models.py: _sanitize_non_numeric_tokens passes float('inf') and float('nan') to Pydantic int coercion; cross-ch [Content truncated due to length] #875

@microsasa

Root Cause

AssistantMessageData._sanitize_non_numeric_tokens (src/copilot_usage/models.py) has two guards:

if isinstance(v, (bool, str)):
    return 0
if isinstance(v, (int, float)) and v <= 0:
    return 0
return v

For float('inf') and float('nan'), neither guard matches, so both values fall through silently:

  • isinstance(inf, (bool, str)) → False
  • isinstance(inf, (int, float)) and inf <= 0 → False (inf > 0; nan <= 0 is also False in Python, because every ordered comparison with nan is False)

So inf and nan are returned unchanged and passed to Pydantic's type coercion for the outputTokens: int field. Pydantic then raises ValidationError because:

  • int(float('inf')) raises OverflowError
  • int(float('nan')) raises ValueError

Note: -inf is handled correctly (maps to 0) via the <= 0 guard.
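The fall-through can be reproduced without Pydantic. The sketch below copies the two guards quoted above into a stand-in function; the name sanitize_current is hypothetical and is not the real validator.

```python
def sanitize_current(v):
    """Stand-in for the two guards quoted above (not the real validator)."""
    if isinstance(v, (bool, str)):
        return 0
    if isinstance(v, (int, float)) and v <= 0:
        return 0
    return v


for value in (float("inf"), float("nan"), float("-inf")):
    result = sanitize_current(value)
    try:
        # Plain int() stands in for the later integer coercion step.
        coerced = int(result)
    except (OverflowError, ValueError) as exc:
        coerced = type(exc).__name__
    print(repr(value), "->", repr(result), "->", repr(coerced))
```

Only -inf is mapped to 0 here; inf and nan come back unchanged and then fail integer conversion.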

Fast-Path Divergence

_extract_output_tokens (the fast path in _first_pass) handles these correctly — float.is_integer() returns False for both inf and nan, so both return None. But the TestExtractOutputTokensEquivalence cross-check tests (_EQUIVALENCE_CASES) do not include float('inf') or float('nan'), so this divergence is unverified by the test suite.
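A minimal sketch of why an is_integer()-based check rejects the special floats. The function extract_tokens_fast is a hypothetical stand-in; the real _extract_output_tokens is not shown in this issue, and its other checks are assumed here for illustration only.

```python
def extract_tokens_fast(v):
    """Hypothetical fast-path stand-in: accept only positive, integer-valued numbers."""
    if isinstance(v, bool):
        return None
    if isinstance(v, int):
        return v if v > 0 else None
    # float.is_integer() is False for inf, -inf and nan, so they never reach int().
    if isinstance(v, float) and v.is_integer() and v > 0:
        return int(v)
    return None


for value in (42.0, 1.5, float("inf"), float("nan"), float("-inf")):
    print(repr(value), "->", extract_tokens_fast(value))
```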

Impact

If a Copilot API response ever contains Infinity or NaN in the outputTokens field (not valid JSON per the specification, but accepted by some parsers, including Python's json module by default), parse_events would catch the ValidationError and log a warning, but any other code path that calls AssistantMessageData.model_validate directly would get an unexpected exception. Additionally, the validator's documented contract ("map non-positive and non-numeric token counts to 0") is violated: inf/nan are non-numeric for token-counting purposes but aren't mapped to 0.
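To illustrate how such values could reach the validator, the sketch below shows Python's standard json module accepting the Infinity and NaN literals by default. The payload strings are made-up examples, not captured API responses.

```python
import json
import math

# Made-up payloads; json.loads accepts these non-standard literals by default.
inf_tokens = json.loads('{"outputTokens": Infinity}')["outputTokens"]
nan_tokens = json.loads('{"outputTokens": NaN}')["outputTokens"]

print(type(inf_tokens).__name__, math.isinf(inf_tokens))
print(type(nan_tokens).__name__, math.isnan(nan_tokens))
```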

Tests to Add

1. test_models.py: TestSanitizeNonNumericTokens class (or extend existing TestAssistantMessageData)

@pytest.mark.parametrize("value", [float("inf"), float("nan")])
def test_sanitize_special_floats_map_to_zero(value: float) -> None:
    """inf and nan are not valid token counts; validator must map them to 0."""
    result = AssistantMessageData.model_validate({"outputTokens": value})
    assert result.outputTokens == 0

def test_sanitize_negative_inf_maps_to_zero() -> None:
    result = AssistantMessageData.model_validate({"outputTokens": float("-inf")})
    assert result.outputTokens == 0  # already passes; regression guard

2. test_parser.py — extend _EQUIVALENCE_CASES cross-check

Add float('inf') and float('nan') to _EQUIVALENCE_CASES in TestExtractOutputTokensEquivalence. Since _extract_output_tokens returns None for these (non-contributing), and the docstring says "inputs rejected by model validation should likewise be treated as non-contributing", the equivalence table entry should assert that both paths produce a non-contributing result.

Also add explicit unit tests to TestExtractOutputTokens:

@pytest.mark.parametrize("special", [float("inf"), float("nan"), float("-inf")])
def test_returns_none_for_ieee_special_floats(special: float) -> None:
    assert _extract_output_tokens(_make_assistant_event(special)) is None

3. Validator Fix (prerequisite for test 1)

The validator should be updated to explicitly handle inf/nan before the tests can pass:

import math
if isinstance(v, float) and (math.isinf(v) or math.isnan(v)):
    return 0

Alternatively, extend the <= 0 guard itself to cover non-finite values without importing math (v != v is True only for nan; bool is already handled by the first guard):

if isinstance(v, (int, float)) and (v <= 0 or v != v or v == float("inf")):
    return 0

The cleanest approach uses math.isfinite:

if isinstance(v, (int, float)) and (v <= 0 or (isinstance(v, float) and not math.isfinite(v))):
    return 0
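For a quick check outside the model, the math.isfinite variant can be wrapped in a stand-in function. The name sanitize_fixed is hypothetical; in the real code the guard would sit inside _sanitize_non_numeric_tokens.

```python
import math


def sanitize_fixed(v):
    """Stand-in for the validator with the math.isfinite guard applied."""
    if isinstance(v, (bool, str)):
        return 0
    # The inner isinstance(v, float) check keeps math.isfinite away from
    # very large ints, which it cannot convert to float.
    if isinstance(v, (int, float)) and (
        v <= 0 or (isinstance(v, float) and not math.isfinite(v))
    ):
        return 0
    return v


for value in (float("inf"), float("nan"), float("-inf"), -3, 0, 17, 10**400):
    print(repr(value)[:12], "->", repr(sanitize_fixed(value))[:12])
```

The inner float check is a deliberate choice: math.isfinite raises OverflowError for an int too large to convert to float, so applying it to every numeric input would trade one exception for another.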

Regression Scenario

Any future refactor of _sanitize_non_numeric_tokens that removes or simplifies the guards could reintroduce this. Tests for inf/nan would catch it immediately.

Generated by Test Suite Analysis · ● 1.9M ·
