
Handle VTL Number type correctly in comparison operators and output formatting #457

@javihern98


Summary

Handle floating-point precision issues for the VTL Number type in comparison operators and output formatting.

Problem

Floating-point arithmetic in Python can lead to precision issues when comparing Number values or outputting results. This affects the reliability of equality comparisons and the consistency of output data.

Float64 Precision Background

IEEE 754 float64 (double precision) has 52 mantissa bits (53 effective with implicit leading 1), which provides:

| Property | Value | Meaning |
| --- | --- | --- |
| log10(2^53) | ≈ 15.95 | Maximum significant decimal digits |
| DBL_DIG | 15 | Guaranteed decimal digits for round-trip (decimal → float64 → decimal) |
| DBL_DECIMAL_DIG | 17 | Digits needed for exact float64 → decimal → float64 round-trip |

This means 15 significant digits is the largest precision at which every decimal value is guaranteed to survive a round trip through float64. Beyond 15 digits, the extra digits can reflect binary representation noise rather than the intended value.
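The limits above can be checked from Python itself. This is a small illustration, not part of the proposed change; it only uses the standard library.

```python
import sys

# float64 properties as reported by the running interpreter
print(sys.float_info.mant_dig)  # 53 (effective mantissa bits)
print(sys.float_info.dig)       # 15 (DBL_DIG)

a = 0.1 + 0.2

# Exact equality fails because of binary representation noise
print(a == 0.3)                 # False

# 17 significant digits expose the noise, 15 hide it
print(format(a, ".17g"))        # 0.30000000000000004
print(format(a, ".15g"))        # 0.3
```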

Proposed Solution

1. Comparison Operators Changes

Modify the comparison operators that involve equality to compare Number values within a tolerance instead of exactly:

  • = (equal)
  • >= (greater than or equal)
  • <= (less than or equal)
  • between

The comparison should use significant digits tolerance to determine equality.
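A minimal sketch of how such a comparison could look. The function names, the `digits` parameter, and the mapping from significant digits to a relative tolerance (`0.5 * 10 ** -(digits - 1)`) are assumptions made for illustration; the issue does not fix the exact formula.

```python
import math

DEFAULT_DIGITS = 15  # assumed default, matching the proposed 15 significant digits


def _rel_tol(digits: int) -> float:
    # Assumption: values agreeing to `digits` significant digits differ by
    # at most half a unit in the last of those digits, relative to magnitude.
    return 0.5 * 10 ** -(digits - 1)


def number_eq(a: float, b: float, digits: int = DEFAULT_DIGITS) -> bool:
    if digits == -1:
        # -1 disables the tolerance and falls back to exact comparison
        return a == b
    return math.isclose(a, b, rel_tol=_rel_tol(digits), abs_tol=0.0)


def number_ge(a: float, b: float, digits: int = DEFAULT_DIGITS) -> bool:
    return a > b or number_eq(a, b, digits)


def number_le(a: float, b: float, digits: int = DEFAULT_DIGITS) -> bool:
    return a < b or number_eq(a, b, digits)


def number_between(x: float, low: float, high: float, digits: int = DEFAULT_DIGITS) -> bool:
    return number_ge(x, low, digits) and number_le(x, high, digits)


print(number_eq(0.1 + 0.2, 0.3))             # True
print(number_eq(0.1 + 0.2, 0.3, digits=-1))  # False
print(number_ge(0.3, 0.1 + 0.2))             # True
print(number_between(0.1 + 0.2, 0.0, 0.3))   # True
```

Because the tolerance is relative, a comparison against exactly zero only succeeds when the other value is also zero. Whether an additional absolute floor is needed there is a design decision this sketch leaves open.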

2. Output Formatting Changes

The engine output will use the float_format parameter in to_csv, using point notation to format Number values correctly.

Example format: ".15g" (general format with 15 significant digits)
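To illustrate what the general format does, here is a small standard-library sketch. `format_number` is a hypothetical helper, not an existing engine function. Assuming `to_csv` here is pandas' `DataFrame.to_csv`, note that a string passed as `float_format` is applied with `%`-style formatting, so the equivalent of the `".15g"` spec would be written `"%.15g"`.

```python
def format_number(value: float, digits: int = 15) -> str:
    # General format: `digits` significant digits, trailing zeros dropped,
    # exponential notation for very large or very small magnitudes.
    return format(value, f".{digits}g")


print(format_number(0.1 + 0.2))              # 0.3
print(format_number(1234567.891))            # 1234567.891
print(format_number(1e-7))                   # 1e-07
print(format_number(123456789012345678.0))   # 1.23456789012346e+17

# The %-style spelling produces the same text for floats
print("%.15g" % (0.1 + 0.2))                 # 0.3
```

Under that assumption, a call such as `df.to_csv(path, float_format="%.15g")` would apply the same formatting to every float column.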

3. Environment Variables

The behavior will be controlled by two environment variables:

COMPARISON_ABSOLUTE_THRESHOLD

Controls the significant digits used for Number comparison operations.

OUTPUT_NUMBER_SIGNIFICANT_DIGITS

Controls the significant digits used when formatting Number values in output (CSV export).

4. Environment Variable Values

Both variables accept the following values:

| Value | Behavior |
| --- | --- |
| None / Not defined | Uses default value of 15 significant digits |
| 6 to 15 | Uses the specified number of significant digits |
| -1 | Disables these changes (uses Python's default behavior) |

The range 6–15 is based on float64 precision limits:

  • 6: Minimum practical precision (coarse tolerance)
  • 15: Maximum guaranteed precision for float64 (DBL_DIG)
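The rules above could be read from the environment roughly as follows. `read_significant_digits` is a hypothetical helper. Treating an empty string as "not defined" and raising `ValueError` for non-integer or out-of-range values are assumptions, since the issue does not say how invalid values should be handled.

```python
import os
from typing import Optional

DEFAULT_DIGITS = 15
MIN_DIGITS, MAX_DIGITS = 6, 15
DISABLED = -1


def read_significant_digits(var_name: str) -> Optional[int]:
    """Return the configured digits, or None when the feature is disabled (-1)."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        # Not defined: fall back to the default of 15 significant digits
        return DEFAULT_DIGITS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}") from None
    if value == DISABLED:
        return None
    if not MIN_DIGITS <= value <= MAX_DIGITS:
        raise ValueError(
            f"{var_name} must be -1 or between {MIN_DIGITS} and {MAX_DIGITS}, got {value}"
        )
    return value


comparison_digits = read_significant_digits("COMPARISON_ABSOLUTE_THRESHOLD")
output_digits = read_significant_digits("OUTPUT_NUMBER_SIGNIFICANT_DIGITS")
```

Returning `None` for the disabled case lets callers branch once (`if digits is None: use exact behavior`) instead of checking for the `-1` sentinel in every operator.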

Implementation Notes

  1. Significant Digits: The implementation must ensure we use significant digits (not decimal places) for both comparison and output formatting.

  2. Comparison Logic: For equality comparisons, two Number values should be considered equal if they match within the tolerance defined by the significant digits threshold.

  3. Output Format: Use Python's general format specifier (e.g., ".15g") which automatically handles significant digits and switches between fixed and exponential notation as appropriate.

  4. Backward Compatibility: Setting either environment variable to -1 should preserve the current behavior of the corresponding feature for users who depend on it.

Files to Modify

  • src/vtlengine/Operators/Comparison.py - Comparison operator implementations
  • src/vtlengine/API/__init__.py or output handling module - CSV output formatting
  • src/vtlengine/Utils/__init__.py - Environment variable reading utilities (if needed)

Testing

  • Add tests for Number comparisons at boundary conditions
  • Add tests for output formatting with various Number values
  • Add tests for environment variable configurations (default, custom values, disabled)
