Summary
Implement proper handling of VTL Number type for floating-point precision issues in comparison operators and output formatting.
Problem
Floating-point arithmetic in Python can lead to precision issues when comparing Number values or outputting results. This affects the reliability of equality comparisons and the consistency of output data.
Float64 Precision Background
IEEE 754 float64 (double precision) has 52 mantissa bits (53 effective with implicit leading 1), which provides:
| Property | Value | Meaning |
| --- | --- | --- |
| log10(2^53) | ≈ 15.95 | Maximum significant decimal digits |
| DBL_DIG | 15 | Guaranteed decimal digits for round-trip (decimal → float64 → decimal) |
| DBL_DECIMAL_DIG | 17 | Digits needed for exact float64 → decimal → float64 round-trip |
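This means 15 significant digits is the largest count that float64 is guaranteed to preserve through a decimal round-trip, so it is the correct upper bound for reliable precision. Digits beyond the 15th can reflect floating-point representation noise rather than the intended value.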
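As a quick illustration of the table above, the following sketch uses only the standard library to show where the noise appears. The variable names are illustrative and not part of the engine.

```python
import sys

# float64 parameters as reported by the running interpreter.
mantissa_bits = sys.float_info.mant_dig  # 53 effective mantissa bits
guaranteed_digits = sys.float_info.dig   # 15, the DBL_DIG value

total = 0.1 + 0.2
exactly_equal = total == 0.3             # False: the two doubles differ

# 17 significant digits expose the representation noise; 15 hide it.
noisy = format(total, ".17g")            # '0.30000000000000004'
clean = format(total, ".15g")            # '0.3'

print(mantissa_bits, guaranteed_digits, exactly_equal, noisy, clean)
```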
Proposed Solution
1. Comparison Operators Changes
Modify the comparison operators that involve equality so that Number values are compared within a tolerance instead of exactly:
- = (equal)
- >= (greater than or equal)
- <= (less than or equal)
- between

The tolerance should be expressed in significant digits, so it scales with the magnitude of the operands rather than being a fixed absolute difference.
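One possible shape for the tolerant comparison is sketched below. The helper names (`numbers_equal`, `greater_equal`, `less_equal`, `between`) are hypothetical, and mapping n significant digits to a relative tolerance of 10^-(n-1) is an assumption; the final implementation may pick a different rule.

```python
import math
from typing import Optional


def numbers_equal(a: float, b: float, sig_digits: Optional[int] = 15) -> bool:
    """Hypothetical helper: equality within `sig_digits` significant digits."""
    if sig_digits is None:
        # Tolerance disabled: fall back to exact float comparison.
        return a == b
    # Assumption: n significant digits maps to a relative tolerance of 10**-(n-1).
    rel_tol = 10.0 ** -(sig_digits - 1)
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)


def greater_equal(a: float, b: float, sig_digits: Optional[int] = 15) -> bool:
    # a >= b holds if a is strictly greater or equal within tolerance.
    return a > b or numbers_equal(a, b, sig_digits)


def less_equal(a: float, b: float, sig_digits: Optional[int] = 15) -> bool:
    return a < b or numbers_equal(a, b, sig_digits)


def between(value: float, low: float, high: float,
            sig_digits: Optional[int] = 15) -> bool:
    # Inclusive range check built from the two tolerant comparisons.
    return (greater_equal(value, low, sig_digits)
            and less_equal(value, high, sig_digits))


print(numbers_equal(0.1 + 0.2, 0.3))        # tolerant comparison
print(numbers_equal(0.1 + 0.2, 0.3, None))  # exact comparison
```

Because the tolerance is purely relative, a value compared against exactly 0 only matches 0 itself; whether that case needs separate handling is left open here.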
2. Output Formatting Changes
The engine output will use the float_format parameter of to_csv, using point notation, so that Number values are written with a controlled number of significant digits.
Example format: ".15g" (general format with 15 significant digits)
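The effect on exported values can be sketched with the standard library alone. `write_numbers_csv` is a hypothetical stand-in for the engine's export step, not its actual code. If the export goes through pandas `DataFrame.to_csv`, the equivalent setting would presumably be the printf-style string `float_format="%.15g"`.

```python
import csv
import io


def write_numbers_csv(rows, sig_digits=15):
    """Hypothetical stand-in for the CSV export: formats floats with '.{n}g'."""
    spec = f".{sig_digits}g"
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Only float cells are reformatted; other cells pass through unchanged.
        writer.writerow(
            [format(cell, spec) if isinstance(cell, float) else cell for cell in row]
        )
    return buffer.getvalue()


output = write_numbers_csv([["A", 0.1 + 0.2], ["B", 1 / 3]])
print(output)
```

With 15 significant digits, the first row is written as `A,0.3` rather than `A,0.30000000000000004`.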
3. Environment Variables
The behavior will be controlled by two environment variables:
- COMPARISON_ABSOLUTE_THRESHOLD: Controls the significant digits used for Number comparison operations.
- OUTPUT_NUMBER_SIGNIFICANT_DIGITS: Controls the significant digits used when formatting Number values in output (CSV export).
4. Environment Variable Values
Both variables accept the following values:
| Value | Behavior |
| --- | --- |
| None / Not defined | Uses default value of 15 significant digits |
| 6 to 15 | Uses the specified number of significant digits |
| -1 | Disables these changes (uses Python's default behavior) |
The range 6–15 is based on float64 precision limits:
- 6: Minimum practical precision (coarse tolerance)
- 15: Maximum guaranteed precision for float64 (DBL_DIG)
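The value table above could be read roughly as follows. `read_significant_digits` is a hypothetical helper. Treating an empty string as "not defined" and raising `ValueError` for values outside the table are both assumptions, since the proposal does not say how invalid input should be handled.

```python
import os
from typing import Optional

DEFAULT_SIGNIFICANT_DIGITS = 15
MIN_SIGNIFICANT_DIGITS = 6
MAX_SIGNIFICANT_DIGITS = 15
DISABLED = -1


def read_significant_digits(var_name: str) -> Optional[int]:
    """Return the configured digit count, or None when the feature is disabled."""
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        # Assumption: an empty value is treated the same as an unset variable.
        return DEFAULT_SIGNIFICANT_DIGITS
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}") from None
    if value == DISABLED:
        return None
    if MIN_SIGNIFICANT_DIGITS <= value <= MAX_SIGNIFICANT_DIGITS:
        return value
    # Assumption: out-of-range values are rejected rather than clamped.
    raise ValueError(
        f"{var_name} must be -1 or between {MIN_SIGNIFICANT_DIGITS} "
        f"and {MAX_SIGNIFICANT_DIGITS}, got {value}"
    )


os.environ["OUTPUT_NUMBER_SIGNIFICANT_DIGITS"] = "10"
print(read_significant_digits("OUTPUT_NUMBER_SIGNIFICANT_DIGITS"))
```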
Implementation Notes
- Significant Digits: The implementation must ensure we use significant digits (not decimal places) for both comparison and output formatting.
- Comparison Logic: For equality comparisons, two Number values should be considered equal if they match within the tolerance defined by the significant digits threshold.
- Output Format: Use Python's general format specifier (e.g., ".15g") which automatically handles significant digits and switches between fixed and exponential notation as appropriate.
- Backward Compatibility: Setting the environment variable to -1 should preserve the current behavior for users who depend on it.
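To make the distinction between significant digits and decimal places concrete, this small sketch formats the same values with the fixed-point (`f`) and general (`g`) specifiers. `compare_formats` is an illustrative name only, and 3 digits are used to keep the output short.

```python
def compare_formats(value: float, digits: int = 3):
    """Return (fixed, general): `digits` decimal places vs significant digits."""
    fixed = format(value, f".{digits}f")    # counts digits after the decimal point
    general = format(value, f".{digits}g")  # counts significant digits
    return fixed, general


for sample in (0.5, 1234.5678, 0.000012345678, 1e20):
    fixed, general = compare_formats(sample)
    print(f"{sample!r}: .3f -> {fixed}, .3g -> {general}")
```

The general format drops trailing zeros and switches to exponential notation once the exponent is below -4 or at least equal to the requested precision, which is why `1234.5678` becomes `1.23e+03` with 3 digits but stays in fixed notation with 15.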
Files to Modify
- src/vtlengine/Operators/Comparison.py - Comparison operator implementations
- src/vtlengine/API/__init__.py or output handling module - CSV output formatting
- src/vtlengine/Utils/__init__.py - Environment variable reading utilities (if needed)
Testing
- Add tests for Number comparisons at boundary conditions
- Add tests for output formatting with various Number values
- Add tests for environment variable configurations (default, custom values, disabled)