Skip to content

[GLUTEN-11917][VL] Respect allowPrecisionLoss from expression context in Spark 4.1#12110

Open
nookcreed wants to merge 2 commits into
apache:mainfrom
nookcreed:fix/gluten-11917-velox-decimal-allow-precision-loss
Open

[GLUTEN-11917][VL] Respect allowPrecisionLoss from expression context in Spark 4.1#12110
nookcreed wants to merge 2 commits into
apache:mainfrom
nookcreed:fix/gluten-11917-velox-decimal-allow-precision-loss

Conversation

@nookcreed
Copy link
Copy Markdown
Contributor

@nookcreed nookcreed commented May 18, 2026

What changes were proposed in this pull request?

In Spark 4.1, SPARK-53968 introduced NumericEvalContext which captures allowPrecisionLoss at analysis time and embeds it in arithmetic expressions (Add,
Subtract, Multiply, Divide) via an evalContext field.

Previously, Gluten read SQLConf.get.decimalOperationsAllowPrecisionLoss at plan-conversion time. These can diverge — for example, when querying a persistent view whose expression was analyzed under a different session
config. This causes a decimal scale mismatch (e.g. DECIMAL(38,18) + DECIMAL(38,18) producing scale 18 instead of 17 — a 10× error).

Fix: Add a shim method decimalAllowPrecisionLoss(expr: BinaryArithmetic): Boolean:

  • Spark41Shims: reads expr.evalContext.allowDecimalPrecisionLoss directly from the expression
  • All other Spark versions: fall back to SQLConf.get (preserving existing behavior)

DecimalArithmeticUtil.getResultType and VeloxSparkPlanExecApi.getDecimalArithmeticExprName now use the shim so the correct result type and Velox native function name are selected.

How was this patch tested?

Root cause scenario

The bug triggers when allowPrecisionLoss captured in an expression's NumericEvalContext at analysis time differs from the current SQLConf at plan-conversion time. The canonical case is querying a persistent view whose
decimal arithmetic was analyzed under one config:

-- View analyzed with allowPrecisionLoss=true (default)
-- DECIMAL(38,18) + DECIMAL(38,18) → Spark computes result type as DECIMAL(38,17)
CREATE VIEW v AS SELECT col1 + col2 AS result FROM t;

-- Query run in a session with allowPrecisionLoss=false
SET spark.sql.decimalOperations.allowPrecisionLoss=false;
-- Old code: reads SQLConf → uses false → computes DECIMAL(38,18), calls add_deny_precision_loss
-- New code: reads expr.evalContext → uses true → computes DECIMAL(38,17), calls add
SELECT * FROM v;
-- Scale mismatch: 18 vs 17 = 10× error

Existing test (no regression)

MathFunctionsValidateSuite#decimal arithmetic (backends-velox) already covers allowPrecisionLoss=true/false toggled within a single session via withSQLConf. This test continues to pass because expression creation and plan
conversion both happen inside the same config scope.

./dev/run-scala-test.sh
-Pjava-17,spark-4.1,scala-2.13,backends-velox,hadoop-3.3,spark-ut
-pl backends-velox
-s org.apache.gluten.functions.MathFunctionsValidateSuite
-t "decimal arithmetic"

Fixed test: GlutenSQLQuerySuite

The exclusion on line 1047 of VeloxTestSettings.scala (spark41) should be removed and the test should now pass:

./dev/run-scala-test.sh
-Pjava-17,spark-4.1,scala-2.13,backends-velox,hadoop-3.3,spark-ut
-pl gluten-ut/spark41
-s org.apache.spark.sql.GlutenSQLQuerySuite
-t "should be able to resolve a persistent view"

GlutenSimpleSQLViewSuite (partial — 1 of 2 failures)

GlutenSimpleSQLViewSuite is currently fully disabled (// TODO: 4.x enableSuite on line 701). This PR resolves 1 of its 2 failures; the remaining failure is tracked in #11912. The suite can be re-enabled with an exclusion for
the #11912 test once that test name is identified from a CI run.

CI

Trigger Velox CI on the PR to run the full spark-ut suite against Spark 4.1.


Closes #11917

… in Spark 4.1

In Spark 4.1, SPARK-53968 introduced NumericEvalContext which captures
allowPrecisionLoss at analysis time and embeds it in arithmetic expressions
(Add, Subtract, Multiply, Divide) via an evalContext field. Previously,
Gluten read SQLConf.get.decimalOperationsAllowPrecisionLoss at plan-conversion
time, which can diverge from the expression's captured value — for example,
when querying a persistent view whose expression was analyzed under a different
session config.

This commit adds a shim method decimalAllowPrecisionLoss(expr: BinaryArithmetic)
to SparkShims. The Spark41Shims override reads evalContext.allowDecimalPrecisionLoss
directly from the expression. All other Spark versions fall back to SQLConf.get,
preserving existing behavior. DecimalArithmeticUtil.getResultType and
VeloxSparkPlanExecApi.getDecimalArithmeticExprName are updated to use the shim,
ensuring the correct result type and Velox function name are selected.

Fixes: apache#11917
@nookcreed nookcreed force-pushed the fix/gluten-11917-velox-decimal-allow-precision-loss branch from 68048cb to 3029fe8 Compare May 18, 2026 20:45
@github-actions github-actions Bot added CORE works for Gluten Core VELOX labels May 18, 2026
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

2 similar comments
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

…nLoss shim

- Add regression test covering the view-analyzed-under-different-session-config
  scenario that motivated the fix: arithmetic is analyzed with allowPrecisionLoss=false,
  cached in a temp view, then queried with allowPrecisionLoss=true. Gluten must read
  the captured evalContext, not SQLConf.get.
- Add comment on SparkShims default explaining why SQLConf.get is correct for
  pre-4.1 Spark versions (no evalContext field exists before SPARK-53968).
- Add comment on Spark41Shims wildcard arm explaining that Remainder/Pmod lack
  evalContext and are gated out of Velox execution by GlutenNotSupportException
  in DecimalArithmeticUtil.getResultType, making the SQLConf fallback safe.
- Add comment on SparkPlanExecApi default explaining why non-Velox backends
  (e.g. ClickHouse) correctly ignore allowPrecisionLoss — they do not use the
  _deny_precision_loss naming convention.
@nookcreed nookcreed force-pushed the fix/gluten-11917-velox-decimal-allow-precision-loss branch from 5519e32 to 429cb76 Compare May 18, 2026 22:46
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@nookcreed nookcreed marked this pull request as ready for review May 19, 2026 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] Spark 4.1: Velox decimal arithmetic does not respect allowPrecisionLoss context (SPARK-53968)

1 participant