[GLUTEN-11917][VL] Respect allowPrecisionLoss from expression context in Spark 4.1#12110
Open
nookcreed wants to merge 2 commits into
Open
[GLUTEN-11917][VL] Respect allowPrecisionLoss from expression context in Spark 4.1#12110nookcreed wants to merge 2 commits into
nookcreed wants to merge 2 commits into
Conversation
… in Spark 4.1 In Spark 4.1, SPARK-53968 introduced NumericEvalContext which captures allowPrecisionLoss at analysis time and embeds it in arithmetic expressions (Add, Subtract, Multiply, Divide) via an evalContext field. Previously, Gluten read SQLConf.get.decimalOperationsAllowPrecisionLoss at plan-conversion time, which can diverge from the expression's captured value — for example, when querying a persistent view whose expression was analyzed under a different session config. This commit adds a shim method decimalAllowPrecisionLoss(expr: BinaryArithmetic) to SparkShims. The Spark41Shims override reads evalContext.allowDecimalPrecisionLoss directly from the expression. All other Spark versions fall back to SQLConf.get, preserving existing behavior. DecimalArithmeticUtil.getResultType and VeloxSparkPlanExecApi.getDecimalArithmeticExprName are updated to use the shim, ensuring the correct result type and Velox function name are selected. Fixes: apache#11917
68048cb to
3029fe8
Compare
|
Run Gluten Clickhouse CI on x86 |
2 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
…nLoss shim - Add regression test covering the view-analyzed-under-different-session-config scenario that motivated the fix: arithmetic is analyzed with allowPrecisionLoss=false, cached in a temp view, then queried with allowPrecisionLoss=true. Gluten must read the captured evalContext, not SQLConf.get. - Add comment on SparkShims default explaining why SQLConf.get is correct for pre-4.1 Spark versions (no evalContext field exists before SPARK-53968). - Add comment on Spark41Shims wildcard arm explaining that Remainder/Pmod lack evalContext and are gated out of Velox execution by GlutenNotSupportException in DecimalArithmeticUtil.getResultType, making the SQLConf fallback safe. - Add comment on SparkPlanExecApi default explaining why non-Velox backends (e.g. ClickHouse) correctly ignore allowPrecisionLoss — they do not use the _deny_precision_loss naming convention.
5519e32 to
429cb76
Compare
|
Run Gluten Clickhouse CI on x86 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
In Spark 4.1, SPARK-53968 introduced
NumericEvalContextwhich capturesallowPrecisionLossat analysis time and embeds it in arithmetic expressions (Add,Subtract,Multiply,Divide) via anevalContextfield.Previously, Gluten read
SQLConf.get.decimalOperationsAllowPrecisionLossat plan-conversion time. These can diverge — for example, when querying a persistent view whose expression was analyzed under a different sessionconfig. This causes a decimal scale mismatch (e.g.
DECIMAL(38,18) + DECIMAL(38,18)producing scale 18 instead of 17 — a 10× error).Fix: Add a shim method
decimalAllowPrecisionLoss(expr: BinaryArithmetic): Boolean:Spark41Shims: readsexpr.evalContext.allowDecimalPrecisionLossdirectly from the expressionSQLConf.get(preserving existing behavior)DecimalArithmeticUtil.getResultTypeandVeloxSparkPlanExecApi.getDecimalArithmeticExprNamenow use the shim so the correct result type and Velox native function name are selected.How was this patch tested?
Root cause scenario
The bug triggers when allowPrecisionLoss captured in an expression's NumericEvalContext at analysis time differs from the current SQLConf at plan-conversion time. The canonical case is querying a persistent view whose
decimal arithmetic was analyzed under one config:
-- View analyzed with allowPrecisionLoss=true (default)
-- DECIMAL(38,18) + DECIMAL(38,18) → Spark computes result type as DECIMAL(38,17)
CREATE VIEW v AS SELECT col1 + col2 AS result FROM t;
-- Query run in a session with allowPrecisionLoss=false
SET spark.sql.decimalOperations.allowPrecisionLoss=false;
-- Old code: reads SQLConf → uses false → computes DECIMAL(38,18), calls add_deny_precision_loss
-- New code: reads expr.evalContext → uses true → computes DECIMAL(38,17), calls add
SELECT * FROM v;
-- Scale mismatch: 18 vs 17 = 10× error
Existing test (no regression)
MathFunctionsValidateSuite#decimal arithmetic (backends-velox) already covers allowPrecisionLoss=true/false toggled within a single session via withSQLConf. This test continues to pass because expression creation and plan
conversion both happen inside the same config scope.
./dev/run-scala-test.sh
-Pjava-17,spark-4.1,scala-2.13,backends-velox,hadoop-3.3,spark-ut
-pl backends-velox
-s org.apache.gluten.functions.MathFunctionsValidateSuite
-t "decimal arithmetic"
Fixed test: GlutenSQLQuerySuite
The exclusion on line 1047 of VeloxTestSettings.scala (spark41) should be removed and the test should now pass:
./dev/run-scala-test.sh
-Pjava-17,spark-4.1,scala-2.13,backends-velox,hadoop-3.3,spark-ut
-pl gluten-ut/spark41
-s org.apache.spark.sql.GlutenSQLQuerySuite
-t "should be able to resolve a persistent view"
GlutenSimpleSQLViewSuite (partial — 1 of 2 failures)
GlutenSimpleSQLViewSuite is currently fully disabled (// TODO: 4.x enableSuite on line 701). This PR resolves 1 of its 2 failures; the remaining failure is tracked in #11912. The suite can be re-enabled with an exclusion for
the #11912 test once that test name is identified from a CI run.
CI
Trigger Velox CI on the PR to run the full spark-ut suite against Spark 4.1.
Closes #11917