Background
Several expressions now have both a native (Rust) implementation and a JVM codegen-dispatch fallback (mixing in CodegenDispatchFallback). For these, the execution path is selected by a combination of the per-input support level and two config flags:
spark.comet.expression.<Name>.allowIncompatible
spark.comet.exec.scalaUDF.codegen.enabled (default true)
This produces a routing matrix with five outcomes:
| # |
Support level for the input |
allowIncompatible |
codegen.enabled |
Resulting path |
What a test should assert |
| 1 |
Compatible |
any |
any |
native Rust |
native + matches Spark |
| 2 |
Incompatible |
true |
any |
native Rust |
stays in Comet only (results may diverge) |
| 3 |
Incompatible |
false |
true (default) |
JVM codegen dispatch |
native + matches Spark |
| 4 |
Incompatible |
false |
false |
Spark fallback |
fallback reason + matches Spark |
| 5 |
Unsupported |
any |
any |
Spark fallback |
fallback reason + matches Spark |
Problem
Coverage of this matrix is currently ad hoc and inconsistent across expressions. date_format in CometTemporalExpressionSuite covers the full trio for rows 2/3/4, but most other dual-impl expressions only assert the default dispatch path (row 3). The fallback-when-dispatcher-disabled path (row 4) and the native-when-allowIncompatible path (row 2) are easy to drop silently, since a path-agnostic checkSparkAnswer passes regardless of which engine ran.
This was observed while reviewing #4538: tightening the NTZ hour/minute/second and non-UTC date_trunc tests to assert the dispatch path inadvertently removed any assertion that those cases still fall back correctly when the dispatcher is disabled. That specific gap was fixed in #4538, but the same gap likely exists for the other expression families that route through the dispatcher (string, collection, map, json, etc.).
Proposal
Establish a consistent testing convention for dual-impl (native + codegen-dispatch) expressions so every such expression covers the applicable rows of the matrix above. Likely shape:
- A small set of shared test helpers that encode the matrix once, for example:
checkNativeCompatible(sql) (row 1)
checkStaysNativeWhenAllowIncompatible(sql, exprName) (row 2, no equality)
checkCodegenDispatch(sql) (row 3)
checkFallbackWhenCodegenDisabled(sql) (row 4)
checkUnsupportedFallback(sql, reason) (row 5)
- Per-expression suites call the relevant helpers, so the intended path for each input shape is explicit and a missing quadrant is obvious.
Then retrofit the existing dual-impl expressions (those mixing in CodegenDispatchFallback) to use the convention.
Scope notes
Background
Several expressions now have both a native (Rust) implementation and a JVM codegen-dispatch fallback (mixing in
CodegenDispatchFallback). For these, the execution path is selected by a combination of the per-input support level and two config flags:spark.comet.expression.<Name>.allowIncompatiblespark.comet.exec.scalaUDF.codegen.enabled(defaulttrue)This produces a routing matrix with five outcomes:
allowIncompatiblecodegen.enabledProblem
Coverage of this matrix is currently ad hoc and inconsistent across expressions.
date_formatinCometTemporalExpressionSuitecovers the full trio for rows 2/3/4, but most other dual-impl expressions only assert the default dispatch path (row 3). The fallback-when-dispatcher-disabled path (row 4) and the native-when-allowIncompatible path (row 2) are easy to drop silently, since a path-agnosticcheckSparkAnswerpasses regardless of which engine ran.This was observed while reviewing #4538: tightening the NTZ
hour/minute/secondand non-UTCdate_trunctests to assert the dispatch path inadvertently removed any assertion that those cases still fall back correctly when the dispatcher is disabled. That specific gap was fixed in #4538, but the same gap likely exists for the other expression families that route through the dispatcher (string, collection, map, json, etc.).Proposal
Establish a consistent testing convention for dual-impl (native + codegen-dispatch) expressions so every such expression covers the applicable rows of the matrix above. Likely shape:
checkNativeCompatible(sql)(row 1)checkStaysNativeWhenAllowIncompatible(sql, exprName)(row 2, no equality)checkCodegenDispatch(sql)(row 3)checkFallbackWhenCodegenDisabled(sql)(row 4)checkUnsupportedFallback(sql, reason)(row 5)Then retrofit the existing dual-impl expressions (those mixing in
CodegenDispatchFallback) to use the convention.Scope notes