What is the problem the feature request solves?
Spark's array and map higher-order (lambda) functions currently have no Comet implementation, so any query using them falls back to Spark for the enclosing operator:
- array: transform, exists, forall, aggregate/reduce, array_sort (with comparator), zip_with
- map: map_filter, transform_keys, transform_values, map_zip_with
These are hard to implement natively in Rust because they evaluate an arbitrary user lambda per element.
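To make the per-element evaluation concrete, here is a minimal plain-Python model of what a few of these functions compute over a column of arrays or maps. It is only an illustration of the semantics, not Spark or Comet code: the function names mirror the SQL functions, but the row representation (lists and dicts) is an assumption made for this sketch, and Spark's null handling is ignored.

```python
# Plain-Python model of a few Spark higher-order functions.
# Assumption for this sketch: a column is a list of rows, and each row holds
# one array (a Python list) or one map (a Python dict). Nulls are not modelled.

def transform(rows, fn):
    """Apply fn to every element of every array row."""
    return [[fn(x) for x in row] for row in rows]

def exists(rows, pred):
    """True for a row if pred holds for at least one element."""
    return [any(pred(x) for x in row) for row in rows]

def forall(rows, pred):
    """True for a row if pred holds for every element (vacuously true if empty)."""
    return [all(pred(x) for x in row) for row in rows]

def aggregate(rows, zero, merge):
    """Fold each array row into a single value, starting from zero."""
    out = []
    for row in rows:
        acc = zero
        for x in row:
            acc = merge(acc, x)  # the user lambda runs once per element
        out.append(acc)
    return out

def transform_values(rows, fn):
    """Rewrite each map value with fn(key, value), keeping the keys."""
    return [{k: fn(k, v) for k, v in row.items()} for row in rows]

arrays = [[1, 2, 3], [], [10]]
maps = [{"a": 1}, {"b": 2, "c": 3}]
```

In every case the lambda is an opaque callable invoked once per element, which is the part a native kernel cannot specialise ahead of time.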
Describe the potential solution
The codegen dispatcher added for the regex/json families already admits CodegenFallback expressions, which include all higher-order functions. CometBatchKernelCodegen.canHandle accepts them, and CometCodegenHOFSuite already proves that transform/filter/aggregate/exists evaluate correctly inside the kernel when nested in a registered ScalaUDF.
Wiring each HOF into the serde as a CometCodegenDispatch would keep a top-level HOF projection native (running Spark's own per-element evaluation inside the Comet kernel), with results that match Spark exactly, and would fall back cleanly when the dispatcher is disabled.
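The intended routing can be pictured with a small toy model. This is a sketch of the decision only, written in plain Python. Every name in it (`route`, `wired_exprs`, `dispatcher_enabled`, the two result strings) is hypothetical and does not correspond to Comet's actual serde or configuration API.

```python
# Toy model of where a top-level expression would be evaluated.
# All names are hypothetical; this is not Comet's serde API.
NATIVE = "native (Comet codegen kernel)"
FALLBACK = "fallback (Spark operator)"

def route(expr_name, wired_exprs, dispatcher_enabled):
    """Return the evaluation path for a top-level expression."""
    if dispatcher_enabled and expr_name in wired_exprs:
        return NATIVE
    return FALLBACK

# Today: no higher-order function is wired into the serde.
before = set()

# Proposed: each HOF listed in this issue is registered as a dispatch entry.
after = {
    "transform", "exists", "forall", "aggregate", "array_sort", "zip_with",
    "map_filter", "transform_keys", "transform_values", "map_zip_with",
}
```

Under this model, `route("transform", before, True)` falls back, `route("transform", after, True)` stays native, and `route("transform", after, False)` falls back again, which is the clean-fallback behaviour described above.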
Additional context
Identified while reviewing the codegen-dispatch work in #4538. Related testing-convention follow-up: #4616.