Route array/map higher-order (lambda) functions through the codegen dispatcher #4617

@andygrove

What is the problem the feature request solves?

Spark's array and map higher-order (lambda) functions currently have no Comet implementation, so any query using them falls back to Spark for the enclosing operator:

  • array: transform, exists, forall, aggregate/reduce, array_sort (with comparator), zip_with
  • map: map_filter, transform_keys, transform_values, map_zip_with

These are hard to implement natively in Rust because they evaluate an arbitrary user lambda per element.
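To make the per-element evaluation concrete, here is a minimal pure-Python sketch of what a few of these functions compute. It is only a model of the semantics, not Spark or Comet code. The helper names mirror the SQL function names, and null handling is simplified to the `zip_with` padding behaviour (the shorter array is padded with nulls).

```python
from itertools import zip_longest


def transform(arr, f):
    # Apply the user lambda to every element of the array.
    return [f(x) for x in arr]


def aggregate(arr, start, merge, finish=lambda acc: acc):
    # Fold the array into an accumulator, then apply an optional finish lambda.
    acc = start
    for x in arr:
        acc = merge(acc, x)
    return finish(acc)


def zip_with(left, right, f):
    # Pair elements positionally; the shorter array is padded with None (null).
    return [f(a, b) for a, b in zip_longest(left, right, fillvalue=None)]


def map_filter(m, pred):
    # Keep only the entries for which the user predicate holds.
    return {k: v for k, v in m.items() if pred(k, v)}
```

In every case the lambda is opaque to the engine and is invoked once per element or entry. A native implementation therefore cannot be a single fixed kernel per function; it has to be able to evaluate whatever expression tree the user supplied.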

Describe the potential solution

The codegen dispatcher added for the regex/json families already admits CodegenFallback expressions, which include all higher-order functions. CometBatchKernelCodegen.canHandle accepts them, and CometCodegenHOFSuite already shows that transform, filter, aggregate, and exists evaluate correctly inside the kernel when nested in a registered ScalaUDF.

Wiring each HOF into the serde as a CometCodegenDispatch would keep a top-level HOF projection native: the Comet kernel runs Spark's own per-element evaluation, so results match Spark exactly. When the dispatcher is disabled, the expression falls back cleanly to Spark.
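The routing decision described above can be pictured with a toy model. This is a sketch under stated assumptions, not Comet's serde: `Expr`, `can_handle`, `route_hof`, and the returned labels are all hypothetical names invented for illustration, and the only rule modelled is "dispatch when the dispatcher is enabled and accepts the expression, otherwise fall back".

```python
from dataclasses import dataclass

# The HOFs listed in this issue. Everything below is a hypothetical model.
HOF_NAMES = {
    "transform", "exists", "forall", "aggregate", "array_sort", "zip_with",
    "map_filter", "transform_keys", "transform_values", "map_zip_with",
}


@dataclass(frozen=True)
class Expr:
    name: str
    codegen_fallback: bool  # stands in for "is a CodegenFallback expression"


def can_handle(expr: Expr) -> bool:
    # Models the statement that the dispatcher admits CodegenFallback expressions.
    return expr.codegen_fallback


def route_hof(expr: Expr, dispatcher_enabled: bool) -> str:
    # Decide where a top-level HOF projection is evaluated in this toy model.
    if expr.name not in HOF_NAMES:
        raise ValueError(f"{expr.name} is not one of the HOFs in this issue")
    if dispatcher_enabled and can_handle(expr):
        return "codegen-dispatch"  # stays in the Comet kernel
    return "spark-fallback"  # enclosing operator runs in Spark
```

For example, `route_hof(Expr("transform", True), dispatcher_enabled=False)` yields `"spark-fallback"`, which is the clean fallback the proposal asks for.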

Additional context

Identified while reviewing the codegen-dispatch work in #4538. Related testing-convention follow-up: #4616.
