What is the problem the feature request solves?
The structured-text functions have no Comet implementation, so any query using them falls back to Spark for the enclosing operator:
- CSV: from_csv, schema_of_csv
- JSON: schema_of_json, json_object_keys
- XPath: xpath, xpath_boolean/short/int/long/float/double/string
- XML (Spark 4.0+): from_xml, to_xml, schema_of_xml
They are hard to implement natively in Rust because each one requires CSV, JSON, or XML parsing that reproduces Spark-specific semantics.
Describe the potential solution
These all extend Spark's CodegenFallback, which the codegen dispatcher already admits (the same mechanism backing from_json/to_json). Routing them through the dispatcher keeps a top-level projection native while matching Spark exactly.
On Spark 3.4/3.5 these functions are plain expressions and can be registered directly in the serde maps. On Spark 4.x they are RuntimeReplaceable: the optimizer rewrites them to Invoke(evaluator) / StaticInvoke before Comet sees the plan, so they must be dispatched from the 4.x shim, mirroring how from_json/to_json/parse_url are already handled.
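
The version split can be illustrated with a small toy model. This is only a sketch of the routing idea, not Comet or Spark code: `PlainExpr`, `InvokeExpr`, `SHIM_TARGETS`, `route`, and the placeholder target name are all hypothetical, and the real serde maps and shim would match on Spark's actual expression classes rather than on strings.

```python
from dataclasses import dataclass

# Toy stand-ins only: these are NOT Spark or Comet classes.
@dataclass(frozen=True)
class PlainExpr:
    """Shape assumed for Spark 3.4/3.5: the function is its own expression node."""
    name: str

@dataclass(frozen=True)
class InvokeExpr:
    """Shape assumed for Spark 4.x after the optimizer rewrite (Invoke / StaticInvoke)."""
    target: str  # hypothetical label for the evaluator or static target

# Functions listed in this issue (the XML ones exist only on Spark 4.0+).
STRUCTURED_TEXT = {
    "from_csv", "schema_of_csv",
    "schema_of_json", "json_object_keys",
    "xpath", "xpath_boolean", "xpath_short", "xpath_int", "xpath_long",
    "xpath_float", "xpath_double", "xpath_string",
    "from_xml", "to_xml", "schema_of_xml",
}

# Hypothetical: rewritten targets a 4.x shim would recognize.
# The real names would have to be taken from Spark's sources.
SHIM_TARGETS = {"csv-evaluator-placeholder": "from_csv"}

def route(spark_major: int, expr) -> str:
    """Return 'codegen-dispatch' when this toy model would hand the
    expression to the dispatcher, otherwise 'not-handled-here'."""
    if spark_major == 3 and isinstance(expr, PlainExpr):
        # 3.4/3.5: the plain expression is still visible, so a direct lookup works.
        if expr.name in STRUCTURED_TEXT:
            return "codegen-dispatch"
    elif spark_major >= 4 and isinstance(expr, InvokeExpr):
        # 4.x: only the rewritten Invoke/StaticInvoke form is visible.
        if expr.target in SHIM_TARGETS:
            return "codegen-dispatch"
    return "not-handled-here"
```

In this model, `route(3, PlainExpr("from_csv"))` and `route(4, InvokeExpr("csv-evaluator-placeholder"))` both reach the dispatcher, while a name-based lookup alone would never match on 4.x because the original expression no longer appears in the plan.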
Additional context
Tier 2 of the codegen-dispatch expansion identified in #4616. Related: the HOF tier in #4618.