The PPL parser fails with "Cannot resolve function: DISTINCT_COUNT_APPROX"
on any execution path that does not construct OpenSearchExecutionEngine
(unified-query / analytics-engine / DataFusion), because the UDAF was
only registered in PPLFuncImpTable.aggExternalFunctionRegistry as a
side effect of that constructor.

Add a logical marker class DistinctCountApproxLogicalAggFunction in
core, expose it as PPLBuiltinOperators.DISTINCT_COUNT_APPROX, and register
it inside PPLFuncImpTable.AggBuilder.populate() alongside other built-in
aggregates. The marker has no JVM execution: init / add / result throw
UnsupportedOperationException, mirroring the pattern already used by
RelevanceQueryFunction.RelevanceQueryImplementor for match-family
functions, which similarly have no JVM semantics.

Behavior on the V3 path is preserved: OpenSearchExecutionEngine still
registers the real HyperLogLog++ DistinctCountApproxAggFunction in
aggExternalFunctionRegistry, and getImplementation() consults that
external registry first, so the marker is overridden whenever the V3
constructor has run. AggregateAnalyzer continues to translate the
operator to OpenSearch cardinality DSL via the BuiltinFunctionName
switch, which is independent of the wrapped SqlAggFunction instance.

Operand metadata for the marker is null to match the existing external
registration's permissive type policy and avoid introducing new type
rejections that would surface as regressions in the existing dc
integration tests.

Signed-off-by: Songkan Tang <songkant@amazon.com>

Description
The PPL parser fails with "Cannot resolve function: DISTINCT_COUNT_APPROX" on any execution path that does not construct OpenSearchExecutionEngine (unified-query / analytics-engine / DataFusion), because the UDAF was only registered to PPLFuncImpTable.aggExternalFunctionRegistry as a side effect of that constructor (registerOpenSearchFunctions).

This change adds a logical marker UDAF in core that lets the PPL parser succeed on every path. Backends are expected to push down or rewrite the operator before execution.

Layered registration
core (this PR): register a marker DistinctCountApproxLogicalAggFunction in PPLFuncImpTable.AggBuilder.populate() via the new PPLBuiltinOperators.DISTINCT_COUNT_APPROX. Lookup precedence in PPLFuncImpTable.getImplementation() is aggExternalFunctionRegistry first, then aggFunctionRegistry.

opensearch (unchanged): OpenSearchExecutionEngine.registerOpenSearchFunctions() still installs the real DistinctCountApproxAggFunction (HyperLogLog++) into aggExternalFunctionRegistry. Because external is consulted first, the V3 path always sees the real implementation once the constructor has run, and the marker is effectively overridden.

This is the same pattern already used by RelevanceQueryFunction.RelevanceQueryImplementor for match / match_phrase / etc. Relevance functions register a no-op operator whose RelevanceQueryImplementor.implement() throws UnsupportedOperationException, because their semantics live entirely on the OpenSearch index side. DISTINCT_COUNT_APPROX is in the same situation: real semantics on the OpenSearch side (cardinality aggregation) and on backends like DataFusion (approx_count_distinct).

Marker class behavior
init / add / result and the accumulator's value all throw UnsupportedOperationException with an error message. Reaching the body means a backend either failed to push down or did not register an adapter; surfacing this as a clear error rather than silently producing wrong results is the intended behavior.
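To make the marker-plus-override idea concrete, here is a minimal, dependency-free sketch. It is not the code in this PR: the AggFunction interface, the Registry class, the exact-distinct stand-in for the HyperLogLog++ UDAF, and the exception message are all hypothetical simplifications, and it does not use Calcite. It only illustrates two points from the description: a marker that resolves by name but throws when executed, and a lookup that consults the external registry before the built-in one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MarkerAggSketch {

    /** Hypothetical, simplified UDAF contract; not the real Calcite or PPL interface. */
    interface AggFunction {
        Object init();
        Object add(Object acc, Object value);
        Object result(Object acc);
    }

    /** Logical marker: resolvable by name, but never executable on the JVM. */
    static final class LogicalMarkerAgg implements AggFunction {
        // The message text is an assumption made for this sketch.
        private static UnsupportedOperationException notExecutable() {
            return new UnsupportedOperationException(
                "DISTINCT_COUNT_APPROX is a logical marker; "
                    + "a backend must push it down or rewrite it before execution");
        }

        @Override public Object init() { throw notExecutable(); }
        @Override public Object add(Object acc, Object value) { throw notExecutable(); }
        @Override public Object result(Object acc) { throw notExecutable(); }
    }

    /** Toy exact-distinct counter standing in for the real HyperLogLog++ implementation. */
    static final class ExactDistinctAgg implements AggFunction {
        @Override public Object init() { return new HashSet<Object>(); }

        @SuppressWarnings("unchecked")
        @Override public Object add(Object acc, Object value) {
            ((Set<Object>) acc).add(value);
            return acc;
        }

        @Override public Object result(Object acc) {
            return Long.valueOf(((Set<?>) acc).size());
        }
    }

    /** Two-tier registry: external registrations take precedence over built-ins. */
    static final class Registry {
        private final Map<String, AggFunction> builtIn = new HashMap<>();
        private final Map<String, AggFunction> external = new HashMap<>();

        void registerBuiltIn(String name, AggFunction fn) { builtIn.put(name, fn); }
        void registerExternal(String name, AggFunction fn) { external.put(name, fn); }

        AggFunction resolve(String name) {
            AggFunction fn = external.get(name);   // external registry is consulted first
            if (fn == null) {
                fn = builtIn.get(name);            // then the built-in registry
            }
            if (fn == null) {
                throw new IllegalArgumentException("Cannot resolve function: " + name);
            }
            return fn;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.registerBuiltIn("DISTINCT_COUNT_APPROX", new LogicalMarkerAgg());

        // Without an external registration the name resolves, but execution fails loudly.
        try {
            registry.resolve("DISTINCT_COUNT_APPROX").init();
        } catch (UnsupportedOperationException e) {
            System.out.println("marker: " + e.getMessage());
        }

        // Once an engine registers a real implementation, it overrides the marker.
        registry.registerExternal("DISTINCT_COUNT_APPROX", new ExactDistinctAgg());
        AggFunction real = registry.resolve("DISTINCT_COUNT_APPROX");
        Object acc = real.init();
        for (String v : new String[] {"a", "b", "a"}) {
            acc = real.add(acc, v);
        }
        System.out.println("distinct = " + real.result(acc));
    }
}
```

In this sketch the failure is deferred from resolution time to execution time, which is the trade-off the marker makes: parsing succeeds everywhere, and only a backend that neither pushes down nor overrides the function hits the exception.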
V3 path behavior is preserved
AggregateAnalyzer translates DISTINCT_COUNT_APPROX to the OpenSearch cardinality aggregation through a BuiltinFunctionName switch, independent of the wrapped SqlAggFunction instance (unchanged by this PR).
RexImpTable.getAggImplementor reads SqlUserDefinedAggFunction.function, which is the real DistinctCountApproxAggFunction registered by OpenSearchExecutionEngine, since external takes precedence (unchanged).
Operand metadata is null to match the existing external registration's permissive type policy. No new type rejection is introduced.

How this unblocks unified-query / DataFusion path
RestUnifiedQueryAction does not construct OpenSearchExecutionEngine; before this change, dc() / distinct_count() / distinct_count_approx() failed at the PPL parse stage on that path. After this change, the parser succeeds via the marker, and the downstream backend (e.g. analytics-engine's DataFusion adapter) rewrites RexOver(DISTINCT_COUNT_APPROX) to APPROX_COUNT_DISTINCT before Substrait emission. End-to-end verified locally with the analytics-engine REST IT in the OpenSearch sandbox.

Related Issues
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.