Skip to content

Add explain to AnalyticsExecutionEngine#5442

Open
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:analytics-engine-explain
Open

Add explain to AnalyticsExecutionEngine#5442
finnegancarroll wants to merge 1 commit into
opensearch-project:mainfrom
finnegancarroll:analytics-engine-explain

Conversation

@finnegancarroll
Copy link
Copy Markdown
Contributor

@finnegancarroll finnegancarroll commented May 14, 2026

Description

Connects SQL plugin AnalyticsExecutionEngine explain API with analytics engine executeWithProfile path. Providing per execution stage profiling for queries executed on this endpoint.

Note that _explain API on a an analytics engine index will provide the logical plan, execute the query, and provide profiling information for each execution stage. In contrast non analytics engine indices will only return the logical plan.

Testing

1. Publish SQL Plugin to Maven Local

./gradlew :opensearch-sql-plugin:publishToMavenLocal

2. Start Single-Node Cluster

./gradlew run -Dsandbox.enabled=true \
  -PinstalledPlugins="['opensearch-job-scheduler:3.7.0.0-SNAPSHOT', 'arrow-flight-rpc', 'analytics-engine', 'parquet-data-format', 'analytics-backend-datafusion', 'analytics-backend-lucene', 'composite-engine', 'opensearch-sql-plugin:3.7.0.0-SNAPSHOT']" \
  -Dtests.jvm.argline="-Dopensearch.experimental.feature.pluggable.dataformat.enabled=true -Dopensearch.experimental.feature.transport.stream.enabled=true"

3. Create Parquet-Backed Index

curl -X PUT "localhost:9200/test_parquet" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0,
    "index.pluggable.dataformat.enabled": true,
    "index.pluggable.dataformat": "composite"
  },
  "mappings": {
    "properties": {
      "name": {"type": "keyword"},
      "age": {"type": "integer"},
      "score": {"type": "double"}
    }
  }
}'

4. Ingest Sample Data

curl -X POST "localhost:9200/test_parquet/_bulk" -H 'Content-Type: application/x-ndjson' -d'
{"index":{}}
{"name":"alice","age":30,"score":95.5}
{"index":{}}
{"name":"bob","age":25,"score":87.3}
{"index":{}}
{"name":"carol","age":35,"score":91.0}
'

curl -X POST "localhost:9200/test_parquet/_refresh"

5. Run Explain Query (SQL Plugin Path)

curl -s -X POST "localhost:9200/_plugins/_ppl/_explain" \
  -H 'Content-Type: application/json' \
  -d '{"query": "source = test_parquet | stats avg(score) by name"}'

Result

{
  "calcite": {
    "logical": "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n  LogicalProject(avg(score)=[$1], name=[$0])\n    LogicalAggregate(group=[{0}], avg(score)=[AVG($1)])\n      LogicalProject(name=[$1], score=[$2])\n        LogicalTableScan(table=[[opensearch, test_parquet]])\n",
    "profile": {
      "queryId": "8ffee8b8-944b-4fcd-9f71-6f4601f8034b",
      "fullPlan": [
        "OpenSearchSort(fetch=[10000], viableBackends=[[datafusion]])",
        "  OpenSearchProject(avg(score)=[$1], name=[$0], viableBackends=[[datafusion]])",
        "    OpenSearchProject(name=[$0], avg(score)=[ANNOTATED_PROJECT_EXPR(id=2, backends=[datafusion], /($1, $2))], viableBackends=[[datafusion]])",
        "      OpenSearchAggregate(group=[{0}], agg#0=[SUM(...)], agg#1=[COUNT(...)], mode=[FINAL], viableBackends=[[datafusion]])",
        "        OpenSearchExchangeReducer(viableBackends=[[datafusion]])",
        "          OpenSearchAggregate(group=[{0}], agg#0=[SUM(...)], agg#1=[COUNT(...)], mode=[PARTIAL], viableBackends=[[datafusion]])",
        "            OpenSearchProject(name=[$1], score=[$2], viableBackends=[[datafusion]])",
        "              OpenSearchTableScan(table=[[test_parquet]], viableBackends=[[datafusion]])"
      ],
      "totalElapsedMs": 220,
      "stages": [
        {
          "stageId": 0,
          "executionType": "SHARD_FRAGMENT",
          "distribution": "SINGLETON",
          "state": "SUCCEEDED",
          "elapsedMs": 217,
          "rowsProcessed": 3,
          "tasksCompleted": 2,
          "tasksFailed": 0,
          "fragment": [
            "OpenSearchAggregate(..., mode=[PARTIAL], ...)",
            "  OpenSearchProject(name=[$1], score=[$2], ...)",
            "    OpenSearchTableScan(table=[[test_parquet]], ...)"
          ],
          "tasks": [
            {
              "partitionId": 0,
              "node": "953slSrWQiaMqNimQGKTfg/shard[0]",
              "state": "FINISHED",
              "elapsedMs": 216
            },
            {
              "partitionId": 1,
              "node": "953slSrWQiaMqNimQGKTfg/shard[1]",
              "state": "FINISHED",
              "elapsedMs": 206
            }
          ]
        },
        {
          "stageId": 1,
          "executionType": "COORDINATOR_REDUCE",
          "state": "SUCCEEDED",
          "elapsedMs": 3,
          "rowsProcessed": 0,
          "tasksCompleted": 0,
          "tasksFailed": 0,
          "fragment": [
            "OpenSearchSort(fetch=[10000], ...)",
            "  OpenSearchProject(avg(score)=[$1], name=[$0], ...)",
            "    OpenSearchAggregate(..., mode=[FINAL], ...)",
            "      OpenSearchExchangeReducer(...)",
            "        OpenSearchStageInputScan(childStageId=[0], ...)"
          ],
          "tasks": []
        }
      ]
    }
  }
}

Related Issues

N/A

Check List

- [ ] New functionality includes testing. <- Tested locally, no IT suite for analytics plugin at this time

  • New functionality has been documented. <- TODO as follow up if design does not change
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit c73e747)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Query Execution Side Effect

The explain method now executes the query via executeWithProfile even when the user only requested an explain plan. This changes the semantics of explain from a read-only operation to one that processes data and consumes resources. If the query modifies state, performs expensive computations, or triggers side effects, calling explain will now trigger those unintended consequences.

planExecutor.executeWithProfile(
    plan,
    null,
    new ActionListener<>() {
      @Override
      public void onResponse(ProfiledResult result) {
        try {
          QueryProfile profile = result.profile();
          ExplainResponse response =
              new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
          listener.onResponse(ExplainResponse.normalizeLf(response));
        } catch (Exception e) {
          listener.onFailure(e);
        }
      }

      @Override
      public void onFailure(Exception e) {
        // Fall back to plan-only explain if profiling fails
        try {
          ExplainResponse response =
              new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
          listener.onResponse(ExplainResponse.normalizeLf(response));
        } catch (Exception ex) {
          listener.onFailure(ex);
        }
      }
    });
Silent Failure Masking

In the onFailure handler, when profiling fails, the code falls back to returning a plan-only explain response. If the fallback itself throws an exception in normalizeLf, that exception is passed to listener.onFailure. However, the original profiling failure is lost. This makes debugging difficult because the root cause (why profiling failed) is replaced by a potentially unrelated normalization error.

public void onFailure(Exception e) {
  // Fall back to plan-only explain if profiling fails
  try {
    ExplainResponse response =
        new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
    listener.onResponse(ExplainResponse.normalizeLf(response));
  } catch (Exception ex) {
    listener.onFailure(ex);
  }
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

PR Code Suggestions ✨

Latest suggestions up to c73e747

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Verify null context parameter

The second parameter to executeWithProfile is null, which may cause issues if the
method expects a valid context or configuration object. Verify that passing null is
intentional and supported by the API, or provide the appropriate context parameter.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-117]

 planExecutor.executeWithProfile(
     plan,
-    null,
+    context,
     new ActionListener<>() {
       @Override
       public void onResponse(ProfiledResult result) {
         try {
           QueryProfile profile = result.profile();
           ExplainResponse response =
               new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
           listener.onResponse(ExplainResponse.normalizeLf(response));
         } catch (Exception e) {
           listener.onFailure(e);
         }
       }
       ...
     });
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that null is passed as the second parameter to executeWithProfile, and recommends using the available context parameter instead. This is a valid concern as passing null could cause issues if the method expects a valid context object. However, without knowing the API contract, this may be intentional behavior.

Medium
General
Log profiling failure before fallback

The fallback error handling silently swallows the original profiling failure
exception e. Consider logging the original exception before falling back to ensure
debugging information is not lost when profiling fails.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-141]

 @Override
 public void onFailure(Exception e) {
   // Fall back to plan-only explain if profiling fails
+  logger.warn("Profiling failed, falling back to plan-only explain", e);
   try {
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception ex) {
     listener.onFailure(ex);
   }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the original exception e is silently swallowed in the fallback handler. Adding logging would improve debugging capabilities. However, the improved code assumes a logger exists without verifying its declaration, and logging suggestions typically have moderate impact on code quality.

Low

Previous suggestions

Suggestions up to commit c73e747
CategorySuggestion                                                                                                                                    Impact
General
Log original profiling failure exception

The fallback error handling silently swallows the original profiling failure
exception e. Consider logging the original exception before falling back to ensure
debugging information is not lost, especially for production troubleshooting.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-141]

 @Override
 public void onFailure(Exception e) {
   // Fall back to plan-only explain if profiling fails
+  // TODO: Add logging for the profiling failure: e
   try {
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, null));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception ex) {
     listener.onFailure(ex);
   }
 }
Suggestion importance[1-10]: 7

__

Why: Valid observation that the original exception e is not logged before falling back to plan-only explain. Logging would aid debugging, though the fallback mechanism itself is functional. The suggestion appropriately uses a TODO comment rather than implementing logging directly.

Medium
Verify null parameter handling

The second parameter null passed to executeWithProfile may cause issues if the
method expects a non-null value. Verify that passing null is intentional and
properly handled by the executeWithProfile method, or provide an appropriate default
value.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-118]

+// Verify null is acceptable or provide appropriate context
 planExecutor.executeWithProfile(
     plan,
-    null,
+    /* context or parameters */ null,
     new ActionListener<>() {
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that null is passed as the second parameter to executeWithProfile. However, it only asks for verification rather than identifying a concrete issue. Without knowing the method signature, this is a reasonable concern but may be intentional design.

Low
Suggestions up to commit f6de314
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate null parameter in executeWithProfile

The executeWithProfile method is called with null as the second parameter without
validation. If this parameter is required or if executeWithProfile doesn't handle
null gracefully, this could cause a NullPointerException or unexpected behavior.
Verify the method signature and pass an appropriate value or add null-safety checks.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [115-142]

+// Verify that executeWithProfile accepts null for the second parameter
+// or provide appropriate default value based on the method contract
 planExecutor.executeWithProfile(
     plan,
-    null,
+    /* provide appropriate parameter or verify null is acceptable */,
     new ActionListener<>() {
       @Override
       public void onResponse(ProfiledResult result) {
         try {
           QueryProfile profile = result.profile();
           ExplainResponse response =
               new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
           listener.onResponse(ExplainResponse.normalizeLf(response));
         } catch (Exception e) {
           listener.onFailure(e);
         }
       }
       ...
     });
Suggestion importance[1-10]: 5

__

Why: The suggestion asks to verify if null is acceptable for the second parameter of executeWithProfile. While this is a valid concern, the suggestion only requests verification without providing concrete evidence of an issue. The PR code may intentionally pass null if the method signature allows it.

Low
General
Add null safety for profile result

The result.profile() call could potentially return null, which would be passed to
ExplainResponseNodeV2. Add a null check to ensure the profile is valid before
constructing the response, or handle the null case explicitly to prevent potential
issues downstream.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [120-129]

 @Override
 public void onResponse(ProfiledResult result) {
   try {
-    QueryProfile profile = result.profile();
+    QueryProfile profile = result != null ? result.profile() : null;
     ExplainResponse response =
         new ExplainResponse(new ExplainResponseNodeV2(logical, null, null, profile));
     listener.onResponse(ExplainResponse.normalizeLf(response));
   } catch (Exception e) {
     listener.onFailure(e);
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion adds a null check for result before calling result.profile(), but the improved_code doesn't meaningfully change the behavior since profile would still be null if result is null. The ExplainResponseNodeV2 constructor already accepts null for the profile parameter, so this check provides minimal value.

Low

@finnegancarroll finnegancarroll changed the title Wire analytics engine explain to return stage profiling via executeWi… Add explain to AnalyticsExecutionEngine May 14, 2026
…thProfile

AnalyticsExecutionEngine.explain() now calls executeWithProfile on the
analytics engine's QueryPlanExecutor, executing the query and capturing
per-stage timing from the coordinator's perspective. The resulting
QueryProfile is attached to ExplainResponseNodeV2 and serialized in the
/_plugins/_ppl/_explain response.

Changes:
- ExplainResponseNodeV2: add QueryProfile field with backward-compatible
  3-arg constructor
- ExplainResponse.normalizeLf(): preserve profile field when normalizing
  line endings
- AnalyticsExecutionEngine.explain(): call executeWithProfile, attach
  profile to response. Falls back to plan-only on failure.

Signed-off-by: Finn Carroll <carrofin@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c73e747

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit c73e747

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant