[analytics engine] Add /_plugins/_analytics/stats endpoint #21796
OVI3D0 wants to merge 16 commits into
Mirror the SQL plugin's /_explain URL-based routing pattern in the test-ppl-frontend harness. Add explain flag to PPLRequest and register the POST /_analytics/ppl/_explain route in RestPPLQueryAction. Signed-off-by: Finn Carroll <carrofin@amazon.com>
Rebase of the explain API feature onto the new scheduler architecture (post PR opensearch-project#21699). Key changes from the previous version:
- Profile types (QueryProfile, StageProfile, TaskProfile, ProfiledResult) in analytics-api with executeWithProfile on QueryPlanExecutor interface
- QueryProfileBuilder rewritten to use StageExecution.tasks() directly (no TaskTracker needed, since tasks are accessible from the stage)
- Scheduler.execute() returns QueryExecution for post-execution inspection
- QueryExecution.getGraph() accessor added for profile snapshot
- DefaultPlanExecutor: unified executeInternal with profile boolean, captures planning_time_ms and execution_time_ms
- PPL frontend: /_analytics/ppl/_explain endpoint with full wiring
- Integration tests (ExplainApiIT)
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Fixes missingJavadoc check for the QueryProfileBuilder package. Signed-off-by: Finn Carroll <carrofin@amazon.com>
Remove the dual execute/executeWithProfile paths. Always call executeWithProfile and conditionally include the profile in the response based on the explain flag. Eliminates code duplication in the test frontend. Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Local IDE setting accidentally committed. Should not be in the repo. Signed-off-by: Finn Carroll <carrofin@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
- Remove partition_id (only relevant for future shuffle exchanges)
- Remove start_ms/end_ms (redundant with elapsed_ms)
- describeTarget returns node ID for all task types instead of '(local)'
Task output now contains only: node, state, elapsed_ms.
Signed-off-by: Finn Carroll <carrofin@amazon.com>
The dsl-query-executor tests use QueryPlanExecutor as a lambda (single abstract method). Making executeWithProfile abstract broke this. Restore the default implementation that throws UnsupportedOperationException — real implementations (DefaultPlanExecutor) still override it, but test lambdas that only need execute() continue to work. Signed-off-by: Finn Carroll <carrofin@amazon.com>
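The lambda-compatibility point above can be illustrated with a minimal sketch. The interface below is a simplified, hypothetical stand-in: the real QueryPlanExecutor signatures are not shown in this PR text, so the parameter and return types here are assumptions.

```java
import java.util.List;

// Hypothetical, simplified stand-in for QueryPlanExecutor; real signatures differ.
interface PlanExecutor {
    // The single abstract method: this is what lets test code supply a lambda.
    List<String> execute(String plan);

    // A default body keeps the interface usable as a lambda target.
    // Real implementations are expected to override it.
    default List<String> executeWithProfile(String plan) {
        throw new UnsupportedOperationException("profiling not supported");
    }
}

class LambdaCompatDemo {
    // A test-style lambda that only implements execute().
    static PlanExecutor lambdaExecutor() {
        return plan -> List.of("row-for-" + plan);
    }

    // Returns true if executeWithProfile falls through to the throwing default.
    static boolean profileUnsupported(PlanExecutor executor) {
        try {
            executor.executeWithProfile("p");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PlanExecutor executor = lambdaExecutor();
        System.out.println(executor.execute("scan"));
        System.out.println(profileUnsupported(executor));
    }
}
```

If executeWithProfile were abstract, the interface would have two abstract methods and the lambda in lambdaExecutor() would no longer compile.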
Signed-off-by: Finn Carroll <carrofin@amazon.com>
Adds a default no-op callback that fires once per query immediately after the planning pipeline (Calcite optimization + DAG construction) completes, before any stage starts. This gives consumers a hook for distinguishing planner-bound queries from execution-bound queries — useful for observability rollups and diagnostic tooling that wants to attribute time spent in CBO versus time spent in stages. Adds matching dispatch in CompositeListener so the new event fans out to delegates with the same exception-isolation semantics as the existing callbacks. Existing implementations are unaffected because the method has a default no-op body. Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
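A rough sketch of the default no-op callback plus exception-isolated fan-out described above. The listener interface is reduced to one method and the onQueryPlanned parameters are assumptions; the real AnalyticsOperationListener and CompositeListener are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced listener; the real interface has more callbacks.
interface OperationListener {
    // Default no-op: existing implementations compile unchanged.
    default void onQueryPlanned(String queryId, long planningNanos) {}
}

// Fans one event out to every delegate; a failing delegate does not block the rest.
final class CompositeOperationListener implements OperationListener {
    private final List<OperationListener> delegates;

    CompositeOperationListener(List<OperationListener> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    @Override
    public void onQueryPlanned(String queryId, long planningNanos) {
        for (OperationListener delegate : delegates) {
            try {
                delegate.onQueryPlanned(queryId, planningNanos);
            } catch (RuntimeException e) {
                // Assumption: swallow and continue. A real implementation would log here.
            }
        }
    }
}

class CompositeListenerDemo {
    // Dispatches one event through a failing, a no-op, and a recording delegate,
    // and returns what the recording delegate observed.
    static List<String> dispatchOnce() {
        List<String> seen = new ArrayList<>();
        OperationListener failing = new OperationListener() {
            @Override
            public void onQueryPlanned(String id, long nanos) {
                throw new IllegalStateException("boom");
            }
        };
        OperationListener noop = new OperationListener() {};
        OperationListener recording = new OperationListener() {
            @Override
            public void onQueryPlanned(String id, long nanos) {
                seen.add(id + ":" + nanos);
            }
        };
        new CompositeOperationListener(List.of(failing, noop, recording)).onQueryPlanned("q1", 42L);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(dispatchOnce());
    }
}
```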
executeFragmentStreamingAsync was firing onPreFragmentExecution and onFragmentFailure but never onFragmentSuccess, leaving the success branch of the data-node-side fragment instrumentation silent. Wraps the stream-draining loop with a nanoTime timer and a row counter, then fires onFragmentSuccess once the stream completes successfully. This makes any AnalyticsOperationListener that consumes fragment events able to record wall-clock latency and rows produced per task. No behaviour change for the existing failure path — failures continue to route through executeFragmentStreaming's catch blocks. Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
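The timer-and-row-counter wrapping described above might look roughly like the following. The listener shape, the batch representation, and the drain method are all hypothetical; only the pattern (start a nanoTime timer, count rows while draining, fire success once at the end) comes from the commit message.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical callback; the real onFragmentSuccess signature is not shown in the PR text.
interface FragmentListener {
    void onFragmentSuccess(long tookNanos, long rows);
}

class FragmentDrainDemo {
    // Drains every batch, counting rows, then fires onFragmentSuccess exactly once.
    // If the iterator throws, the exception propagates and success is never fired,
    // so the failure path stays with the caller's catch blocks.
    static long drain(Iterator<List<String>> batches, FragmentListener listener) {
        long startNanos = System.nanoTime();
        long rows = 0;
        while (batches.hasNext()) {
            rows += batches.next().size();
        }
        listener.onFragmentSuccess(System.nanoTime() - startNanos, rows);
        return rows;
    }

    public static void main(String[] args) {
        Iterator<List<String>> batches = List.of(List.of("a", "b"), List.of("c")).iterator();
        drain(batches, (took, rows) -> System.out.println("rows=" + rows + " tookNanos=" + took));
    }
}
```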
Adds a node-local rollup of analytics-engine activity, exposed at
GET /_plugins/_analytics/stats. Aggregates query, stage, and fragment
lifecycle events from AnalyticsOperationListener into counters surfaced
as JSON. Designed for oncall triage: when the cluster is slow, hit the
endpoint to see whether time is going into planning, into a particular
stage type, or into individual fragments — rather than chasing per-query
profiles for every request.
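As a usage sketch, the endpoint could be polled with the JDK's built-in HTTP client. The base URL (http://localhost:9200), the lack of authentication, and the class name are assumptions for illustration; only the path comes from this PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class AnalyticsStatsClient {
    // Builds a GET request for the node-local stats endpoint.
    static HttpRequest statsRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_plugins/_analytics/stats"))
            .header("Accept", "application/json")
            .GET()
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an unsecured local node. Pass another base URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(statsRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Because the stats are per-node, the response reflects only the node that served the request.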
Output (per-node, since stats are recorded on whichever node handled the
event):
analytics:
  queries: total / succeeded / failed / in_flight,
           elapsed_ms_sum / max,
           planning_ms_sum / max
  stages_by_type: <ExecutionClassName>:
    started / succeeded / failed / cancelled,
    rows_processed_total,
    elapsed_ms_sum / max
  fragments: total / succeeded / failed,
             elapsed_ms_sum / max
The listener-list machinery existed but had no producers wired in — both
QueryContext and AnalyticsSearchService were constructed with List.of()
in production. AnalyticsPlugin now owns a singleton listener registry,
registers the collector into it, and threads the list into both
QueryContext (coordinator-side) and AnalyticsSearchService (data-node-
side). DefaultPlanExecutor injects the registry via Guice and forwards
the list through to QueryContext.
DefaultPlanExecutor's planning-time timer is lifted out of the if(profile)
branch so it runs unconditionally, then fires onQueryPlanned on every
query. The existing _explain payload is unchanged — it still uses the
same timer.
Recording is wait-free: counters are LongAdder, max-update is a CAS loop,
the stage-type bucket map is a ConcurrentHashMap. snapshot() reads each
counter once into an immutable AnalyticsStats record and renders to JSON
via toXContent.
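The recording scheme above (LongAdder counters, a CAS loop for the max, a ConcurrentHashMap of per-type buckets) can be sketched as follows. Class, field, and method names are hypothetical and the snapshot is reduced to three fields; this is not the PR's AnalyticsStatsCollector. Note that computeIfAbsent may briefly lock a map bin the first time a stage type is seen.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Per-stage-type bucket (hypothetical shape).
final class StageCounters {
    final LongAdder succeeded = new LongAdder();
    final LongAdder elapsedMsSum = new LongAdder();
    final AtomicLong elapsedMsMax = new AtomicLong();

    void recordSuccess(long elapsedMs) {
        succeeded.increment();
        elapsedMsSum.add(elapsedMs);
        // CAS loop: retry only while our value is still larger than the stored max.
        long current = elapsedMsMax.get();
        while (elapsedMs > current && !elapsedMsMax.compareAndSet(current, elapsedMs)) {
            current = elapsedMsMax.get();
        }
    }
}

final class StatsCollectorSketch {
    // Immutable view of one bucket, read once per counter.
    record Snapshot(long succeeded, long elapsedMsSum, long elapsedMsMax) {}

    private final ConcurrentHashMap<String, StageCounters> byType = new ConcurrentHashMap<>();

    void onStageSuccess(String stageType, long elapsedMs) {
        byType.computeIfAbsent(stageType, k -> new StageCounters()).recordSuccess(elapsedMs);
    }

    Snapshot snapshot(String stageType) {
        StageCounters c = byType.get(stageType);
        if (c == null) {
            return new Snapshot(0, 0, 0);
        }
        return new Snapshot(c.succeeded.sum(), c.elapsedMsSum.sum(), c.elapsedMsMax.get());
    }

    // Records 1..perThread from several threads, then snapshots after all writers finish.
    static Snapshot stress(int threads, int perThread) {
        StatsCollectorSketch collector = new StatsCollectorSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 1; i <= perThread; i++) {
                    collector.onStageSuccess("SHARD_FRAGMENT", i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return collector.snapshot("SHARD_FRAGMENT");
    }

    public static void main(String[] args) {
        System.out.println(stress(4, 1000));
    }
}
```

A snapshot taken while writers are still active is not an atomic cut across counters, which is usually acceptable for a monitoring rollup.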
Per-node only for v1; cluster-wide aggregation via TransportNodesAction is
a follow-up. Counters + sum + max only; HdrHistogram percentiles are a
follow-up. The output types are marked @ExperimentalApi to signal that
field shapes can evolve.
Coverage: AnalyticsStatsCollectorTests exercises every callback path
including a concurrent-recording stress test; AnalyticsStatsApiIT
provisions a dataset, fires PPL queries, and verifies the endpoint
reflects the activity.
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
Latency fields under queries / stages_by_type / fragments now serialize as a LatencyStats object with count, sum_ms, max_ms, p50_ms, p95_ms, p99_ms instead of separate elapsed_ms_sum / elapsed_ms_max scalars. Same for the new planning_ms field.
Backed by a small LatencyHistogram helper that wraps HdrHistogram's Recorder for wait-free recording across many writer threads. Each snapshot folds the latest interval into a long-lived cumulative accumulator, preserving the cumulative-since-startup semantics the rest of the rollup uses. count and sum_ms are tracked alongside via LongAdders since HdrHistogram only stores bucket midpoints and would lose exact totals.
HdrHistogram is configured for 1ms..10min range at 3 significant digits (~0.1% relative precision). Values outside that range clamp. The dependency is already on the analytics-engine compile classpath (transitively from server), so no new license/notice paperwork.
This makes oncall triage qualitatively better: a slow tail (p99 >> p50) now jumps out of the same response that already shows the totals, instead of needing a separate per-query profile to discover it.
Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
AbstractStageExecution.fireOperationListeners was passing getClass().getSimpleName() as the stageType to onStageStart and friends — that's the execution class name (e.g. "ShardFragmentStageExecution"), which is the wrong stable surface for an external rollup. The Stage's StageExecutionType enum (SHARD_FRAGMENT, COORDINATOR_REDUCE, LOCAL_PASSTHROUGH, LOCAL_COMPUTE) is the canonical name and is what PR opensearch-project#21660's StageProfile.execution_type already uses. Pass stage.getExecutionType().name() instead. The stats endpoint's stages_by_type buckets now key on the enum name, matching the explain API's field naming and surviving any future class-name refactor. Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
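A small sketch of why the enum name is the more stable bucket key. The enum constants are the ones listed above; the Stage and ShardFragmentStageExecution classes here are bare hypothetical stand-ins, not the engine's real classes.

```java
// Constants as listed in the commit message.
enum StageExecutionType { SHARD_FRAGMENT, COORDINATOR_REDUCE, LOCAL_PASSTHROUGH, LOCAL_COMPUTE }

// Hypothetical stand-in for the engine's Stage.
class Stage {
    private final StageExecutionType executionType;

    Stage(StageExecutionType executionType) {
        this.executionType = executionType;
    }

    StageExecutionType getExecutionType() {
        return executionType;
    }
}

// Hypothetical stand-in for an execution class; renaming it would change classKey().
class ShardFragmentStageExecution {
    final Stage stage = new Stage(StageExecutionType.SHARD_FRAGMENT);
}

class StageKeyDemo {
    // Old behaviour: the key tracks the Java class name.
    static String classKey(Object execution) {
        return execution.getClass().getSimpleName();
    }

    // New behaviour: the key tracks the enum constant, independent of class names.
    static String enumKey(Stage stage) {
        return stage.getExecutionType().name();
    }

    public static void main(String[] args) {
        ShardFragmentStageExecution execution = new ShardFragmentStageExecution();
        System.out.println(classKey(execution));
        System.out.println(enumKey(execution.stage));
    }
}
```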
The original test fired 6 identical queries and asserted on whichever node the REST client happened to land on. With only 2 of those queries landing on the same node, percentile spread was binary and fragments.total showed 0 when the snapshot landed on the coordinator-only node. Fire 90 queries across three shapes (project, filter, aggregate) so the per-stage-type buckets see varied work, then pick the busiest node's snapshot for assertions. All three buckets (queries, stages_by_type, fragments) reliably populate this way and percentile spread (p50 vs p99) is meaningful. Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
❌ Gradle check result for 74d5d76: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
Builds on top of #21660 and adds per-stage timing, rolled up so users won't need to hit _explain for every request.
Example:
Some follow-ups that can be added:
- TransportNodesAction to broadcast/merge results, since right now this API is per-node only
- queries.failed: query lifecycle events fire from QueryScheduler.execute(), but parse / table-resolution / planning failures fail upstream of that
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.