[analytics-engine] Coverage fixes for search command on the analytics-engine route by ahkcs · Pull Request #21681 · opensearch-project/OpenSearch

ahkcs · 2026-05-15T18:37:34Z

What this PR does

Four focused fixes in the analytics-engine sandbox plugins, narrowly scoped to the search command's analytics-engine route.

#	Commit	What it fixes
1	`Register TIMESTAMP in STANDARD_PROJECT_OPS`	PPL `timestamp(expr)` (also implicit when the analyzer coerces a string literal to TIMESTAMP for `@timestamp="..."` comparisons) was rejected by `OpenSearchProjectRule` even though `TimestampFunctionAdapter` already wires it to DataFusion's `to_timestamp`.
2	`Catch Calcite Litmus.THROW AssertionError in DefaultPlanExecutor`	Calcite's `Litmus.THROW` raises raw `AssertionError` from non-asserted Java. Without a catch, the analytics-engine executor lets it bubble to `OpenSearchUncaughtExceptionHandler` and kills the cluster JVM. Mirrors the SQL-plugin-side catch in `UnifiedQueryPlanner.plan`.
3	`Preserve fractional seconds in DatetimeOutputCastRewriter`	Widens the `to_char` format from `'%Y-%m-%d %H:%M:%S'` to `'%Y-%m-%d %H:%M:%S%.f'` so `date_nanos` columns keep sub-second precision in the response. Follow-up to #21650.
4	`Apply schema coercion on indexed-executor placeholder`	The indexed-execution path inferred the parquet schema via `build_segments` and registered it directly on the `PlaceholderProvider` that `from_substrait_plan` binds against — bypassing the `schema_coerce::coerce_inferred_schema` that the three other `infer_schema` sites (session_context, query_executor, api) all apply. So parquet's `BinaryView` for `ip` columns leaked to the substrait bind step, conflicting with isthmus's `Binary`. One-line fix: call `coerce_inferred_schema` on the schema returned by `build_segments`.

Why commit 4 matters

Without commit 4, every analytics-engine query against an OTEL-logs-style index (any mapping that includes an ip field) fails at fragment start with:

Substrait error: Field 'attributes.client.ip' in Substrait schema has a different type
(Binary) than the corresponding field in the table schema (BinaryView).

The Substrait plan declares Binary (isthmus has no view types), the table provider reports BinaryView (parquet's storage form). The three other infer_schema call sites coerce; only the indexed-executor path slipped. Adding the missing coerce call restores parity.

Pass / fail breakdown

CalciteSearchCommandIT on the analytics-engine route, against current upstream/main:

Step	PASS / 52
Pre-#21631 baseline	3
+ #21631 (IP/BINARY/MATCH_ONLY_TEXT scan + `schema_coerce` BinaryView↔Utf8)	~24
+ #21698 + #21628 + #21622 + #21562 (unsupported-type drop + PPL conversion + dedup + relevance functions on Lucene secondary)	27 (indexed-executor schema_coerce gap surfaces here)
+ Commit 4 (schema_coerce on indexed-executor)	34 (+7 — every OTEL-logs query that scans an `ip` column unblocked at the bind step)
+ Commits 1, 2, 3 (TIMESTAMP capability, Litmus catch, fractional seconds)	34 (no new passes; safety / precision improvements that the existing test queries don't directly assert on)
+ opensearch-project/sql#5447 (correctness fixes: datetime schema-type recovery + IP `byte[]` → `ExprIpValue`)	26

The drop from 34 → 26 with sql#5447 is not a regression: sql#5447's earlier iteration carried an SQL-side workaround (lower numeric / boolean predicates to typed RexCall to skip query_string) that lifted the same suite to 38 / 52. That workaround was deliberately removed in the latest sql#5447 revision because the root cause — Lucene-secondary query_string doesn't re-type literals inside the query body using the field mapping — should be fixed at the Lucene-secondary layer rather than in every consumer.

Remaining 25 failures — single upstream root cause, not blocking this PR

All 25 collapse to the typed-literal query_string gap on the analytics-engine Lucene-secondary path (adjacent to opensearch-project/OpenSearch#21562):

5 numeric / decimal range tests
10 date-math / date range tests
4 compound-predicate tests (mixed-AND, same family as Add Support for Relevance functions #21562's tests 10c / 10d that the author flagged as "handled separately")
2 attribute / multi-field tests
1 IP-via-query_string test
2 Lucene-secondary edge cases (wildcard pattern, NOT semantics)
1 composite-engine empty-mapping streaming test

See opensearch-project/sql#5447 for the per-test bucketing and a minimal smoking-gun repro:

where query_string([severityText], "ERROR")               # ✅ works
where query_string([severityNumber], "severityNumber:>15") # ❌ 0 rows (expected 6)

Testing

Sandbox-wide check: ./gradlew check -p sandbox -Dsandbox.enabled=true — see CI
CalciteSearchCommandIT on analytics route: 26 / 52 pass with this PR + sql#5447 against current upstream/main

github-actions · 2026-05-15T18:38:32Z

PR Reviewer Guide 🔍

(Review updated until commit `df38aab`)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions · 2026-05-15T19:39:32Z

✅ Gradle check result for ca67f37: SUCCESS

codecov · 2026-05-15T19:40:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.45%. Comparing base (debf6ed) to head (df38aab).

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #21681      +/-   ##
============================================
+ Coverage     73.33%   73.45%   +0.12%     
- Complexity    74996    75040      +44     
============================================
  Files          6012     6012              
  Lines        340934   340934              
  Branches      49076    49076              
============================================
+ Hits         250021   250441     +420     
+ Misses        71005    70484     -521     
- Partials      19908    20009     +101

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Previously `visitSearch` discarded `Search.getOriginalExpression()` and emitted a single `query_string(...)` function call covering the whole predicate. On the analytics-engine route every such filter is delegation-routed to Lucene; on a parquet-only composite shard the delegation handle NPEs in `LuceneAnalyticsBackendPlugin.getFilterDelegationHandle` because there is no Lucene reader. Walk the typed `SearchExpression` AST instead. Structured fragments — field comparisons, AND, OR, NOT, IN, plus time-modifier-resolved comparisons (`earliest=` / `latest=`) — become native PPL filter AST (`Compare` / `And` / `Or` / `Not` / `In`) that the existing rexVisitor handles like any `where`-clause condition. DataFusion executes them natively against parquet without any Lucene round-trip. Free-text / phrase literals stay in `query_string` form because they have no native equivalent. Top-level `AND` mixes the two: the structured conjunct is lowered, the relevance conjunct stays in `query_string`, and the resulting `AND(native, query_string(...))` lets the planner route each clause independently — DataFusion prunes parquet, Lucene handles the relevance term against the secondary format. Three guards keep the lowering faithful to PPL search semantics: - `containsLuceneWildcard` — `*` / `?` in a comparison value forces the fallback so Lucene wildcards keep working (`severityText=ERR*`). - `isOpenSearchDateMath` — values produced by `visitTimeModifierExpression` (`now`, `now-1h`, `now+1y/M-1M`, epoch-millis strings) can't be parsed as Calcite timestamps; keep them in `query_string` so Lucene evaluates the date math. - `isLowerableField` — comparisons against fields absent from the relation's row type fall through (matches Lucene's "missing field → no docs" semantics; native would error out in the analyzer). `SearchNot` always falls back: Lucene `NOT field=value` matches docs where the field is missing OR field != value, but `NOT (field = value)` in Calcite simplifies to `field <> value`, which drops missing-field rows since `NULL <> value` evaluates to NULL. Net `CalciteSearchCommandIT` impact (analytics-engine route, all OS- side fixes in `opensearch-project/OpenSearch#21681` in place): 3/52 → 35/52. Tests that pass via the native path include all field-equality, AND, OR, IN, range, double-comparison, attribute-field, and mixed boolean tests. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-15T21:16:59Z

Persistent review updated to latest commit 821fcd0

Documents the 3 → 35 / 52 progression for `CalciteSearchCommandIT` on the analytics-engine route across this PR + opensearch-project/OpenSearch#21681, plus the well-bucketed remaining 17 failures and where each one's fix lands. Mirrors the existing per-command status docs for stats, top/rare, union, and appendcol. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-15T21:17:19Z

PR Code Suggestions ✨

Latest suggestions up to df38aab

Explore these optional code suggestions:

Category Suggestion Impact

Security

Validate AssertionError source before catching

Catching AssertionError is generally unsafe because it can mask critical JVM-level
failures (e.g., OutOfMemoryError subclasses). Consider catching only specific
Calcite-thrown assertion errors or wrapping the Calcite planning phase separately to
isolate these expected assertion failures from true JVM errors.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [121-128]

 } catch (AssertionError e) {
-    // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid,
-    // RexChecker) throws AssertionError directly via Java code rather than via the
-    // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this
-    // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to
-    // an IllegalStateException so the query path treats it as a per-query failure
-    // (HTTP 500 with a bucketable message) instead of cluster-fatal.
-    listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e));
+    // Only convert Calcite-specific assertion errors; rethrow others
+    if (e.getStackTrace().length > 0 && 
+        e.getStackTrace()[0].getClassName().startsWith("org.apache.calcite")) {
+        listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e));
+    } else {
+        throw e;
+    }
 }

Suggestion importance[1-10]: 4

__

Why: While the concern about catching AssertionError is valid, the suggested stack trace inspection is fragile and may not reliably distinguish Calcite errors from JVM-level failures. The PR's approach is reasonable given the specific Calcite behavior documented in the comment, though a more robust solution would be preferable.

Low

Previous suggestions

Suggestions up to commit f8fee15

Category Suggestion Impact

Possible issue

Validate AssertionError source before catching

Catching AssertionError is risky because it can mask critical JVM-level failures
(e.g., OutOfMemoryError subclasses). Consider catching only specific Calcite
assertion types or wrapping the executor logic in a more targeted exception handler
that validates the error source before converting to IllegalStateException.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [113-120]

 } catch (AssertionError e) {
-    // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid,
-    // RexChecker) throws AssertionError directly via Java code rather than via the
-    // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this
-    // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to
-    // an IllegalStateException so the query path treats it as a per-query failure
-    // (HTTP 500 with a bucketable message) instead of cluster-fatal.
-    listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e));
+    // Only convert Calcite-specific assertions; rethrow others to preserve JVM-fatal signals
+    if (e.getStackTrace().length > 0 && e.getStackTrace()[0].getClassName().startsWith("org.apache.calcite")) {
+        listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e));
+    } else {
+        throw e;
+    }
 }

Suggestion importance[1-10]: 4

__

Why: While the concern about catching AssertionError is valid, the suggested stack trace inspection is fragile and may not reliably distinguish Calcite assertions from JVM-fatal errors. The current approach is reasonable given the specific Calcite context mentioned in the comment, though a more robust solution could be considered.

Low

Suggestions up to commit b920fac

Category	Suggestion	Impact
Possible issue	Avoid catching all AssertionErrors Catching `AssertionError` is risky because it can mask critical JVM-level failures (e.g., OutOfMemoryError subclasses). Consider catching only specific Calcite assertion errors or using a more targeted approach to avoid suppressing genuine JVM errors that should terminate the process. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [113-120] } catch (AssertionError e) { - // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid, - // RexChecker) throws AssertionError directly via Java code rather than via the - // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this - // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to - // an IllegalStateException so the query path treats it as a per-query failure - // (HTTP 500 with a bucketable message) instead of cluster-fatal. - listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + // Only convert Calcite-specific assertion errors to query failures + if (e.getStackTrace().length > 0 && + e.getStackTrace()[0].getClassName().startsWith("org.apache.calcite")) { + listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + } else { + throw e; + } } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a risk in catching all `AssertionError` instances, which could mask critical JVM failures. However, the proposed stack trace inspection approach is fragile and may not catch all Calcite assertions. A more robust solution would involve Calcite-specific exception types if available.	Medium
General	Document or remove empty loop The empty loop body suggests incomplete implementation. If this loop is intentionally empty for debugging or placeholder purposes, add a comment explaining why. Otherwise, implement the intended logic or remove the unused loop. sandbox/plugins/analytics-backend-datafusion/rust/src/indexed_executor.rs [458-459] let schema = crate::schema_coerce::coerce_inferred_schema(schema); +// Segment validation loop - currently no per-segment processing needed for (i, seg) in segments.iter().enumerate() { + // Reserved for future segment validation logic } Suggestion importance[1-10]: 3 __ Why: The suggestion identifies an empty loop that appears incomplete. While adding documentation would improve code clarity, this is a minor maintainability issue. The loop may be intentionally empty or a placeholder for future logic, but without more context it's difficult to assess if removal is appropriate.	Low

Suggestions up to commit ae47f33

Category	Suggestion	Impact
Possible issue	Validate AssertionError source before catching Catching `AssertionError` is risky because it can mask critical JVM-level failures (e.g., `OutOfMemoryError` subclasses). Consider catching only Calcite-specific assertion errors by checking the stack trace or exception source, or document that this catch block should only handle planning-phase assertions. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [113-120] } catch (AssertionError e) { // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid, // RexChecker) throws AssertionError directly via Java code rather than via the // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to // an IllegalStateException so the query path treats it as a per-query failure // (HTTP 500 with a bucketable message) instead of cluster-fatal. - listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + if (e.getStackTrace().length > 0 && e.getStackTrace()[0].getClassName().startsWith("org.apache.calcite")) { + listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + } else { + throw e; + } } Suggestion importance[1-10]: 4 __ Why: While the concern about catching `AssertionError` is valid, the suggested stack trace check is fragile and could miss Calcite errors from nested calls. The PR's approach is reasonable given the documented context that Calcite's `Litmus.THROW` specifically throws `AssertionError` in planning code. The suggestion adds complexity without significantly improving safety.	Low

Suggestions up to commit 672aa21

Category	Suggestion	Impact
General	Log AssertionError before conversion Catching `AssertionError` is risky because it can mask critical JVM-level failures (e.g., OutOfMemoryError subclasses). Consider catching only specific Calcite assertion errors or logging the error before conversion to ensure visibility of unexpected assertion failures. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [116-123] } catch (AssertionError e) { // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid, // RexChecker) throws AssertionError directly via Java code rather than via the // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to // an IllegalStateException so the query path treats it as a per-query failure // (HTTP 500 with a bucketable message) instead of cluster-fatal. + logger.error("AssertionError caught in analytics-engine executor", e); listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); } Suggestion importance[1-10]: 5 __ Why: Adding logging before converting `AssertionError` to `IllegalStateException` improves observability and debugging. However, the comment already explains the specific Calcite assertions being caught, and the concern about masking critical JVM failures is somewhat mitigated by the narrow context. The suggestion provides moderate value for operational visibility.	Low

Suggestions up to commit 1b2d651

Category	Suggestion	Impact
Possible issue	Validate AssertionError source before catching Catching `AssertionError` is risky because it can mask critical JVM-level failures (e.g., OutOfMemoryError subclasses). Consider catching only Calcite-specific assertion errors by checking the stack trace or exception source, or document that this catch block should only handle planning-phase assertions. sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/DefaultPlanExecutor.java [116-123] } catch (AssertionError e) { // Calcite's Litmus.THROW (used by RelOptUtil.eq, RexUtil.isFlat, Project.isValid, // RexChecker) throws AssertionError directly via Java code rather than via the // `assert` keyword, so JVM -da doesn't gate them. If one fires inside this // executor, OpenSearchUncaughtExceptionHandler exits the cluster JVM. Convert to // an IllegalStateException so the query path treats it as a per-query failure // (HTTP 500 with a bucketable message) instead of cluster-fatal. - listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + if (isCalciteAssertion(e)) { + listener.onFailure(new IllegalStateException("Analytics-engine executor rejected the plan: " + e.getMessage(), e)); + } else { + throw e; + } } Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies a risk in catching `AssertionError` broadly, as it could mask critical JVM failures. However, the `improved_code` introduces an undefined `isCalciteAssertion(e)` method without implementation, and the comment already documents the specific Calcite use case. The concern is valid but the solution needs refinement.	Medium

github-actions · 2026-05-15T21:52:07Z

Persistent review updated to latest commit be39849

Previously `visitSearch` discarded `Search.getOriginalExpression()` and emitted a single `query_string(...)` function call covering the whole predicate. On the analytics-engine route every such filter is delegation-routed to Lucene; on a parquet-only composite shard the delegation handle NPEs in `LuceneAnalyticsBackendPlugin.getFilterDelegationHandle` because there is no Lucene reader. Walk the typed `SearchExpression` AST instead. Structured fragments — field comparisons, AND, OR, NOT, IN, plus time-modifier-resolved comparisons (`earliest=` / `latest=`) — become native PPL filter AST (`Compare` / `And` / `Or` / `Not` / `In`) that the existing rexVisitor handles like any `where`-clause condition. DataFusion executes them natively against parquet without any Lucene round-trip. Free-text / phrase literals stay in `query_string` form because they have no native equivalent. Top-level `AND` mixes the two: the structured conjunct is lowered, the relevance conjunct stays in `query_string`, and the resulting `AND(native, query_string(...))` lets the planner route each clause independently — DataFusion prunes parquet, Lucene handles the relevance term against the secondary format. Three guards keep the lowering faithful to PPL search semantics: - `containsLuceneWildcard` — `*` / `?` in a comparison value forces the fallback so Lucene wildcards keep working (`severityText=ERR*`). - `isOpenSearchDateMath` — values produced by `visitTimeModifierExpression` (`now`, `now-1h`, `now+1y/M-1M`, epoch-millis strings) can't be parsed as Calcite timestamps; keep them in `query_string` so Lucene evaluates the date math. - `isLowerableField` — comparisons against fields absent from the relation's row type fall through (matches Lucene's "missing field → no docs" semantics; native would error out in the analyzer). `SearchNot` always falls back: Lucene `NOT field=value` matches docs where the field is missing OR field != value, but `NOT (field = value)` in Calcite simplifies to `field <> value`, which drops missing-field rows since `NULL <> value` evaluates to NULL. Net `CalciteSearchCommandIT` impact (analytics-engine route, all OS- side fixes in `opensearch-project/OpenSearch#21681` in place): 3/52 → 35/52. Tests that pass via the native path include all field-equality, AND, OR, IN, range, double-comparison, attribute-field, and mixed boolean tests. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-15T22:39:20Z

Persistent review updated to latest commit 64127e6

github-actions · 2026-05-15T23:39:01Z

❕ Gradle check result for 64127e6: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2026-05-18T17:35:53Z

Persistent review updated to latest commit 1b2d651

github-actions · 2026-05-18T18:11:12Z

Persistent review updated to latest commit 672aa21

github-actions · 2026-05-18T19:08:28Z

❌ Gradle check result for 672aa21: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-18T21:14:08Z

Persistent review updated to latest commit ae47f33

github-actions · 2026-05-18T21:44:06Z

Persistent review updated to latest commit b920fac

Previously `visitSearch` discarded `Search.getOriginalExpression()` and emitted a single `query_string(...)` function call covering the whole predicate. On the analytics-engine route every such filter is delegation-routed to Lucene; on a parquet-only composite shard the delegation handle NPEs in `LuceneAnalyticsBackendPlugin.getFilterDelegationHandle` because there is no Lucene reader. Walk the typed `SearchExpression` AST instead. Structured fragments — field comparisons, AND, OR, NOT, IN, plus time-modifier-resolved comparisons (`earliest=` / `latest=`) — become native PPL filter AST (`Compare` / `And` / `Or` / `Not` / `In`) that the existing rexVisitor handles like any `where`-clause condition. DataFusion executes them natively against parquet without any Lucene round-trip. Free-text / phrase literals stay in `query_string` form because they have no native equivalent. Top-level `AND` mixes the two: the structured conjunct is lowered, the relevance conjunct stays in `query_string`, and the resulting `AND(native, query_string(...))` lets the planner route each clause independently — DataFusion prunes parquet, Lucene handles the relevance term against the secondary format. Three guards keep the lowering faithful to PPL search semantics: - `containsLuceneWildcard` — `*` / `?` in a comparison value forces the fallback so Lucene wildcards keep working (`severityText=ERR*`). - `isOpenSearchDateMath` — values produced by `visitTimeModifierExpression` (`now`, `now-1h`, `now+1y/M-1M`, epoch-millis strings) can't be parsed as Calcite timestamps; keep them in `query_string` so Lucene evaluates the date math. - `isLowerableField` — comparisons against fields absent from the relation's row type fall through (matches Lucene's "missing field → no docs" semantics; native would error out in the analyzer). `SearchNot` always falls back: Lucene `NOT field=value` matches docs where the field is missing OR field != value, but `NOT (field = value)` in Calcite simplifies to `field <> value`, which drops missing-field rows since `NULL <> value` evaluates to NULL. Net `CalciteSearchCommandIT` impact (analytics-engine route, all OS- side fixes in `opensearch-project/OpenSearch#21681` in place): 3/52 → 35/52. Tests that pass via the native path include all field-equality, AND, OR, IN, range, double-comparison, attribute-field, and mixed boolean tests. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-18T22:01:01Z

❌ Gradle check result for b920fac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-19T18:11:17Z

Persistent review updated to latest commit f8fee15

…_OPS PPL `timestamp(expr)` lowers to a `ScalarFunction.TIMESTAMP` call. The TimestampFunctionAdapter already wires the call into DataFusion's native `to_timestamp`, but `STANDARD_PROJECT_OPS` did not declare TIMESTAMP, so `OpenSearchProjectRule.annotateExpr` rejected every plan that contained the operator with "No backend supports scalar function [TIMESTAMP] among [datafusion]". Same call also shows up implicitly after the analyzer coerces a string literal to TIMESTAMP for column comparisons such as `@timestamp="2024-01-15T10:30:00Z"` once `@timestamp` is typed as TIMESTAMP (see the date_nanos schema fix in the previous commit). Update the inline comment block to match: the legacy-engine-only path the prior comment described is gone now that the capability is wired. Signed-off-by: Kai Huang <ahkcs@amazon.com>

…ltPlanExecutor RexUtil.isFlat / RelOptUtil.eq / Project.isValid / RexChecker call into Calcite's Litmus.THROW, which raises AssertionError from raw Java code rather than via the `assert` keyword. JVM `-da` doesn't gate that path, so an assertion firing inside a search thread escapes to OpenSearchUncaughtExceptionHandler and exits the cluster JVM. This bit hard once `CalciteRelNodeVisitor` started lowering structured PPL `search` predicates to native filter shape: queries like `severityNumber="not-a-number"` fold to `=(SAFE_CAST($X), null)` ahead of the marking phase, and the Litmus check fires before any plan-executor listener gets to translate the error. The cluster died mid-IT and 21 subsequent tests failed with `Connection refused`. Convert AssertionError caught at the executor entrypoint to an IllegalStateException so the query reports as HTTP 500 with a bucketable message and the cluster survives. The same pattern is already in place at `UnifiedQueryPlanner.plan` on the SQL plugin side; this is the analytics-engine-side mirror so neither layer can produce a cluster-fatal assertion. Signed-off-by: Kai Huang <ahkcs@amazon.com>

…eOutputCastRewriter format Widen the format string passed to `to_char` from the seconds-only {@code "%Y-%m-%d %H:%M:%S"} to {@code "%Y-%m-%d %H:%M:%S%.f"}. The trailing {@code %.f} is chrono's variable-length fractional-second specifier — a leading dot followed by 0-9 digits, omitted when the value has no sub-second precision. This matches PPL's legacy formatting for {@code date} and {@code date_nanos} fields where the displayed precision tracks the source value: - {@code 2024-01-15T10:30:01.23456789Z} (date_nanos) → {@code "2024-01-15 10:30:01.23456789"} (legacy) / now {@code "2024-01-15 10:30:01.234567890"} (analytics route — internally 9-digit, leading 0 because Arrow Timestamp(ns) precision is fixed) - {@code 2025-08-01T03:47:41Z} (date) → {@code "2025-08-01 03:47:41"} (both paths) — no decimal because the source value has no fractional component Surfaced by `CalciteSearchCommandIT.testSearchWithDateRangeComparisons`. Follow-up to opensearch-project/sql#5420 which the original PR (opensearch-project#21650) closed with a seconds-only format that dropped the fractional digits the tests still expect. Signed-off-by: Kai Huang <ahkcs@amazon.com>

…tor placeholder The indexed-execution path inferred the parquet schema via build_segments (which calls FileFormat::infer_schema) and registered it directly on the PlaceholderProvider that from_substrait_plan binds against. The non-indexed paths (session_context.rs, query_executor.rs, api.rs) all routed their inferred schemas through schema_coerce::coerce_inferred_schema before registering, but the indexed path skipped this step. Result: an OpenSearch `ip` column lands as parquet `BinaryView` on disk; isthmus on the Java side serializes the Substrait base schema as plain `Binary` (Substrait has no view types). The placeholder reports `BinaryView` while the plan declares `Binary` — DataFusion's substrait consumer rejects the bind with: Substrait error: Field '<x>' in Substrait schema has a different type (Binary) than the corresponding field in the table schema (BinaryView). Every analytics-engine query against an index that includes an `ip` column (every OTEL-logs query in CalciteSearchCommandIT, for example) fails at fragment start. Apply coerce_inferred_schema right after build_segments, before PlaceholderProvider construction. The placeholder, the expr_to_bool_tree analysis, and the downstream IndexedTableProvider all see the same Substrait-compatible (Binary / Int64 / Float32) schema, so the bind succeeds and the parquet reader's SchemaAdapter handles the per-batch BinaryView→Binary relabeling at scan time. This restores parity with the other infer_schema sites; no behavior change for non-IP columns. Signed-off-by: Kai Huang <ahkcs@amazon.com>

github-actions · 2026-05-19T18:18:50Z

Persistent review updated to latest commit df38aab

github-actions · 2026-05-19T19:13:51Z

✅ Gradle check result for df38aab: SUCCESS

…-engine route (opensearch-project#21681) * [analytics-backend-datafusion] Register TIMESTAMP in STANDARD_PROJECT_OPS PPL `timestamp(expr)` lowers to a `ScalarFunction.TIMESTAMP` call. The TimestampFunctionAdapter already wires the call into DataFusion's native `to_timestamp`, but `STANDARD_PROJECT_OPS` did not declare TIMESTAMP, so `OpenSearchProjectRule.annotateExpr` rejected every plan that contained the operator with "No backend supports scalar function [TIMESTAMP] among [datafusion]". Same call also shows up implicitly after the analyzer coerces a string literal to TIMESTAMP for column comparisons such as `@timestamp="2024-01-15T10:30:00Z"` once `@timestamp` is typed as TIMESTAMP (see the date_nanos schema fix in the previous commit). Update the inline comment block to match: the legacy-engine-only path the prior comment described is gone now that the capability is wired. Signed-off-by: Kai Huang <ahkcs@amazon.com> * [analytics-engine] Catch Calcite Litmus.THROW AssertionError in DefaultPlanExecutor RexUtil.isFlat / RelOptUtil.eq / Project.isValid / RexChecker call into Calcite's Litmus.THROW, which raises AssertionError from raw Java code rather than via the `assert` keyword. JVM `-da` doesn't gate that path, so an assertion firing inside a search thread escapes to OpenSearchUncaughtExceptionHandler and exits the cluster JVM. This bit hard once `CalciteRelNodeVisitor` started lowering structured PPL `search` predicates to native filter shape: queries like `severityNumber="not-a-number"` fold to `=(SAFE_CAST($X), null)` ahead of the marking phase, and the Litmus check fires before any plan-executor listener gets to translate the error. The cluster died mid-IT and 21 subsequent tests failed with `Connection refused`. Convert AssertionError caught at the executor entrypoint to an IllegalStateException so the query reports as HTTP 500 with a bucketable message and the cluster survives. The same pattern is already in place at `UnifiedQueryPlanner.plan` on the SQL plugin side; this is the analytics-engine-side mirror so neither layer can produce a cluster-fatal assertion. Signed-off-by: Kai Huang <ahkcs@amazon.com> * [analytics-backend-datafusion] Preserve fractional seconds in DatetimeOutputCastRewriter format Widen the format string passed to `to_char` from the seconds-only {@code "%Y-%m-%d %H:%M:%S"} to {@code "%Y-%m-%d %H:%M:%S%.f"}. The trailing {@code %.f} is chrono's variable-length fractional-second specifier — a leading dot followed by 0-9 digits, omitted when the value has no sub-second precision. This matches PPL's legacy formatting for {@code date} and {@code date_nanos} fields where the displayed precision tracks the source value: - {@code 2024-01-15T10:30:01.23456789Z} (date_nanos) → {@code "2024-01-15 10:30:01.23456789"} (legacy) / now {@code "2024-01-15 10:30:01.234567890"} (analytics route — internally 9-digit, leading 0 because Arrow Timestamp(ns) precision is fixed) - {@code 2025-08-01T03:47:41Z} (date) → {@code "2025-08-01 03:47:41"} (both paths) — no decimal because the source value has no fractional component Surfaced by `CalciteSearchCommandIT.testSearchWithDateRangeComparisons`. Follow-up to opensearch-project/sql#5420 which the original PR (opensearch-project#21650) closed with a seconds-only format that dropped the fractional digits the tests still expect. Signed-off-by: Kai Huang <ahkcs@amazon.com> * [analytics-backend-datafusion] Apply schema coercion on indexed-executor placeholder The indexed-execution path inferred the parquet schema via build_segments (which calls FileFormat::infer_schema) and registered it directly on the PlaceholderProvider that from_substrait_plan binds against. The non-indexed paths (session_context.rs, query_executor.rs, api.rs) all routed their inferred schemas through schema_coerce::coerce_inferred_schema before registering, but the indexed path skipped this step. Result: an OpenSearch `ip` column lands as parquet `BinaryView` on disk; isthmus on the Java side serializes the Substrait base schema as plain `Binary` (Substrait has no view types). The placeholder reports `BinaryView` while the plan declares `Binary` — DataFusion's substrait consumer rejects the bind with: Substrait error: Field '<x>' in Substrait schema has a different type (Binary) than the corresponding field in the table schema (BinaryView). Every analytics-engine query against an index that includes an `ip` column (every OTEL-logs query in CalciteSearchCommandIT, for example) fails at fragment start. Apply coerce_inferred_schema right after build_segments, before PlaceholderProvider construction. The placeholder, the expr_to_bool_tree analysis, and the downstream IndexedTableProvider all see the same Substrait-compatible (Binary / Int64 / Float32) schema, so the bind succeeds and the parquet reader's SchemaAdapter handles the per-batch BinaryView→Binary relabeling at scan time. This restores parity with the other infer_schema sites; no behavior change for non-IP columns. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>

ahkcs requested a review from a team as a code owner May 15, 2026 18:37

ahkcs changed the title ~~[analytics-backend-datafusion] Declare scan support for IP / BINARY / MATCH_ONLY_TEXT~~ [analytics-engine] Coverage fixes for search command on the analytics-engine route May 15, 2026

ahkcs mentioned this pull request May 15, 2026

Correctness fixes for analytics-engine search route (datetime schema + IP/BINARY byte[]) opensearch-project/sql#5447

Open

2 tasks

ahkcs force-pushed the pr/datafusion-extra-field-types branch 2 times, most recently from 7258541 to 64127e6 Compare May 15, 2026 22:37

mengweieric mentioned this pull request May 17, 2026

[analytics-engine] Skip unsupported field types when building Calcite schema #21698

Merged

6 tasks