Skip to content

[analytics-engine] Skip unsupported field types when building Calcite schema#21698

Merged
mch2 merged 3 commits into
opensearch-project:mainfrom
mengweieric:ae-schema-degradation
May 18, 2026
Merged

[analytics-engine] Skip unsupported field types when building Calcite schema#21698
mch2 merged 3 commits into
opensearch-project:mainfrom
mengweieric:ae-schema-degradation

Conversation

@mengweieric
Copy link
Copy Markdown
Contributor

@mengweieric mengweieric commented May 17, 2026

Problem

Queries against any index that contains a field type the analytics-engine cannot scan (e.g. geo_point, geo_shape, alias) fail at planning time with IllegalArgumentException: Unsupported OpenSearch field type: <type> — even when the query never references the unsupported column.

Solution

Drop the unsupported column from the Calcite row type during schema build instead of throwing.

  • Queries that reference the dropped column → clean column not found validator error.
  • Queries that don't reference it → plan and execute normally.

Verification

Unit tests (16/16 green, 9 new): cover Calcite end-to-end validator behavior on dropped columns, mixed mappings, nested objects with unsupported leaves, all-unsupported indices, and the full type catalogue.

Same-cluster A/B IT regression sweep on a force-routed analytics-engine cluster, identical 4872-test run with fix OFF vs fix ON:

fix-OFF (baseline) fix-ON Δ
Pass 1401 1405 +4
Fail 3266 3262 −4
Skip 205 205 0
Unique tests hitting Unsupported field type: geo_point 214 (across 5 classes) 0 −214
Cluster-log Unsupported OpenSearch field type throws (raw, JVM-level) 443 0 −443
Regressions on previously-passing tests 0 classes

The 214 unblocked tests span CalciteDataTypeIT, CalciteDateTimeComparisonIT, CalciteGeoPointFormatsIT, CalciteMultiValueStatsIT, CalcitePPLCastFunctionIT. The pass-delta is small (+4) because most unblocked tests immediately re-fail on the next engine gap (TIMESTAMP / IS_NULL / IS_NOT_NULL scalar functions, etc.) — this fix is the necessary prerequisite for those tests to ever reach those gaps in the first place. The load-bearing metric is the 443 → 0 / 214 → 0 elimination, not the pass-rate %.

Per-class pass wins: CalciteGeoPointFormatsIT 0 → 2, CalcitePPLStringBuiltinFunctionIT 44 → 46.

Test plan

  • :sandbox:libs:analytics-api:check
  • :sandbox:plugins:analytics-engine:check
  • spotlessCheck on both modules
  • Unit tests for all field-type scenarios (supported, unsupported, nested, mixed, malformed)
  • Calcite validator behavior tested end-to-end via Frameworks.getPlanner
  • Same-cluster A/B IT regression sweep — error fully eliminated, 0 regressions

… schema

OpenSearchSchemaBuilder.buildSchema previously aborted with
IllegalArgumentException whenever an index mapping contained a field
type the analytics-engine cannot scan (e.g. geo_point, geo_shape).
This blocked any query against such an index even when the query did
not reference the unsupported column.

Drop the unsupported column from the Calcite row type instead. Queries
that reference the dropped column now fail at validation with a clean
"column not found" error; queries that do not reference it plan and
execute normally.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric requested a review from a team as a code owner May 17, 2026 01:42
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 17, 2026

PR Reviewer Guide 🔍

(Review updated until commit 2cfa721)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 17, 2026

PR Code Suggestions ✨

Latest suggestions up to 2cfa721

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Use try-with-resources for Planner

The planner.close() in the finally block may suppress exceptions thrown during
parsing/validation, making test failures harder to diagnose. Consider using
try-with-resources if Planner implements AutoCloseable, or ensure exceptions from
the try block are not masked.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [402-416]

 private static RelNode parseValidateConvert(SchemaPlus schema, String sql) throws Exception {
     FrameworkConfig config = Frameworks.newConfigBuilder()
         .parserConfig(SqlParser.config().withCaseSensitive(false))
         .defaultSchema(schema)
         .operatorTable(SqlStdOperatorTable.instance())
         .build();
-    Planner planner = Frameworks.getPlanner(config);
-    try {
+    try (Planner planner = Frameworks.getPlanner(config)) {
         SqlNode parsed = planner.parse(sql);
         SqlNode validated = planner.validate(parsed);
         return planner.rel(validated).project();
-    } finally {
-        planner.close();
     }
 }
Suggestion importance[1-10]: 7

__

Why: Using try-with-resources ensures Planner is properly closed and prevents exception suppression. This is a best practice for resource management and improves code robustness, assuming Planner implements AutoCloseable.

Medium
Use specific exception type

Catching the generic Exception class makes it harder to verify the specific failure
mode. Consider catching a more specific Calcite validation exception type to ensure
the test fails for the right reason and doesn't accidentally pass on unrelated
exceptions.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [287]

-Exception ex = expectThrows(Exception.class, () -> parseValidateConvert(schema, "SELECT loc FROM kai_repro"));
+org.apache.calcite.tools.ValidationException ex = expectThrows(
+    org.apache.calcite.tools.ValidationException.class, 
+    () -> parseValidateConvert(schema, "SELECT loc FROM kai_repro")
+);
Suggestion importance[1-10]: 6

__

Why: Catching a more specific exception type like ValidationException would make the test more precise and ensure it fails for the correct reason. This improves test reliability and clarity, though the current approach with message validation still works.

Low
Reuse JavaTypeFactoryImpl instance

Creating a new JavaTypeFactoryImpl instance for each test invocation may lead to
inconsistent type system behavior. Consider reusing a single instance across test
methods or using a factory method to ensure consistent type resolution throughout
the test suite.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [209]

-RelDataType rowType = table.getRowType(new org.apache.calcite.jdbc.JavaTypeFactoryImpl());
+private static final org.apache.calcite.jdbc.JavaTypeFactoryImpl TYPE_FACTORY = new org.apache.calcite.jdbc.JavaTypeFactoryImpl();
 
+RelDataType rowType = table.getRowType(TYPE_FACTORY);
+
Suggestion importance[1-10]: 4

__

Why: While reusing a JavaTypeFactoryImpl instance could improve consistency and performance, the current approach of creating new instances per test is acceptable for test isolation. The suggestion is valid but offers marginal benefit in a test context.

Low

Previous suggestions

Suggestions up to commit e6dae75
CategorySuggestion                                                                                                                                    Impact
General
Add logging for dropped fields

Add logging when dropping unsupported field types to aid debugging and monitoring.
This helps administrators understand which fields are being silently excluded from
the schema, especially for unknown plugin types that might be legitimate but
unsupported.

sandbox/libs/analytics-api/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java [173-179]

 SqlTypeName sqlType = mapFieldType(fieldType);
 if (sqlType == null) {
     // Unsupported (geo_point/shape/completion/…) or unknown plugin type. Drop the
     // column; a query referencing it surfaces a Calcite "column not found" via the
     // validator rather than a planning-time IllegalArgumentException.
+    logger.debug("Dropping unsupported field '{}' of type '{}' from schema", fieldName, fieldType);
     continue;
 }
Suggestion importance[1-10]: 5

__

Why: Adding debug logging when dropping unsupported fields would help with troubleshooting and monitoring, especially for unknown plugin types. However, the suggestion doesn't verify that a logger instance exists in the class, and the impact is moderate since this is debug-level logging that aids observability rather than fixing a bug.

Low
Suggestions up to commit 60a5386
CategorySuggestion                                                                                                                                    Impact
General
Use try-with-resources for planner cleanup

The planner.close() in the finally block may suppress exceptions thrown during
parse/validate/convert operations. If an exception occurs before reaching the
finally block, the planner still closes, but if close() itself throws, it will mask
the original exception. Consider using try-with-resources for automatic resource
management.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [402-416]

 private static RelNode parseValidateConvert(SchemaPlus schema, String sql) throws Exception {
     FrameworkConfig config = Frameworks.newConfigBuilder()
         .parserConfig(SqlParser.config().withCaseSensitive(false))
         .defaultSchema(schema)
         .operatorTable(SqlStdOperatorTable.instance())
         .build();
-    Planner planner = Frameworks.getPlanner(config);
-    try {
+    try (Planner planner = Frameworks.getPlanner(config)) {
         SqlNode parsed = planner.parse(sql);
         SqlNode validated = planner.validate(parsed);
         return planner.rel(validated).project();
-    } finally {
-        planner.close();
     }
 }
Suggestion importance[1-10]: 7

__

Why: Using try-with-resources is a best practice for resource management and prevents potential exception suppression. This improves code robustness and readability, making it a valuable enhancement to the helper method.

Medium
Catch specific validation exception type

Catching the generic Exception class makes it harder to verify the specific failure
mode. The test should catch a more specific Calcite validation exception type to
ensure the failure occurs at the expected validation stage rather than during
parsing or conversion.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [287]

-Exception ex = expectThrows(Exception.class, () -> parseValidateConvert(schema, "SELECT loc FROM kai_repro"));
+org.apache.calcite.tools.ValidationException ex = expectThrows(
+    org.apache.calcite.tools.ValidationException.class, 
+    () -> parseValidateConvert(schema, "SELECT loc FROM kai_repro")
+);
Suggestion importance[1-10]: 6

__

Why: Catching a more specific exception type (ValidationException) would make the test more precise and better document the expected failure mode. However, the test already validates the error message content, so the improvement is moderate.

Low
Reuse shared type factory instance

Creating a new JavaTypeFactoryImpl instance for each test invocation may lead to
inconsistent type system behavior. Consider extracting a shared RelDataTypeFactory
instance as a test fixture to ensure consistent type resolution across all test
methods.

sandbox/plugins/analytics-engine/src/test/java/org/opensearch/analytics/engine/OpenSearchSchemaBuilderTests.java [209]

-RelDataType rowType = table.getRowType(new org.apache.calcite.jdbc.JavaTypeFactoryImpl());
+private static final RelDataTypeFactory TYPE_FACTORY = new org.apache.calcite.jdbc.JavaTypeFactoryImpl();
 
+RelDataType rowType = table.getRowType(TYPE_FACTORY);
+
Suggestion importance[1-10]: 4

__

Why: While reusing a shared TYPE_FACTORY instance could improve consistency, creating new instances per test is a common pattern that ensures test isolation. The suggestion is valid but offers only marginal improvement in this context.

Low

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 60a5386: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.44%. Comparing base (a55b56a) to head (2cfa721).

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21698      +/-   ##
============================================
+ Coverage     73.36%   73.44%   +0.07%     
- Complexity    74672    74753      +81     
============================================
  Files          5991     5991              
  Lines        339374   339374              
  Branches      48921    48921              
============================================
+ Hits         248980   249247     +267     
+ Misses        70522    70295     -227     
+ Partials      19872    19832      -40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e6dae75

@mengweieric
Copy link
Copy Markdown
Contributor Author

The sandbox-check failure is in CoordinatorResilienceIT.testShardHostNodeKillAndRestart and
testCoordinatorSurvivesIntermittentNetwork — distributed-quorum resilience tests, with no overlap with
OpenSearchSchemaBuilder.java. Same branch passed sandbox-check at 01:42 UTC (run 25978261158); concurrent unrelated
PRs (#21691, #21681, #21680) also show sandbox-check fail — known flaky pattern.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the ae-schema-degradation branch from 8905bb3 to 2cfa721 Compare May 17, 2026 03:52
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2cfa721

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2cfa721

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 2cfa721: SUCCESS

@mch2 mch2 merged commit 2f2c1f4 into opensearch-project:main May 18, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants