Add auto-extract mode for `spath` command by dai-chen · Pull Request #5140 · opensearch-project/sql

dai-chen · 2026-02-12T16:44:29Z

Description

As a follow-up to #5139, this PR implements the auto-extract mode for the spath command, as proposed in #4307. When the path parameter is omitted, spath rewrites to the json_extract_all(input) function and returns a map<string, string> with flattened keys instead of expanding dynamic columns.
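Below is a hypothetical, heavily simplified sketch of that dispatch. It is a standalone Java class, not the real `SPath` AST node. A null path selects `json_extract_all`, and any other path keeps single-path extraction. `SPathRewriteSketch` and `toEvalText` are made-up names. Two details are also assumptions. First, in auto-extract mode the output field falls back to the input field, which the integration test asserting a `doc` column for `spath input=doc` suggests. Second, path mode still rewrites to `json_extract`, with the output defaulting to the path.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for the spath AST node (NOT the real SPath class).
 * It only illustrates how a null path could select auto-extract mode during rewrite.
 */
final class SPathRewriteSketch {
  private final String input;  // source field holding the JSON string (required)
  private final String output; // destination field (optional, may be null)
  private final String path;   // JSON path (optional; null means auto-extract)

  SPathRewriteSketch(String input, String output, String path) {
    this.input = Objects.requireNonNull(input, "input is required");
    this.output = output;
    this.path = path;
  }

  /** Renders the rewritten eval as PPL-like text, for illustration only. */
  String toEvalText() {
    if (path == null) {
      // Auto-extract mode: flatten the whole document into a map.
      // Assumption: the output field defaults to the input field.
      String target = (output != null) ? output : input;
      return "eval " + target + " = json_extract_all(" + input + ")";
    }
    // Path mode (assumed unchanged): extract one path; output defaults to the path.
    String target = (output != null) ? output : path;
    return "eval " + target + " = json_extract(" + input + ", '" + path + "')";
  }
}
```

For example, `new SPathRewriteSketch("doc", null, null).toEvalText()` returns `"eval doc = json_extract_all(doc)"`. The real `rewriteAsEval` presumably produces AST nodes rather than query text.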

Minor differences from the proposal:

- Malformed JSON returns partial results.

Related Issues

Resolves #4307 (partially; ergonomic map access syntax is a follow-up)

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>

coderabbitai · 2026-02-12T16:44:39Z

📝 Walkthrough


Made SPath.path optional to enable an auto-extract mode; spath without a path now rewrites to json_extract_all and returns a flattened Map<String,String>. Changes touch core AST, json UDF behavior, parser/rewrite logic, tests, test data/mappings, and documentation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core AST: `core/src/main/java/org/opensearch/sql/ast/tree/SPath.java` | Made `path` nullable and added conditional rewrite logic in `rewriteAsEval` to route pathless spath to auto-extract behavior. |
| JSON UDF Implementation & Tests: `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java`, `core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImplTest.java` | Changed json_extract_all to produce stringified values (`Map<String,String>`), updated return type inference to VARCHAR, added a `stringifyMap` helper, and adapted unit tests to expect string values. |
| PPL Parser & Rewrite Tests: `ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java`, `ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java` | Removed the runtime validation that required a path, allowed constructing `SPath` with a null path, and added tests verifying the auto-extract rewrite (with and without output). |
| PPL Parser Tests / Anonymizer: `ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java`, `ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java` | Added tests covering spath with no path and anonymization of pathless spath queries. |
| Calcite PPL Tests: `ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLSpathTest.java` | Reworked tests to use the fluent API and added both path-mode and auto-extract-mode tests (with output variants). |
| Integration Tests: `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`, `integ-test/src/test/java/org/opensearch/sql/calcite/standalone/JsonExtractAllFunctionIT.java` | Added integration tests exercising auto-extract scenarios (nested fields, arrays, duplicate keys, stringify/null handling) and adapted helpers to `Map<String,String>`. |
| Docs & Test Data / Mappings: `docs/user/ppl/cmd/spath.md`, `doctest/test_data/structured.json`, `doctest/test_mapping/structured.json` | Updated docs to describe the dual modes (path-based and auto-extract) with flattening rules and examples; test data and mappings add `doc_auto` entries to exercise auto-extract. |
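The table mentions a `stringifyMap` helper that turns extracted values into strings. The sketch below is not that helper. It is a hypothetical illustration of one rule that is consistent with the unit-test expectation quoted in the review thread on `JsonExtractAllFunctionImplTest` (`"data.items{}{}", "[1, 2]"` alongside `"data.items{}", "4"`). Under this rule, a key that collected a single value keeps it bare, and a key that collected several renders them as a bracketed, comma-separated list. `StringifySketch` and `stringifyValues` are made-up names, and null handling is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stringification rule (NOT the real stringifyMap helper). */
final class StringifySketch {
  private StringifySketch() {}

  /** Converts flattened key -> collected values into flattened key -> string. */
  static Map<String, String> stringifyValues(Map<String, List<Object>> collected) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Object>> entry : collected.entrySet()) {
      List<Object> values = entry.getValue();
      // One value stays bare ("4"); several render via List.toString() ("[1, 2]").
      String text = (values.size() == 1) ? String.valueOf(values.get(0)) : values.toString();
      result.put(entry.getKey(), text);
    }
    return result;
  }
}
```

Relying on `List.toString()` produces the `[1, 2]` format directly. Under this rule, a key with an empty value list would come out as `[]`.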

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Parser as PPL Parser
    participant SPath as SPath AST
    participant Rewriter as SPath Rewriter
    participant JsonExtract as JsonExtractAll
    participant Output as Result Map

    User->>Parser: spath input=field (no path)
    Parser->>SPath: Create SPath(input=field, path=null)
    SPath->>Rewriter: rewriteAsEval(context)
    Rewriter->>Rewriter: Detect path is null
    Rewriter->>JsonExtract: Use json_extract_all(field)
    JsonExtract->>JsonExtract: Parse JSON and flatten structure
    JsonExtract->>JsonExtract: Stringify all values
    JsonExtract->>Output: Return Map<String,String>
    Output->>User: Flattened key-value pairs (strings)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Support spath with dynamic fields #5058 — similar SPath nullable/pathless handling and json_extract_all auto-extract work touching the same spath/json-extraction flow.
Implement spath command with field resolution #5028 — related modifications routing null path to json_extract_all and adjusting JsonExtractAll behavior.

Suggested labels

calcite

Suggested reviewers

ykmr1224
penghuo
ps48
kavithacm
derek-ho
joshuali925
GumpacG
Swiddis
anirudha
forestmvey

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 3.70%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (16 files): ⚔️ `build.gradle` (content) ⚔️ `core/src/main/java/org/opensearch/sql/ast/tree/SPath.java` (content) ⚔️ `core/src/main/java/org/opensearch/sql/executor/QueryService.java` (content) ⚔️ `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java` (content) ⚔️ `core/src/test/java/org/opensearch/sql/executor/QueryServiceTest.java` (content) ⚔️ `core/src/test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImplTest.java` (content) ⚔️ `docs/user/ppl/cmd/spath.md` (content) ⚔️ `doctest/test_data/structured.json` (content) ⚔️ `doctest/test_mapping/structured.json` (content) ⚔️ `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java` (content) ⚔️ `integ-test/src/test/java/org/opensearch/sql/calcite/standalone/JsonExtractAllFunctionIT.java` (content) ⚔️ `ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java` (content) ⚔️ `ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLSpathTest.java` (content) ⚔️ `ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java` (content) ⚔️ `ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java` (content) ⚔️ `ppl/src/test/java/org/opensearch/sql/ppl/utils/SPathRewriteTest.java` (content). These conflicts must be resolved before merging into `main`. | Resolve conflicts locally and push changes to this branch. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: adding auto-extract mode for the spath command, which aligns with the core functionality implemented across the changeset. |
| Linked Issues check | ✅ Passed | The PR implements the dynamic columns support objective from issue `#4307` by enabling spath to extract all JSON attributes as a flattened map<string, string> when path is omitted, partially resolving the issue as stated. |
| Out of Scope Changes check | ✅ Passed | All code changes are scoped to implementing the auto-extract mode for spath: modifying SPath to allow nullable path, updating JsonExtractAllFunctionImpl for string-value mapping, adding comprehensive tests, and updating documentation with new auto-extract mode details. |
| Description check | ✅ Passed | The PR description clearly relates to the changeset, describing the auto-extract mode implementation for spath command when path parameter is omitted. |


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@docs/user/ppl/cmd/spath.md`:
- Around lines 41-43: There are two adjacent blockquotes ("Auto-extract mode
processes the entire input field..." and "Invalid or malformed JSON returns
partial results...") separated by a blank line which triggers MD028; remove the
blank line and either merge the two notes into a single blockquote or place
regular text between them so they are no longer consecutive blockquotes—update
the lines containing those two blockquote texts to be joined into one blockquote
or separated by non-blockquote content.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around lines 40-48: Add three new test documents to the auto-extract fixtures
for CalcitePPLSpathCommandIT by creating additional Request objects (similar to
autoExtractDoc) and performing them via client().performRequest: (1) a document
where the target auto-extract field(s) are explicitly null to exercise null
handling, (2) a document where those fields contain empty JSON object "{}" and
an empty JSON array "[]" to cover boundary/empty-input behavior, and (3) a
document with a deliberately malformed JSON string (e.g., truncated or missing
quotes/braces) to trigger error paths; each should use the same PUT to the
/test_spath_auto/_doc/{id}?refresh=true pattern with unique ids and matching
JSON entity strings so tests validate null, empty, and invalid inputs for the
auto-extract mode.
- Around lines 59-63: The testSpathAutoExtract test only asserts schema via
verifySchema(schema("doc","struct")) which is too weak; update
testSpathAutoExtract to also assert actual extracted values and multi-document
behavior by parsing the JSONObject returned from executeQuery("source=test_spath
| spath input=doc") and adding row-level assertions that specific keys/values
exist for each document (e.g., expected map entries inside the "doc" struct),
and/or use existing row-assert helper methods to assert the expected number of
rows and exact field values; keep the initial verifySchema call but add concrete
assertions that validate the map extraction across multiple documents.

docs/user/ppl/cmd/spath.md

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around lines 102-111: Add a boundary test that verifies empty-array handling
for the spath array suffix case: create a new test (or extend
testSpathAutoExtractArraySuffix) that runs executeQuery with a document whose
array field is empty (e.g., "tags": []) and assert schema via
verifySchema("result","struct") and data via verifyDataRows expecting the
extracted key "tags{}" to stringify as "[]"; reference the existing test method
testSpathAutoExtractArraySuffix, the helpers executeQuery, verifySchema and
verifyDataRows, and ensure the expected JSONObject uses "tags{}":"[]".

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/user/ppl/cmd/spath.md`:
- Around lines 161-163: The example table in spath.md has an inconsistent
representation for single-element arrays: update the second example row so the
"tags{}" output is "[python]" (i.e., change 'tags{}': 'python' to 'tags{}':
'[python]') to match the bracketed array stringification used in the other rows
and documented rules; locate the example table in spath.md and adjust that cell
text accordingly.

docs/user/ppl/cmd/spath.md

Swiddis

I like this approach! lgtm with comments

docs/user/ppl/cmd/spath.md

...src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java

Swiddis · 2026-02-13T19:33:14Z

...test/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImplTest.java

```diff
+            Map.of(
+                "data.items{}{}", "[1, 2]",
+                "data.items{}{}.hello", "3",
+                "data.items{}", "4",
```


thought: Is this bracket syntax easily supported by downstream commands that want to extract these fields, or will there need to be odd escaping?

That might only be a concern for the mapping functions in a future PR, but I would still carefully review whether this is the extraction syntax we want for arrays. At a glance this isn't intuitive to me, but after squinting at the docs for a bit I kinda got it?

For example, it's not obvious to me what will happen if I try `"items": [[1, 2], [3, 4]]`: do I get two `items{}{}` keys? Does one overwrite the other? Are they merged into one array? (From code review I know which one it is, but not as a user given only the docs and this example.)

dai-chen · 2026-02-13

Yes, I don't think we have a good workaround until we move to `Map<String,Any>` or go schemaless. Currently it just preserves all the values from the "conflicting" key.

dai-chen · 2026-02-13

The result is `items{}{}`, as expected. I can add more cases to the doctest or this UT if that's what you're suggesting. Thanks!
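To make the thread concrete, here is a hypothetical sketch of the key naming as the discussion describes it. Object members extend the key with `.`, each array level appends `{}`, and scalars that land on the same key are all kept rather than overwritten. This is not the code in `JsonExtractAllFunctionImpl`. `FlattenSketch` is a made-up name, and the sketch walks an already-parsed tree (Java `Map`/`List`/scalars) instead of a JSON string. It does not model nulls, malformed input, or empty objects and arrays, which simply produce no key here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of flattened-key naming (NOT JsonExtractAllFunctionImpl).
 * Input is an already-parsed JSON tree: Map = object, List = array, anything else = scalar.
 */
final class FlattenSketch {
  private FlattenSketch() {}

  /** Returns flattened key -> every scalar value that landed on that key, in encounter order. */
  static Map<String, List<Object>> flatten(Map<String, ?> root) {
    Map<String, List<Object>> out = new LinkedHashMap<>();
    for (Map.Entry<String, ?> entry : root.entrySet()) {
      visit(entry.getKey(), entry.getValue(), out);
    }
    return out;
  }

  private static void visit(String path, Object node, Map<String, List<Object>> out) {
    if (node instanceof Map) {
      // Object member: extend the key with ".<member name>".
      for (Map.Entry<?, ?> member : ((Map<?, ?>) node).entrySet()) {
        visit(path + "." + member.getKey(), member.getValue(), out);
      }
    } else if (node instanceof List) {
      // Array element: append "{}" per array level, so [[...]] yields "{}{}".
      for (Object element : (List<?>) node) {
        visit(path + "{}", element, out);
      }
    } else {
      // Scalar: colliding keys accumulate values instead of overwriting.
      out.computeIfAbsent(path, key -> new ArrayList<>()).add(node);
    }
  }
}
```

Suppose the fixture behind the quoted test is `{"data": {"items": [[1, 2, {"hello": 3}], 4]}}` (a guess). This sketch then yields `data.items{}{}` → `[1, 2]`, `data.items{}{}.hello` → `[3]` and `data.items{}` → `[4]`, which are the same keys as the expectation once values are stringified. In this sketch, Swiddis's `"items": [[1, 2], [3, 4]]` ends up as a single `items{}{}` key holding `[1, 2, 3, 4]`.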

dai-chen added 2 commits February 11, 2026 16:03

Add auto extraction mode in spath command

f6ccc31

Signed-off-by: Chen Dai <daichen@amazon.com>

Change json_extract_all to return map<string,string> and fix null preserve issue

d4e15c0

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen self-assigned this Feb 12, 2026

dai-chen added enhancement New feature or request PPL Piped processing language labels Feb 12, 2026

dai-chen added 3 commits February 12, 2026 10:28

Refactor all unit test and integration tests

8a05c8b

Signed-off-by: Chen Dai <daichen@amazon.com>

Refactor json_extract_all and fix stringify issue

51a395c

Signed-off-by: Chen Dai <daichen@amazon.com>

Fix broken IT and doctest

2a45588

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen marked this pull request as ready for review February 13, 2026 01:51

dai-chen requested review from GumpacG, LantaoJin, MaxKsyunz, Swiddis, YANG-DB, Yury-Fridlyand, acarbonetto, anirudha, derek-ho, forestmvey, joshuali925, kavithacm, mengweieric, noCharger, penghuo, ps48, qianheng-aws, seankao-az, vamsimanohar and ykmr1224 as code owners February 13, 2026 01:51

dai-chen requested review from RyanL1997 and yuancu as code owners February 13, 2026 01:51

coderabbitai bot reviewed Feb 13, 2026

penghuo previously approved these changes Feb 13, 2026

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java (outdated, resolved)

Address PR comments

4890a4e

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen dismissed penghuo’s stale review via 4890a4e February 13, 2026 17:18

coderabbitai bot reviewed Feb 13, 2026

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java (resolved)

Mark auto extract mode as experimental

191c21f

Signed-off-by: Chen Dai <daichen@amazon.com>

dai-chen requested a review from penghuo February 13, 2026 17:50

coderabbitai bot reviewed Feb 13, 2026

docs/user/ppl/cmd/spath.md (resolved)

Swiddis approved these changes Feb 13, 2026

penghuo approved these changes Feb 13, 2026

penghuo merged commit be44a8e into opensearch-project:main Feb 13, 2026
50 of 51 checks passed
