Skip to content

Commit 92e73ea

Browse files
Feature/mvcombine (opensearch-project#5025)
* MvCombine Command Feature Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * MvCombine Command Feature Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Add doctests to MvCombine Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * spotlesscheck apply Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * spotlesscheck apply Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * spotlesscheck apply Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * spotlessapply Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Add mvcombine to index.md Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Remove the nomv related implementation as that command is still not yet implemented Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Remove the nomv related implementation as that command is still not yet implemented Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Remove the nomv related implementation as that command is still not yet implemented Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Remove the nomv related implementation as that command is still not yet implemented Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * complete the checklist from ppl-commands.md Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * spotlessApply Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Add visitMvCombine method to the FieldResolutionVisitor Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Apply spotlesscheck Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Add changes to exclude the metadata fields and remove the CAST logic Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address CrossClusterSearchIT comment Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address CrossClusterSearchIT comment Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address CrossClusterSearchIT comment Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Coderrabbit issues Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Coderrabbit issues Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Coderrabbit issues Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Coderrabbit issues Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> --------- Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> Signed-off-by: Srikanth Padakanti <srikanth29.9@gmail.com> Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
1 parent 7630db8 commit 92e73ea

31 files changed

Lines changed: 1143 additions & 6 deletions

core/src/main/java/org/opensearch/sql/analysis/Analyzer.java

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,7 @@
8080
import org.opensearch.sql.ast.tree.Lookup;
8181
import org.opensearch.sql.ast.tree.ML;
8282
import org.opensearch.sql.ast.tree.Multisearch;
83+
import org.opensearch.sql.ast.tree.MvCombine;
8384
import org.opensearch.sql.ast.tree.Paginate;
8485
import org.opensearch.sql.ast.tree.Parse;
8586
import org.opensearch.sql.ast.tree.Patterns;
@@ -535,6 +536,11 @@ public LogicalPlan visitAddColTotals(AddColTotals node, AnalysisContext context)
535536
throw getOnlyForCalciteException("addcoltotals");
536537
}
537538

539+
@Override
540+
public LogicalPlan visitMvCombine(MvCombine node, AnalysisContext context) {
541+
throw getOnlyForCalciteException("mvcombine");
542+
}
543+
538544
/** Build {@link ParseExpression} to context and skip to child nodes. */
539545
@Override
540546
public LogicalPlan visitParse(Parse node, AnalysisContext context) {

core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,7 @@
6868
import org.opensearch.sql.ast.tree.Lookup;
6969
import org.opensearch.sql.ast.tree.ML;
7070
import org.opensearch.sql.ast.tree.Multisearch;
71+
import org.opensearch.sql.ast.tree.MvCombine;
7172
import org.opensearch.sql.ast.tree.Paginate;
7273
import org.opensearch.sql.ast.tree.Parse;
7374
import org.opensearch.sql.ast.tree.Patterns;
@@ -466,4 +467,8 @@ public T visitAddTotals(AddTotals node, C context) {
466467
public T visitAddColTotals(AddColTotals node, C context) {
467468
return visitChildren(node, context);
468469
}
470+
471+
public T visitMvCombine(MvCombine node, C context) {
472+
return visitChildren(node, context);
473+
}
469474
}

core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@
4444
import org.opensearch.sql.ast.tree.Join;
4545
import org.opensearch.sql.ast.tree.Lookup;
4646
import org.opensearch.sql.ast.tree.Multisearch;
47+
import org.opensearch.sql.ast.tree.MvCombine;
4748
import org.opensearch.sql.ast.tree.Parse;
4849
import org.opensearch.sql.ast.tree.Patterns;
4950
import org.opensearch.sql.ast.tree.Project;
@@ -634,6 +635,22 @@ public Node visitExpand(Expand node, FieldResolutionContext context) {
634635
return node;
635636
}
636637

638+
@Override
639+
public Node visitMvCombine(MvCombine node, FieldResolutionContext context) {
640+
Set<String> mvCombineFields = extractFieldsFromExpression(node.getField());
641+
642+
FieldResolutionResult current = context.getCurrentRequirements();
643+
644+
Set<String> regularFields = new HashSet<>(current.getRegularFields());
645+
regularFields.addAll(mvCombineFields);
646+
647+
context.pushRequirements(new FieldResolutionResult(regularFields, Set.of(ALL_FIELDS)));
648+
649+
visitChildren(node, context);
650+
context.popRequirements();
651+
return node;
652+
}
653+
637654
private Set<String> extractFieldsFromAggregation(UnresolvedExpression expr) {
638655
Set<String> fields = new HashSet<>();
639656
if (expr instanceof Alias alias) {

core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,7 @@
6262
import org.opensearch.sql.ast.tree.Head;
6363
import org.opensearch.sql.ast.tree.Limit;
6464
import org.opensearch.sql.ast.tree.MinSpanBin;
65+
import org.opensearch.sql.ast.tree.MvCombine;
6566
import org.opensearch.sql.ast.tree.Parse;
6667
import org.opensearch.sql.ast.tree.Patterns;
6768
import org.opensearch.sql.ast.tree.Project;
@@ -468,6 +469,14 @@ public static List<Argument> defaultDedupArgs() {
468469
argument("consecutive", booleanLiteral(false)));
469470
}
470471

472+
public static MvCombine mvcombine(Field field) {
473+
return new MvCombine(field, null);
474+
}
475+
476+
public static MvCombine mvcombine(Field field, String delim) {
477+
return new MvCombine(field, delim);
478+
}
479+
471480
public static List<Argument> sortOptions() {
472481
return exprList(argument("desc", booleanLiteral(false)));
473482
}
Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
/*
2+
* Copyright OpenSearch Contributors
3+
* SPDX-License-Identifier: Apache-2.0
4+
*/
5+
6+
package org.opensearch.sql.ast.tree;
7+
8+
import com.google.common.collect.ImmutableList;
9+
import java.util.List;
10+
import javax.annotation.Nullable;
11+
import lombok.EqualsAndHashCode;
12+
import lombok.Getter;
13+
import lombok.ToString;
14+
import org.opensearch.sql.ast.AbstractNodeVisitor;
15+
import org.opensearch.sql.ast.expression.Field;
16+
17+
@Getter
18+
@ToString(callSuper = true)
19+
@EqualsAndHashCode(callSuper = false)
20+
public class MvCombine extends UnresolvedPlan {
21+
22+
private final Field field;
23+
private final String delim;
24+
@Nullable private UnresolvedPlan child;
25+
26+
public MvCombine(Field field, @Nullable String delim) {
27+
this.field = field;
28+
this.delim = (delim == null) ? " " : delim;
29+
}
30+
31+
public MvCombine attach(UnresolvedPlan child) {
32+
this.child = child;
33+
return this;
34+
}
35+
36+
@Override
37+
public List<UnresolvedPlan> getChild() {
38+
return child == null ? ImmutableList.of() : ImmutableList.of(child);
39+
}
40+
41+
@Override
42+
public <T, C> T accept(AbstractNodeVisitor<T, C> nodeVisitor, C context) {
43+
return nodeVisitor.visitMvCombine(this, context);
44+
}
45+
}

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Lines changed: 170 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@
6363
import org.apache.calcite.rex.RexVisitorImpl;
6464
import org.apache.calcite.rex.RexWindowBounds;
6565
import org.apache.calcite.sql.SqlKind;
66+
import org.apache.calcite.sql.fun.SqlLibraryOperators;
6667
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
6768
import org.apache.calcite.sql.fun.SqlTrimFunction;
6869
import org.apache.calcite.sql.type.ArraySqlType;
@@ -124,6 +125,7 @@
124125
import org.opensearch.sql.ast.tree.Lookup.OutputStrategy;
125126
import org.opensearch.sql.ast.tree.ML;
126127
import org.opensearch.sql.ast.tree.Multisearch;
128+
import org.opensearch.sql.ast.tree.MvCombine;
127129
import org.opensearch.sql.ast.tree.Paginate;
128130
import org.opensearch.sql.ast.tree.Parse;
129131
import org.opensearch.sql.ast.tree.Patterns;
@@ -3169,6 +3171,174 @@ public RelNode visitExpand(Expand expand, CalcitePlanContext context) {
31693171
return context.relBuilder.peek();
31703172
}
31713173

3174+
/**
3175+
* mvcombine command visitor to collapse rows that are identical on all non-target fields.
3176+
*
3177+
* <p>Grouping semantics:
3178+
*
3179+
* <ul>
3180+
* <li>The target field is always excluded from the GROUP BY keys.
3181+
* <li>Metadata fields (for example {@code _id}, {@code _index}, {@code _score}) are excluded
3182+
* from GROUP BY keys <strong>unless</strong> they were explicitly projected earlier (for
3183+
* example, via {@code fields}).
3184+
* </ul>
3185+
*
3186+
* <p>The target field values are aggregated using {@code ARRAY_AGG}, with {@code NULL} values
3187+
* filtered out. The aggregation result replaces the original target column and produces an {@code
3188+
* ARRAY<T>} output.
3189+
*
3190+
* <p>The original output column order is preserved. Metadata fields are projected as typed {@code
3191+
* NULL} literals after aggregation only when they are not part of grouping (since they were
3192+
* skipped).
3193+
*
3194+
* @param node mvcombine command to be visited
3195+
* @param context CalcitePlanContext containing the RelBuilder and planning context
3196+
* @return RelNode representing collapsed records with the target combined into a multivalue array
3197+
* @throws SemanticCheckException if the mvcombine target is not a direct field reference
3198+
*/
3199+
@Override
3200+
public RelNode visitMvCombine(MvCombine node, CalcitePlanContext context) {
3201+
// 1) Lower the child plan first so the RelBuilder has the input schema on the stack.
3202+
visitChildren(node, context);
3203+
3204+
final RelBuilder relBuilder = context.relBuilder;
3205+
3206+
final RelNode input = relBuilder.peek();
3207+
final List<String> inputFieldNames = input.getRowType().getFieldNames();
3208+
final List<RelDataType> inputFieldTypes =
3209+
input.getRowType().getFieldList().stream().map(RelDataTypeField::getType).toList();
3210+
3211+
// If true, we should NOT auto-skip meta fields (because user explicitly projected them)
3212+
final boolean includeMetaFields = context.isProjectVisited();
3213+
3214+
// 2) Resolve the mvcombine target to an input column index (must be a direct field reference).
3215+
final Field targetField = node.getField();
3216+
final int targetIndex = resolveTargetIndex(targetField, context);
3217+
final String targetName = inputFieldNames.get(targetIndex);
3218+
3219+
// 3) Group by all non-target fields, skipping meta fields unless explicitly projected.
3220+
final List<RexNode> groupExprs =
3221+
buildGroupExpressionsExcludingTarget(
3222+
targetIndex, inputFieldNames, relBuilder, includeMetaFields);
3223+
3224+
// 4) Aggregate target values using ARRAY_AGG, filtering out NULLs.
3225+
performArrayAggAggregation(relBuilder, targetIndex, targetName, groupExprs);
3226+
3227+
// 5) Restore original output column order (ARRAY_AGG already returns ARRAY<T>).
3228+
restoreColumnOrderAfterArrayAgg(
3229+
relBuilder, inputFieldNames, inputFieldTypes, targetIndex, groupExprs, includeMetaFields);
3230+
3231+
return relBuilder.peek();
3232+
}
3233+
3234+
/** Resolves the mvcombine target expression to an input field index. */
3235+
private int resolveTargetIndex(Field targetField, CalcitePlanContext context) {
3236+
final RexNode targetRex;
3237+
try {
3238+
targetRex = rexVisitor.analyze(targetField, context);
3239+
} catch (IllegalArgumentException e) {
3240+
// Make missing-field behavior deterministic (and consistently mapped to 4xx)
3241+
// instead of leaking RelBuilder/rexVisitor exception wording.
3242+
throw new SemanticCheckException(
3243+
"mvcombine target field not found: " + targetField.getField().toString(), e);
3244+
}
3245+
3246+
if (!isInputRef(targetRex)) {
3247+
throw new SemanticCheckException(
3248+
"mvcombine target must be a direct field reference, but got: " + targetField);
3249+
}
3250+
3251+
final int index = ((RexInputRef) targetRex).getIndex();
3252+
3253+
final RelDataType fieldType =
3254+
context.relBuilder.peek().getRowType().getFieldList().get(index).getType();
3255+
3256+
if (SqlTypeUtil.isArray(fieldType) || SqlTypeUtil.isMultiset(fieldType)) {
3257+
throw new SemanticCheckException(
3258+
"mvcombine target cannot be an array/multivalue type, but got: " + fieldType);
3259+
}
3260+
3261+
return index;
3262+
}
3263+
3264+
/**
3265+
* Builds group-by expressions for mvcombine: all non-target input fields; meta fields are skipped
3266+
* unless includeMetaFields is true.
3267+
*/
3268+
private List<RexNode> buildGroupExpressionsExcludingTarget(
3269+
int targetIndex,
3270+
List<String> inputFieldNames,
3271+
RelBuilder relBuilder,
3272+
boolean includeMetaFields) {
3273+
3274+
final List<RexNode> groupExprs = new ArrayList<>(Math.max(0, inputFieldNames.size() - 1));
3275+
for (int i = 0; i < inputFieldNames.size(); i++) {
3276+
if (i == targetIndex) {
3277+
continue;
3278+
}
3279+
if (isMetadataField(inputFieldNames.get(i)) && !includeMetaFields) {
3280+
continue;
3281+
}
3282+
groupExprs.add(relBuilder.field(i));
3283+
}
3284+
return groupExprs;
3285+
}
3286+
3287+
/** Applies mvcombine aggregation. */
3288+
private void performArrayAggAggregation(
3289+
RelBuilder relBuilder, int targetIndex, String targetName, List<RexNode> groupExprs) {
3290+
3291+
final RexNode targetRef = relBuilder.field(targetIndex);
3292+
final RexNode notNullTarget = relBuilder.isNotNull(targetRef);
3293+
3294+
final RelBuilder.AggCall aggCall =
3295+
relBuilder
3296+
.aggregateCall(SqlLibraryOperators.ARRAY_AGG, targetRef)
3297+
.filter(notNullTarget)
3298+
.as(targetName);
3299+
3300+
relBuilder.aggregate(relBuilder.groupKey(groupExprs), aggCall);
3301+
}
3302+
3303+
/**
3304+
* Restores the original output column order after the aggregate step. Meta fields are set to
3305+
* typed NULL only when they were skipped from grouping (includeMetaFields=false).
3306+
*/
3307+
private void restoreColumnOrderAfterArrayAgg(
3308+
RelBuilder relBuilder,
3309+
List<String> inputFieldNames,
3310+
List<RelDataType> inputFieldTypes,
3311+
int targetIndex,
3312+
List<RexNode> groupExprs,
3313+
boolean includeMetaFields) {
3314+
3315+
final int aggregatedTargetPos = groupExprs.size();
3316+
3317+
final List<RexNode> projections = new ArrayList<>(inputFieldNames.size());
3318+
final List<String> projectionNames = new ArrayList<>(inputFieldNames.size());
3319+
3320+
int groupPos = 0;
3321+
for (int i = 0; i < inputFieldNames.size(); i++) {
3322+
final String fieldName = inputFieldNames.get(i);
3323+
projectionNames.add(fieldName);
3324+
3325+
if (i == targetIndex) {
3326+
// aggregated target is always the last field in the aggregate output
3327+
projections.add(relBuilder.field(aggregatedTargetPos));
3328+
} else if (isMetadataField(fieldName) && !includeMetaFields) {
3329+
// meta fields were skipped from grouping => not present in aggregate output => keep schema
3330+
// stable
3331+
projections.add(relBuilder.getRexBuilder().makeNullLiteral(inputFieldTypes.get(i)));
3332+
} else {
3333+
// grouped field (including meta fields when includeMetaFields=true)
3334+
projections.add(relBuilder.field(groupPos));
3335+
groupPos++;
3336+
}
3337+
}
3338+
3339+
relBuilder.project(projections, projectionNames, /* force= */ true);
3340+
}
3341+
31723342
@Override
31733343
public RelNode visitValues(Values values, CalcitePlanContext context) {
31743344
if (values.getValues() == null || values.getValues().isEmpty()) {

docs/category.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@
2323
"user/ppl/cmd/head.md",
2424
"user/ppl/cmd/join.md",
2525
"user/ppl/cmd/lookup.md",
26+
"user/ppl/cmd/mvcombine.md",
2627
"user/ppl/cmd/parse.md",
2728
"user/ppl/cmd/patterns.md",
2829
"user/ppl/cmd/rare.md",

docs/user/dql/metadata.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ Example 1: Show All Indices Information
3535
SQL query::
3636

3737
os> SHOW TABLES LIKE '%'
38-
fetched rows / total rows = 23/23
38+
fetched rows / total rows = 24/24
3939
+----------------+-------------+-------------------+------------+---------+----------+------------+-----------+---------------------------+----------------+
4040
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
4141
|----------------+-------------+-------------------+------------+---------+----------+------------+-----------+---------------------------+----------------|
@@ -48,6 +48,7 @@ SQL query::
4848
| docTestCluster | null | events_many_hosts | BASE TABLE | null | null | null | null | null | null |
4949
| docTestCluster | null | events_null | BASE TABLE | null | null | null | null | null | null |
5050
| docTestCluster | null | json_test | BASE TABLE | null | null | null | null | null | null |
51+
| docTestCluster | null | mvcombine_data | BASE TABLE | null | null | null | null | null | null |
5152
| docTestCluster | null | nested | BASE TABLE | null | null | null | null | null | null |
5253
| docTestCluster | null | nyc_taxi | BASE TABLE | null | null | null | null | null | null |
5354
| docTestCluster | null | occupation | BASE TABLE | null | null | null | null | null | null |

0 commit comments

Comments
 (0)