Added UDT for IP and Binary support by vinaykpud · Pull Request #5463 · opensearch-project/sql

vinaykpud · 2026-05-22T17:00:14Z

Description

Companion change for opensearch-project/OpenSearch#21807 which adds Calcite UDTs (IpType, BinaryType) for OpenSearch ip and binary columns to the analytics-engine path. This PR teaches the SQL plugin to recognize the new UDTs at the response-schema boundary so:

- byte[] cells from the analytics backend get rendered as canonical IP strings / base64 strings (was: unsupported object class [B).
- The response schema reports "type": "ip" / "type": "binary" correctly (was: both labelled "binary").
- Existing comparison and CIDRMATCH query shapes keep working unchanged: the UDT is invisible to Calcite's planner-internal coercion machinery.

Background

OpenSearchSchemaBuilder (in the OpenSearch sandbox) used to map both ip and binary field types to plain SqlTypeName.VARBINARY. That collapse caused three downstream bugs in this plugin's PPL path through the analytics engine:

- The DataFusion backend ships back a 16-byte ipv4-mapped-ipv6 buffer; AnalyticsExecutionEngine.convertRows calls ExprValueUtils.fromObjectValue(byte[]) which has no byte[] case → unsupported object class [B.
- The response schema reports "type": "binary" for an IP column (because convertRelDataTypeToExprType(VARBINARY) → BINARY ExprType).
- CIDRMATCH against a column was registered with PPLTypeChecker.family(SqlTypeFamily.BINARY, SqlTypeFamily.STRING), with the byte-range expansion living inline at the registration site.

The companion OpenSearch PR introduces IpType / BinaryType (Calcite UDTs that extend AbstractSqlType with VARBINARY underneath) and moves CIDRMATCH dispatch into a backend ScalarFunctionAdapter.

Approach

Two response-boundary changes plus one cleanup, all minimal:

1. Result-column UDT recognition (OpenSearchTypeFactory).
New sibling function convertAnalyticsEngineRelDataTypeToExprType does an instanceof dispatch:

if (type instanceof IpType) return IP;
if (type instanceof BinaryType) return BINARY;
return convertRelDataTypeToExprType(type);

The original convertRelDataTypeToExprType is deliberately unchanged — Calcite's coercion machinery round-trips through it, so returning IP ExprType for a VARBINARY column synthesizes IP(string) casts that DataFusion can't resolve. Keeping UDT recognition in a sibling function isolates it to the response-schema path.
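
The split between the two conversion functions can be sketched with stand-in types. Everything below except the three-line dispatch is hypothetical scaffolding (the `RelType` interface, the record types, and the `ExprType` enum are assumptions made so the example compiles without Calcite); the real `IpType` / `BinaryType` come from the companion OpenSearch PR.

```java
/** Illustrative only: stand-ins for Calcite's RelDataType and the plugin's ExprCoreType. */
final class UdtDispatchSketch {
  enum ExprType { IP, BINARY, STRING }

  /** Hypothetical stand-in for RelDataType; exposes only the underlying SQL type name. */
  interface RelType { String sqlTypeName(); }

  record PlainType(String sqlTypeName) implements RelType {}

  /** Both UDT stand-ins report VARBINARY underneath, as described in the Background. */
  record IpType() implements RelType {
    public String sqlTypeName() { return "VARBINARY"; }
  }

  record BinaryType() implements RelType {
    public String sqlTypeName() { return "VARBINARY"; }
  }

  /** Planner-internal mapping: left unchanged, so it only ever sees the SQL type name. */
  static ExprType convertRelDataTypeToExprType(RelType type) {
    return switch (type.sqlTypeName()) {
      case "VARBINARY" -> ExprType.BINARY;
      default -> ExprType.STRING;
    };
  }

  /** Response-schema mapping: recognizes the UDTs first, then delegates to the original. */
  static ExprType convertAnalyticsEngineRelDataTypeToExprType(RelType type) {
    if (type instanceof IpType) return ExprType.IP;
    if (type instanceof BinaryType) return ExprType.BINARY;
    return convertRelDataTypeToExprType(type);
  }
}
```

In this sketch an `IpType` column maps to `IP` only through the sibling function, while the planner-internal function still answers `BINARY` for it, which is the isolation the paragraph above describes.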

2. Per-column UDT dispatch in row conversion (AnalyticsExecutionEngine.convertRows).
New static helper toExprValue(Object value, RelDataType type):

- byte[] + IpType → InetAddresses.toAddrString(InetAddress.getByAddress(bytes)) (matches IpFieldMapper.valueFetcher semantics: dotted-quad for IPv4 / IPv4-mapped, RFC 5952 for pure IPv6).
- byte[] + BinaryType → Base64.getEncoder().encodeToString(bytes) (matches the OpenSearch binary field wire contract).
- Otherwise → ExprValueUtils.fromObjectValue(value) unchanged.

UnknownHostException from a malformed IP buffer falls through to the default handler so the user sees a clear error rather than a malformed address string.
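
A minimal, JDK-only sketch of the per-column dispatch described in this step. It is not the PR's code: `ColumnKind` and the local `fromObjectValue` are hypothetical stand-ins for the `RelDataType` check and `ExprValueUtils.fromObjectValue`, and it uses `InetAddress.getHostAddress()` instead of `InetAddresses.toAddrString`, so pure IPv6 addresses are not compressed to RFC 5952 form here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Base64;

final class RowValueSketch {
  /** Hypothetical stand-in for the IpType / BinaryType instanceof checks. */
  enum ColumnKind { IP, BINARY, OTHER }

  static String toExprValue(Object value, ColumnKind kind) {
    if (value instanceof byte[] bytes) {
      if (kind == ColumnKind.IP) {
        try {
          // The JDK returns an Inet4Address for 4-byte buffers and for
          // IPv4-mapped 16-byte buffers, so both render as dotted-quad.
          return InetAddress.getByAddress(bytes).getHostAddress();
        } catch (UnknownHostException e) {
          // Buffer is not 4 or 16 bytes: fall through to the default handler.
        }
      } else if (kind == ColumnKind.BINARY) {
        return Base64.getEncoder().encodeToString(bytes);
      }
    }
    return fromObjectValue(value);
  }

  /** Stand-in default handler: like the original path, it has no byte[] case. */
  static String fromObjectValue(Object value) {
    if (value == null) return null;
    if (value instanceof String || value instanceof Number || value instanceof Boolean) {
      return String.valueOf(value);
    }
    throw new IllegalStateException("unsupported object class " + value.getClass().getName());
  }
}
```

With this shape, a 5-byte buffer in an IP column reaches the default handler and fails with a message ending in `[B` (the JVM name of `byte[]`), the same class name that appears in the error quoted in the Description.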

3. CIDRMATCH cleanup (PPLFuncImpTable).

- Removes udf/ip/CidrMatchAdapter and its inline registration. CIDRMATCH dispatch now lives in CidrMatchFunctionAdapter on the backend (in the companion OpenSearch PR), which means it serves both the production SQL-plugin variant and the sandbox test front-end with a single implementation.
- Collapses the two-line registration (typecheck + withRexBuilderShim) to a single typecheck-only registerOperator(CIDRMATCH, PPLBuiltinOperators.CIDRMATCH).
- The runtime CidrMatchFunction UDF stays as the dynamic last-resort fallback.
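
For readers unfamiliar with what CIDRMATCH has to decide, here is a small JDK-only sketch of prefix matching on address bytes. It is an illustration of the general technique, not the `CidrMatchFunctionAdapter` or `CidrMatchFunction` implementation (neither is shown in this PR description), and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class CidrMatchSketch {
  /** Returns true when the address falls inside the CIDR block, e.g. "1.2.3.0/24". */
  static boolean cidrMatch(String address, String cidr) {
    int slash = cidr.indexOf('/');
    if (slash < 0) {
      throw new IllegalArgumentException("missing prefix length: " + cidr);
    }
    byte[] addr = parse(address);
    byte[] net = parse(cidr.substring(0, slash));
    int prefix = Integer.parseInt(cidr.substring(slash + 1));

    if (addr.length != net.length) {
      return false; // IPv4 address against IPv6 block, or the reverse
    }
    if (prefix < 0 || prefix > net.length * 8) {
      throw new IllegalArgumentException("invalid prefix length: " + prefix);
    }

    // Compare the whole bytes covered by the prefix.
    int fullBytes = prefix / 8;
    for (int i = 0; i < fullBytes; i++) {
      if (addr[i] != net[i]) return false;
    }
    // Compare the remaining high-order bits of the next byte, if any.
    int rem = prefix % 8;
    if (rem == 0) return true;
    int mask = (0xFF << (8 - rem)) & 0xFF;
    return (addr[fullBytes] & mask) == (net[fullBytes] & mask);
  }

  /** Expects an IP literal; a hostname here would trigger a DNS lookup. */
  private static byte[] parse(String literal) {
    try {
      return InetAddress.getByName(literal).getAddress();
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("not an IP literal: " + literal, e);
    }
  }
}
```

Under these assumptions, `cidrMatch("1.2.3.4", "1.2.3.0/24")` is true and `cidrMatch("1.2.4.4", "1.2.3.0/24")` is false.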

Testing

Unit tests added:

OpenSearchTypeFactoryTest:
- testConvertResultColumnIpTypeReturnsIpExprType — IpType → ExprCoreType.IP.
- testConvertResultColumnBinaryTypeReturnsBinaryExprType — BinaryType → ExprCoreType.BINARY.
- testConvertResultColumnPlainVarbinaryFallsBackToBinary — plain VARBINARY (no UDT) keeps returning BINARY.
- testConvertResultColumnDelegatesParityForNonUdtTypes — for every non-UDT RelDataType, the result-column variant must agree with the planner-internal variant. Drift here would mean response-schema labels diverge from what Calcite's coercion sees.
AnalyticsExecutionEngineTest:
- executeRelNode_ipColumnRendersAsAddressString — both ipv4-mapped IPv6 (::ffff:1.2.3.4 → "1.2.3.4") and pure IPv6 (::1) buffers render to the right canonical form, schema reports "ip".
- executeRelNode_binaryColumnRendersAsBase64 — byte[] payload base64-encodes to match OpenSearch wire format, schema reports "binary".
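
The two IP buffers exercised by the test above can be built by hand. The sketch below shows the byte layout only; the class and helper names are hypothetical and are not taken from AnalyticsExecutionEngineTest.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class IpBufferFixtures {
  /** 16-byte IPv4-mapped IPv6 buffer: ten zero bytes, 0xff 0xff, then the IPv4 octets. */
  static byte[] ipv4Mapped(int a, int b, int c, int d) {
    byte[] buf = new byte[16];
    buf[10] = (byte) 0xff;
    buf[11] = (byte) 0xff;
    buf[12] = (byte) a;
    buf[13] = (byte) b;
    buf[14] = (byte) c;
    buf[15] = (byte) d;
    return buf;
  }

  /** 16-byte buffer for the pure IPv6 loopback address ::1. */
  static byte[] ipv6Loopback() {
    byte[] buf = new byte[16];
    buf[15] = 1;
    return buf;
  }

  /** Decodes a buffer with the JDK, converting the checked exception for brevity. */
  static InetAddress decode(byte[] buf) {
    try {
      return InetAddress.getByAddress(buf);
    } catch (UnknownHostException e) {
      throw new IllegalStateException("invalid IP buffer length: " + buf.length, e);
    }
  }
}
```

Decoding `ipv4Mapped(1, 2, 3, 4)` with the JDK yields an IPv4 address whose text form is `1.2.3.4`, while `ipv6Loopback()` decodes to an IPv6 loopback address.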

Existing tests still pass. The [B-class regression is caught by physicalPlanExecute_callsOnFailure (already in the file; it exercises the same converter dispatch).

End-to-end (against a single-node gradle run cluster with the companion OpenSearch PR applied):

- cast(host as STRING) returns "1.2.3.4" / "::1" (was a 16-byte garbage buffer).
- cast(blob as STRING) matches | fields blob (base64-encoded).
- where host = '1.2.3.4' coerces cleanly with no Unable to convert call IP(string) regression.
- cidrmatch(host, '1.2.3.0/24') (column form) and CIDRMATCH('1.2.3.4', '1.2.3.0/24') (literal form) both return correct row counts via the backend adapter.
- Response schema labels: "type": "ip" for ip columns, "type": "binary" for binary columns.

github-actions · 2026-05-22T17:01:52Z

PR Reviewer Guide 🔍

(Review updated until commit `0918d6b`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The loop at line 131 iterates over `fields.size()` but only accesses `row[i]` when `i < row.length`. If `fields.size() > row.length`, the loop continues but `value` is set to `null` for indices beyond `row.length`. However, `valueMap.put(field.getName(), toExprValue(value, field.getType()))` is never called for these iterations because the put statement is outside the loop body shown. This appears to be incomplete code that may skip populating fields when row length is shorter than field count.

for (int i = 0; i < fields.size(); i++) {
  RelDataTypeField field = fields.get(i);
  Object value = (i < row.length) ? row[i] : null;
  valueMap.put(field.getName(), toExprValue(value, field.getType()));
}

Exception Handling

When `InetAddress.getByAddress(bytes)` throws `UnknownHostException` at line 162, the catch block wraps it in `IllegalStateException` with a message about invalid buffer length. However, `UnknownHostException` is only thrown by `getByAddress` when the byte array length is not 4 or 16. If the analytics engine returns a buffer of unexpected length (e.g., due to a backend bug or data corruption), this will surface as an `IllegalStateException` during result conversion, potentially causing query failures. Consider whether this should be handled more gracefully or logged with additional context about the source column.

try {
  return ExprValueUtils.stringValue(
      InetAddresses.toAddrString(InetAddress.getByAddress(bytes)));
} catch (UnknownHostException e) {
  throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e);
}

github-actions · 2026-05-22T17:02:09Z

PR Code Suggestions ✨

Latest suggestions up to 0918d6b

Explore these optional code suggestions:

Category: General
Suggestion: Validate IP buffer length explicitly
Impact: Medium

The `UnknownHostException` is thrown only for invalid buffer lengths (not 4 or 16 bytes), but the error message doesn't clarify the expected lengths. Add validation to check buffer length explicitly before calling `InetAddress.getByAddress()` and provide a more descriptive error message indicating valid lengths (4 or 16 bytes).

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [157-170]

 private static ExprValue toExprValue(Object value, RelDataType type) {
   if (value instanceof byte[] bytes) {
     if (type instanceof IpType) {
+      if (bytes.length != 4 && bytes.length != 16) {
+        throw new IllegalStateException(
+            "invalid IP buffer length: " + bytes.length + " (expected 4 or 16 bytes)");
+      }
       try {
         return ExprValueUtils.stringValue(
             InetAddresses.toAddrString(InetAddress.getByAddress(bytes)));
       } catch (UnknownHostException e) {
-        throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e);
+        throw new IllegalStateException("failed to parse IP address", e);
       }
     } else if (type instanceof BinaryType) {
       return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes));
     }
   }
   return ExprValueUtils.fromObjectValue(value);
 }

Suggestion importance[1-10]: 7
Why: The suggestion improves error handling by explicitly validating IP buffer lengths before processing, providing clearer error messages. However, `InetAddress.getByAddress()` already validates buffer length internally, so this is primarily a defensive programming improvement rather than fixing a critical bug.

Category: General
Suggestion: Log warning for row-field mismatch
Impact: Low

When `row.length < fields.size()`, the code defaults missing values to `null`. However, this silently masks potential data corruption or schema mismatches. Consider logging a warning when row length doesn't match field count to aid debugging of upstream data issues.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-135]

+if (row.length != fields.size()) {
+  logger.warn("Row length mismatch: expected {} fields but got {} values",
+      fields.size(), row.length);
+}
 for (int i = 0; i < fields.size(); i++) {
   RelDataTypeField field = fields.get(i);
   Object value = (i < row.length) ? row[i] : null;
   valueMap.put(field.getName(), toExprValue(value, field.getType()));
 }

Suggestion importance[1-10]: 6
Why: Adding logging for row-field mismatches would help detect data corruption or schema issues during debugging. However, the existing code already handles this case gracefully by defaulting to `null`, and the suggestion only adds observability without changing behavior or fixing a bug.

Previous suggestions

Suggestions up to commit b7fd95b

Category: General
Suggestion: Log invalid IP buffer errors
Impact: Low

The `UnknownHostException` catch block silently falls through to default handling, which may throw an unclear error. Consider logging the exception with the invalid buffer length to aid debugging when the backend sends malformed IP data.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-167]

 if (value instanceof byte[] bytes) {
   if (type instanceof IpType) {
     try {
       return ExprValueUtils.stringValue(
           InetAddresses.toAddrString(InetAddress.getByAddress(bytes)));
     } catch (UnknownHostException e) {
-      // Defensive: backend gave us a buffer that isn't 4 or 16 bytes. Fall through to
-      // the default handling so the user sees a clear error rather than a malformed
-      // address string.
+      // Log the error with buffer length for debugging
+      logger.warn("Invalid IP buffer length: {} bytes, expected 4 or 16", bytes.length, e);
+      // Fall through to default handling
     }
   } else if (type instanceof BinaryType) {
     return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes));
   }
 }

Suggestion importance[1-10]: 5
Why: Adding logging for the `UnknownHostException` would help with debugging malformed IP buffers from the backend. However, the current fallthrough behavior is intentional and documented, and the default handling will already surface an error to the user. The logging is a minor enhancement rather than a critical fix.

Suggestions up to commit 540074b

Category: General
Suggestion: Log invalid IP byte arrays
Impact: Low

The `UnknownHostException` catch block silently falls through to default handling, which may throw an unclear error. Consider logging the exception with context (e.g., buffer length) to aid debugging when invalid IP byte arrays are encountered.

core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-166]

 if (value instanceof byte[] bytes) {
   if (type instanceof IpType) {
     try {
       return ExprValueUtils.stringValue(InetAddresses.toAddrString(InetAddress.getByAddress(bytes)));
     } catch (UnknownHostException e) {
-      // Defensive: backend gave us a buffer that isn't 4 or 16 bytes. Fall through to
-      // the default handling so the user sees a clear error rather than a malformed
-      // address string.
+      // Log the invalid buffer for debugging
+      logger.warn("Invalid IP address byte array of length {}: {}", bytes.length, e.getMessage());
+      // Fall through to default handling
     }
   } else if (type instanceof BinaryType) {
     return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes));
   }
 }

Suggestion importance[1-10]: 5
Why: Adding logging for invalid IP byte arrays would help with debugging, but the current defensive approach of falling through to default handling is already reasonable. The suggestion improves observability without fixing a critical issue. The score reflects a moderate improvement in maintainability and debugging capability.

github-actions · 2026-05-22T18:22:52Z

Persistent review updated to latest commit b7fd95b

github-actions · 2026-05-22T20:23:41Z

Persistent review updated to latest commit 0918d6b

vinaykpud requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 22, 2026 17:00

vinaykpud marked this pull request as draft May 22, 2026 17:00

penghuo reviewed May 22, 2026

Comment thread core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java

Added UDT for IP and Binary support

b7fd95b

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

vinaykpud force-pushed the feature/udt_ip_binary branch from 540074b to b7fd95b Compare May 22, 2026 18:21

vinaykpud mentioned this pull request May 22, 2026

Added UDT for IP and Binary support opensearch-project/OpenSearch#21807

Open

refactored cidr and updated comments

0918d6b

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

vinaykpud marked this pull request as ready for review May 22, 2026 20:22

penghuo approved these changes May 22, 2026

vinaykpud wants to merge 2 commits into opensearch-project:main from vinaykpud:feature/udt_ip_binary

2 participants
