
[PECOBLR-1377] SRID in geospatial column type name#1157

Merged
sreekanth-db merged 11 commits into main from geospatial-enhancements on Jan 21, 2026

Conversation

@sreekanth-db sreekanth-db commented Dec 22, 2025

Description

This PR enhances geospatial datatype handling to include SRID (Spatial Reference System Identifier) information in column type names and fixes multiple issues related to complex datatype handling across different result formats.

Key Changes

  1. Geospatial Type Name Enhancement

    • Column type names now include SRID: GEOMETRY(4326) instead of GEOMETRY
    • Applies to both GEOMETRY and GEOGRAPHY types
    • Preserves full type information in metadata for better type identification
  2. SEA Inline Mode Complex Type Fix

    • Fixed issue where complex types (ARRAY, MAP, STRUCT) were not returned as complex objects in SEA Inline mode (JSON array result format)
    • Now properly converts to complex datatype objects when EnableComplexDatatypeSupport=true
  3. Thrift CloudFetch Metadata Enhancement

    • Fixed error when extracting type details (e.g., INT from ARRAY<INT>) in Thrift CloudFetch mode
    • Enhanced getColumnInfoFromTColumnDesc() to use Arrow schema metadata alongside TColumnDesc
    • Arrow schema provides complete type information (e.g., ARRAY<INT>) while TColumnDesc only contains base type (e.g., ARRAY)
  4. Arrow Metadata Extraction

    • Added DatabricksThriftUtil.getArrowMetadata() to deserialize Arrow schema from TGetResultSetMetadataResp
    • Fixed null arrow metadata issue in DatabricksResultSet constructor for Thrift CloudFetch mode
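As a rough illustration of change 1, the SRID-preserving type text can be sketched as follows (a hypothetical helper mirroring the behavior described above, not the driver's actual method):

```java
// Sketch only: illustrates the type-name rule from this PR, not the real code.
public class TypeTextSketch {
    // Most types store only the base type (DECIMAL, not DECIMAL(7,2));
    // geospatial types keep their parenthesized SRID, e.g. GEOMETRY(4326).
    public static String toStoredTypeText(String fullTypeText) {
        String upper = fullTypeText.toUpperCase();
        if (upper.startsWith("GEOMETRY") || upper.startsWith("GEOGRAPHY")) {
            return fullTypeText; // preserve SRID info
        }
        int paren = fullTypeText.indexOf('(');
        return paren >= 0 ? fullTypeText.substring(0, paren) : fullTypeText;
    }
}
```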

Testing

Unit Tests

  • All existing unit tests pass, and new tests were added for the new methods

Integration Tests

  • GeospatialTests.java - Comprehensive E2E integration test
    • Tests geospatial types (GEOMETRY and GEOGRAPHY)
    • Validates 24 configuration combinations:
      • Protocol: Thrift / SEA
      • Serialization: Arrow / Inline
      • CloudFetch: Enabled / Disabled (only with Arrow, as CloudFetch requires Arrow)
      • GeoSpatial Support: Enabled / Disabled
      • Complex Type Support: Enabled / Disabled
    • Validates metadata: column types, type names, class names
    • Validates values: WKT representation, SRID
    • Validates behavior when geospatial objects are enabled vs. disabled (STRING fallback)
    • All 24 tests pass

Additional Notes to the Reviewer

Other relevant details are called out in review comments on the diff.

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
return parser.parseJsonStringToDbStruct(object.toString(), arrowMetadata);
}

private static AbstractDatabricksGeospatial convertToGeospatial(
Collaborator Author:

Removing duplicate code; this reuses the logic already present in GeospatialConverter.

long rowSize = executionResult.getRowCount();
List<String> arrowMetadata = null;
if (executionResult instanceof ArrowStreamResult) {
arrowMetadata = ((ArrowStreamResult) executionResult).getArrowMetadata();
sreekanth-db (Collaborator, author), Dec 22, 2025:

((ArrowStreamResult) executionResult).getArrowMetadata() will always be null in CloudFetch mode at this point in the code flow, so we get the Arrow metadata from TGetResultSetMetadataResp instead.

Collaborator:

Will getResultSetMetadata always be present?

Collaborator Author:

Ideally, the TGetResultSetMetadataResp object should be present in TFetchResultsResp. In any case, we handle this in the getArrowMetadata method by checking for null values.

Collaborator:

is arrow metadata consistently populated in TGetResultSetMetadataResp?

Collaborator:

Not when the result is inline. But that is a limitation of complex data types as well.

Reviewer:

Not when the result is inline.

Arrow metadata is always populated from Thrift when the format is CSV, JSON_ARRAY, or ARROW.

The only time Arrow metadata is not present is when the format is HIVE. Is the inline result in HIVE format?

* @return true if the type name starts with ARRAY, MAP, STRUCT, GEOMETRY, or GEOGRAPHY, false
* otherwise
*/
private static boolean isComplexType(String typeName) {
Collaborator Author:

Moving this method to DatabricksTypeUtil so it can be used from other classes.
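The behavior spelled out in the javadoc above can be sketched like this (a standalone sketch; the real method lives in DatabricksTypeUtil):

```java
// Sketch of the isComplexType check described in the javadoc above.
public class TypeUtilSketch {
    // True when the type name starts with ARRAY, MAP, STRUCT, GEOMETRY,
    // or GEOGRAPHY; false otherwise (including null input).
    public static boolean isComplexType(String typeName) {
        if (typeName == null) return false;
        String upper = typeName.toUpperCase();
        return upper.startsWith("ARRAY")
                || upper.startsWith("MAP")
                || upper.startsWith("STRUCT")
                || upper.startsWith("GEOMETRY")
                || upper.startsWith("GEOGRAPHY");
    }
}
```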

return complexDatatypeSupport ? obj : obj.toString();
}

private Object handleComplexDataTypesForSEAInline(Object obj, String columnName)
Collaborator Author:

Updating this method to always return the complex datatype object; if complex mode is disabled, the caller converts it to a string.
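The contract described in this comment, as a minimal sketch (class and method names are hypothetical, not the driver's API):

```java
// Sketch: the inner handler always builds the complex object; this caller-side
// step decides the final form based on EnableComplexDatatypeSupport.
public class SeaInlineSketch {
    public static Object finalizeValue(Object complexObject, boolean complexDatatypeSupport) {
        // With support disabled, fall back to the string representation.
        return complexDatatypeSupport ? complexObject : complexObject.toString();
    }
}
```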

typeText = "STRING";
}

// store base type eg. DECIMAL instead of DECIMAL(7,2) except for geospatial datatypes
Collaborator Author:

For geospatial types, we want the SRID info (in parentheses) to be present, e.g. GEOMETRY(4326).

Collaborator:

Will there be a case that SRID info might be missing?

Collaborator Author:

I don't think so; even for SRID 0, we get GEOMETRY(0).

Reviewer:

srid will always be present

columnIndex++) {
TColumnDesc columnDesc = resultManifest.getSchema().getColumns().get(columnIndex);

ColumnInfo columnInfo = getColumnInfoFromTColumnDesc(columnDesc);
sreekanth-db (Collaborator, author), Dec 22, 2025:

TColumnDesc doesn't have complete information about the column metadata, hence we enhance it using the arrowMetadata extracted from TGetResultSetMetadataResp. This is required for methods that use ColumnInfo later in the flow (Thrift + CloudFetch + complex types).
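The "prefer the richer type text" idea in this thread can be sketched roughly like this (names are illustrative; the real logic sits in getColumnInfoFromTColumnDesc):

```java
// Sketch: TColumnDesc carries only the base type (e.g. ARRAY, or STRING for a
// geospatial column), while the Arrow schema metadata carries the full type
// text (e.g. ARRAY<INT>, GEOMETRY(4326)). Prefer the full text when present.
public class ColumnTypeResolver {
    public static String resolveTypeText(String thriftBaseType, String arrowTypeText) {
        if (arrowTypeText != null && isComplexType(arrowTypeText)) {
            return arrowTypeText;
        }
        return thriftBaseType;
    }

    static boolean isComplexType(String typeName) {
        String u = typeName.toUpperCase();
        return u.startsWith("ARRAY") || u.startsWith("MAP") || u.startsWith("STRUCT")
                || u.startsWith("GEOMETRY") || u.startsWith("GEOGRAPHY");
    }
}
```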

.columnTypeClassName("java.lang.String")
.columnType(Types.OTHER)
.columnTypeText(VARIANT);
} else if (isGeometryColumn(arrowMetadata, columnIndex)
Collaborator Author:

This is handled by the updates to the column info; no separate handling is required.

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db sreekanth-db changed the title geospatial column type name with srid SRID in geospatial column type name Dec 22, 2025
@sreekanth-db sreekanth-db changed the title SRID in geospatial column type name [PECOBLR-1377] SRID in geospatial column type name Dec 30, 2025

if (arrowMetadata != null && isComplexType(arrowMetadata)) {
typeText = arrowMetadata;
if (arrowMetadata.startsWith(GEOMETRY)) {
Collaborator:

do we want to check if Geospatial flag is enabled?

Collaborator Author:

This is an inner layer, so a flag check is not required here. We check the geospatial flags in DatabricksResultSetMetaData, which is the interface through which users get metadata info.

.map(e -> e.get(ARROW_METADATA_KEY))
.collect(Collectors.toList());
} catch (IOException e) {
throw new DatabricksSQLException(
Collaborator:

Add logging.

Collaborator Author:

Added an error log.
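The per-column extraction in this thread works roughly as follows: each Arrow field carries a metadata map, and the SQL type text is pulled out per column. A minimal sketch, where the key name and class are assumptions for illustration (the driver uses its own ARROW_METADATA_KEY constant):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: hypothetical stand-in for the driver's Arrow schema handling.
public class ArrowMetadataSketch {
    // Hypothetical metadata key; the real constant lives in the driver.
    static final String ARROW_METADATA_KEY = "Spark:DataType:SqlName";

    // Collect the full SQL type text per column from each field's metadata map.
    // Missing entries map to null so callers can null-check per column.
    public static List<String> extractTypeTexts(List<Map<String, String>> fieldMetadata) {
        return fieldMetadata.stream()
                .map(m -> m.get(ARROW_METADATA_KEY))
                .collect(Collectors.toList());
    }
}
```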

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
vikrantpuppala (Collaborator) left a comment:

Thanks! Can we also run the JDBC comparator in different modes (SEA/Arrow) across different column types to ensure things are consistent?

Comment thread on NEXT_CHANGELOG.md (outdated):

### Fixed
- Fixed complex types not being returned as objects in SEA Inline mode when `EnableComplexDatatypeSupport=true`.
- Fixed errors with complex data types in Thrift CloudFetch mode.
Collaborator:

Can we be more descriptive about what these errors are?

Collaborator Author:

Updated the changelog with more details.


return parser.parseJsonStringToDbStruct(obj.toString(), columnName);
} else if (columnName.startsWith(GEOMETRY)) {
return obj;
return new GeospatialConverter().toDatabricksGeometry(obj);
Collaborator:

Why is toDatabricksGeometry not a static method? Why do we need to instantiate an object to use it?

Collaborator Author:

Updated the code to use a cached converter, similar to the other datatypes' converters.


return parser.parseJsonStringToDbStruct(obj.toString(), columnName);
} else if (columnName.startsWith(GEOMETRY)) {
return obj;
return new GeospatialConverter().toDatabricksGeometry(obj);
Reviewer:

Why didn't we handle this before? What is the impact of not having this in 3.0.6?

Collaborator Author:

Currently, complex data types are not supported in the SEA inline flow, so the geospatial type is returned as a string (up to 3.0.7). This is consistent with our documentation. With this change we support complex types in SEA inline mode as well. Will update the documentation accordingly.


* and EnableComplexDatatypeSupport are enabled AND not in Thrift+Inline mode. Otherwise, returns as
* STRING.
*/
public class GeospatialTests {
Reviewer:

Let's test GEOMETRY(ANY) as well. Example query:

%sql
SELECT * FROM VALUES
(ST_GeomFromText('POINT(17 7)', 4326)),
(ST_GeomFromText('POINT(5 5)', 0)) AS t(geom)

Expected type name: GEOMETRY(ANY); expected SRIDs per row: 4326 and 0. Let's assert on everything: that both row-level values are correct and that the column metadata is correct.

Collaborator Author:

Added the tests for GEOMETRY(ANY)

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
@sreekanth-db sreekanth-db enabled auto-merge (squash) January 21, 2026 18:32
@sreekanth-db sreekanth-db disabled auto-merge January 21, 2026 18:44
@sreekanth-db sreekanth-db merged commit 809c0b3 into main Jan 21, 2026
13 of 15 checks passed
@sreekanth-db sreekanth-db deleted the geospatial-enhancements branch January 21, 2026 18:46
private void setColumnInfo(TGetResultSetMetadataResp resultManifest)
throws DatabricksSQLException {
columnInfos = new ArrayList<>();
List<String> arrowMetadataList = DatabricksThriftUtil.getArrowMetadata(resultManifest);
Reviewer:

What happens when this is null? Some very old Spark servers do not return arrowMetadata; we do not want them to break.

Collaborator Author:

In DatabricksThriftUtil.getArrowMetadata, we check whether arrowSchema is present in TGetResultSetMetadataResp and return null if it isn't. At every point where arrowMetadataList is accessed, we check that it is non-null and check the list size before accessing an element, so existing flows don't break. We have multi-DBR tests to ensure backward compatibility.
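The null-safety pattern described in this answer, as a minimal sketch (field and method names are illustrative, not the driver's actual code):

```java
import java.util.List;

// Sketch: older servers may return no Arrow schema, in which case the metadata
// list is null; fall back to the base type text from TColumnDesc.
public class ArrowMetadataAccess {
    public static String typeTextForColumn(
            List<String> arrowMetadataList, int columnIndex, String thriftTypeText) {
        if (arrowMetadataList != null
                && columnIndex < arrowMetadataList.size()
                && arrowMetadataList.get(columnIndex) != null) {
            return arrowMetadataList.get(columnIndex);
        }
        // e.g. a GEOMETRY column degrades to STRING on servers without Arrow metadata
        return thriftTypeText;
    }
}
```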

Collaborator Author:

In the context of geospatial datatypes, if arrowMetadata is null (which won't happen in the ideal flow, since the driver always gets arrowMetadata on DBR 17.1+, where geospatial types are first supported), the column type is shown as STRING, since TColumnDesc contains STRING and the driver has no other information to identify the column as GEOMETRY/GEOGRAPHY.

columnInfos.add(
com.databricks.jdbc.common.util.DatabricksThriftUtil.getColumnInfoFromTColumnDesc(
tColumnDesc));
List<String> arrowMetadata = DatabricksThriftUtil.getArrowMetadata(resultManifest);
Reviewer:

What happens when this is null? Some very old Spark servers do not return arrowMetadata; we do not want them to break.

Reviewer:

Do we have tests for when the Thrift protocol version is low?

Collaborator Author:

In DatabricksThriftUtil.getArrowMetadata, we check whether arrowSchema is present in TGetResultSetMetadataResp and return null if it isn't. At every point where arrowMetadataList is accessed, we check that it is non-null and check the list size before accessing an element, so existing flows don't break. We have multi-DBR tests to ensure backward compatibility.

if (arrowMetadata != null && isComplexType(arrowMetadata)) {
typeText = arrowMetadata;
if (arrowMetadata.startsWith(GEOMETRY)) {
columnInfoTypeName = ColumnInfoTypeName.GEOMETRY;
Reviewer:

Shouldn't this return the column info with the SRID in parentheses, e.g. GEOMETRY(4326)?

sreekanth-db (Collaborator, author), Jan 28, 2026:

columnInfoTypeName is an enum, so we can't include SRID info here. But this is just an intermediate processing step, not the final response the user receives; the user-facing response does include the SRID info. The E2E behaviour can be found in this test (from line 211).

@sreekanth-db sreekanth-db mentioned this pull request Feb 4, 2026