feat: GeoArrow support for bulk ingestion (GEOGRAPHY/GEOMETRY) #99
Open
jatorre wants to merge 1 commit into adbc-drivers:main from
This was referenced Mar 18, 2026
Add transparent geometry import via geoarrow.wkb/wkt extension types.
The driver detects geoarrow columns in Arrow metadata and converts them
to Snowflake GEOGRAPHY or GEOMETRY using a COPY transform with inline
TO_GEOGRAPHY/TO_GEOMETRY conversion.
How it works:
1. Detect geoarrow.wkb/wkb_view/wkt/wkt_view from Arrow extension
types or ARROW:extension:name field metadata (C Data Interface)
2. Create table with native GEOGRAPHY/GEOMETRY columns
3. COPY INTO with transform: TO_GEOGRAPHY($1:"geom"::BINARY, true)
converts WKB to GEOGRAPHY inline during COPY, with no post-processing
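Step 1 can be sketched as a lookup on field metadata. This sketch assumes the metadata is available as a plain map[string]string. ARROW:extension:name is the standard Arrow key that still carries the extension name after the C Data Interface has stripped the Go-level extension type. The helper geoArrowEncoding is hypothetical and is not the driver's API.

```go
package main

import "fmt"

// extensionNameKey is the standard Arrow field-metadata key holding an
// extension type's name; it survives even when only the storage type does.
const extensionNameKey = "ARROW:extension:name"

// geoArrowEncoding reports whether a field's metadata marks it as a geoarrow
// serialized-geometry column and, if so, whether it holds WKB or WKT.
// Hypothetical helper for illustration only.
func geoArrowEncoding(meta map[string]string) (encoding string, ok bool) {
	switch meta[extensionNameKey] { // reading a nil map is safe in Go
	case "geoarrow.wkb", "geoarrow.wkb_view":
		return "wkb", true
	case "geoarrow.wkt", "geoarrow.wkt_view":
		return "wkt", true
	default:
		return "", false
	}
}

func main() {
	fields := []map[string]string{
		{extensionNameKey: "geoarrow.wkb", "ARROW:extension:metadata": `{"crs":"EPSG:4326"}`},
		{extensionNameKey: "geoarrow.wkt_view"},
		{}, // plain column without an extension name
	}
	for i, meta := range fields {
		enc, ok := geoArrowEncoding(meta)
		fmt.Printf("field %d: geo=%v encoding=%q\n", i, ok, enc)
	}
}
```

Mapping the *_view variants to the same encoding reflects the assumption that only the storage layout differs, not the serialized geometry format.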
Statement option:
adbc.snowflake.statement.ingest_geo_type = "geography" (default) | "geometry"
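Below is a sketch of validating the option. An unset value falls back to the documented default, "geography". The geoTarget type, the parseIngestGeoType helper, and the choice to reject every other value (case-sensitively) are assumptions; they do not describe how the driver actually validates.

```go
package main

import "fmt"

// ingestGeoTypeOption is the statement option described in this PR.
const ingestGeoTypeOption = "adbc.snowflake.statement.ingest_geo_type"

// geoTarget pairs the Snowflake column type with the conversion function
// used in the COPY transform. Hypothetical type for illustration.
type geoTarget struct {
	ColumnType string // "GEOGRAPHY" or "GEOMETRY"
	Convert    string // "TO_GEOGRAPHY" or "TO_GEOMETRY"
}

// parseIngestGeoType maps the option value to a target; "" means unset.
func parseIngestGeoType(value string) (geoTarget, error) {
	switch value {
	case "", "geography":
		return geoTarget{ColumnType: "GEOGRAPHY", Convert: "TO_GEOGRAPHY"}, nil
	case "geometry":
		return geoTarget{ColumnType: "GEOMETRY", Convert: "TO_GEOMETRY"}, nil
	default:
		return geoTarget{}, fmt.Errorf("invalid value %q for %s: want \"geography\" or \"geometry\"",
			value, ingestGeoTypeOption)
	}
}

func main() {
	for _, v := range []string{"", "geometry", "raster"} {
		t, err := parseIngestGeoType(v)
		fmt.Printf("%q -> %+v err=%v\n", v, t, err)
	}
}
```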
GEOGRAPHY is always WGS84 (SRID 4326). GEOMETRY supports any SRID,
extracted from geoarrow CRS metadata (PROJJSON or "EPSG:NNNN").
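Here is a sketch of SRID extraction. It assumes the geoarrow metadata is the JSON object stored in ARROW:extension:metadata. It also assumes that object has a "crs" member holding either a PROJJSON object or an "EPSG:NNNN" string. The sridFromGeoArrowMetadata function is a hypothetical stand-in, not the PR's extractSRIDFromMeta. It checks only the top-level PROJJSON "id" member and reports ok=false for null, empty, invalid, or non-EPSG input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// sridFromGeoArrowMetadata returns the EPSG code found in geoarrow extension
// metadata, or ok=false if none can be determined. Hypothetical helper.
func sridFromGeoArrowMetadata(extMeta string) (srid int, ok bool) {
	if strings.TrimSpace(extMeta) == "" {
		return 0, false
	}
	var meta struct {
		CRS json.RawMessage `json:"crs"`
	}
	if err := json.Unmarshal([]byte(extMeta), &meta); err != nil || len(meta.CRS) == 0 {
		return 0, false
	}
	// Case 1: "crs": "EPSG:3857" (a JSON null also lands here as "").
	var s string
	if err := json.Unmarshal(meta.CRS, &s); err == nil {
		return parseEPSG(s)
	}
	// Case 2: PROJJSON object with "id": {"authority": "EPSG", "code": ...}.
	var projjson struct {
		ID *struct {
			Authority string          `json:"authority"`
			Code      json.RawMessage `json:"code"`
		} `json:"id"`
	}
	if err := json.Unmarshal(meta.CRS, &projjson); err != nil || projjson.ID == nil {
		return 0, false
	}
	if !strings.EqualFold(projjson.ID.Authority, "EPSG") {
		return 0, false
	}
	// PROJJSON allows the code to be an integer or a string.
	n, err := strconv.Atoi(strings.Trim(string(projjson.ID.Code), `"`))
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// parseEPSG parses "EPSG:NNNN" with a case-insensitive prefix.
func parseEPSG(s string) (int, bool) {
	const prefix = "EPSG:"
	if len(s) <= len(prefix) || !strings.EqualFold(s[:len(prefix)], prefix) {
		return 0, false
	}
	n, err := strconv.Atoi(s[len(prefix):])
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

func main() {
	for _, in := range []string{
		`{"crs":"EPSG:3857"}`,
		`{"crs":{"type":"ProjectedCRS","id":{"authority":"EPSG","code":32633}}}`,
		`{"crs":null}`,
		``,
		`not json`,
	} {
		srid, ok := sridFromGeoArrowMetadata(in)
		fmt.Printf("%q -> srid=%d ok=%v\n", in, srid, ok)
	}
}
```

A CRS such as "OGC:CRS84" also yields ok=false here. What the driver does in that case is not specified in this description.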
The COPY transform approach is faster than the alternative
rename+CTAS+drop pattern (~1.33x for points, ~1.11x for polygons)
because it eliminates 3 SQL round-trips and a full table rewrite:
50K points (median, 10 runs): 8.13s → 6.11s (6,150 → 8,183 rows/sec)
500K points: 16.95s → 12.71s (29,499 → 39,339 rows/sec)
500K polygons: 19.85s → 17.81s (25,189 → 28,074 rows/sec)
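The throughput and speedup figures above can be recomputed directly from the timings. This small program computes rows / seconds and old time / new time for each run listed.

```go
package main

import "fmt"

// run holds one before/after timing pair from the benchmark list.
type run struct {
	label            string
	rows             float64
	ctasSec, copySec float64
}

// throughput returns rows per second.
func throughput(rows, seconds float64) float64 { return rows / seconds }

// speedup returns how many times faster the new path is (old / new time).
func speedup(oldSec, newSec float64) float64 { return oldSec / newSec }

func main() {
	runs := []run{
		{"50K points", 50_000, 8.13, 6.11},
		{"500K points", 500_000, 16.95, 12.71},
		{"500K polygons", 500_000, 19.85, 17.81},
	}
	for _, r := range runs {
		fmt.Printf("%-14s %6.0f -> %6.0f rows/sec, %.2fx\n", r.label,
			throughput(r.rows, r.ctasSec), throughput(r.rows, r.copySec),
			speedup(r.ctasSec, r.copySec))
	}
}
```

The output matches the rows/sec values listed and shows about 1.33x for both point runs but about 1.11x for polygons.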
Branch updated from 06d7ea0 to 242f3e5.
Summary
Adds geospatial column support to the Snowflake ADBC driver's bulk ingestion path. When Arrow columns carry geoarrow.wkb or geoarrow.wkt extension metadata, the driver automatically creates GEOGRAPHY or GEOMETRY columns in Snowflake and converts the data.
- Detects geoarrow columns from Arrow extension types or from ARROW:extension:name field metadata (this handles the C Data Interface, where Go-level extension types are stripped)
- New statement option adbc.snowflake.statement.ingest_geo_type: "geography" (default, WGS84/SRID 4326) or "geometry" (any SRID)
- SRID extraction from geoarrow CRS metadata (PROJJSON or "EPSG:NNNN" format) for GEOMETRY columns
How it works
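1. Detect geoarrow columns (geoarrow.wkb, geoarrow.wkb_view, geoarrow.wkt, geoarrow.wkt_view)
2. Create the target table with native GEOGRAPHY/GEOMETRY columns
3. COPY INTO with a transform: TO_GEOGRAPHY($1:"geom"::BINARY, true) converts WKB to GEOGRAPHY inline during COPY, so no post-processing is needed
4. Apply the SRID with ST_SETSRID if present in geoarrow metadata
Why COPY transform?
Snowflake's COPY INTO from Parquet cannot load WKB directly into GEOGRAPHY/GEOMETRY columns; only CSV and JSON/AVRO support direct geospatial loading from stages (docs).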
The initial approach used a post-COPY CTAS pattern (rename → CREATE TABLE AS SELECT with conversion → drop staging). This PR replaces that with a COPY transform that applies TO_GEOGRAPHY/TO_GEOMETRY in the SELECT clause of the COPY subquery, eliminating 3 SQL round-trips and a full table rewrite. The original CTAS path is preserved as a fallback for schemas without geo columns.
Benchmark results
COPY transform vs CTAS approach (50K random points, median of 10 runs):
The COPY transform also had zero transient failures, vs 3/10 runs for the CTAS path (fewer SQL round-trips mean fewer opportunities for timeouts).
End-to-end with real-world data (Czech Republic OSM extract from Geofabrik):
At scale (500K rows, single runs):
Export (not in this PR)
Export/read-path geoarrow support is in a separate PR (#100). Detecting GEOGRAPHY/GEOMETRY columns on the read path is non-trivial because:
- With GEOGRAPHY_OUTPUT_FORMAT=EWKB, srcMeta.Type becomes "binary" (the type information is lost)
- Otherwise, srcMeta.Type is "object" (the same as VARIANT/OBJECT)
Context
This is part of a broader effort to add GeoArrow support across ADBC drivers. It was previously opened as apache/arrow-adbc#4114 and moved here at a maintainer's request.
Test plan
- toSnowflakeType with geoarrow extension types
- extractSRIDFromMeta (PROJJSON, simple EPSG string, null, empty, invalid)
- Existing TestIngestBatchedParquetWithFileLimit still passes