feat: GeoArrow support for bulk ingestion (GEOGRAPHY/GEOMETRY) by jatorre · Pull Request #99 · adbc-drivers/snowflake

jatorre · 2026-03-18T07:33:59Z

Summary

Adds geospatial column support to the Snowflake ADBC driver's bulk ingestion path. When Arrow columns carry geoarrow.wkb or geoarrow.wkt extension metadata, the driver automatically creates GEOGRAPHY or GEOMETRY columns in Snowflake and converts the data.

Detects geoarrow columns from ARROW:extension:name field metadata, which also covers data arriving through the C Data Interface, where Go-level extension types are stripped
New statement option adbc.snowflake.statement.ingest_geo_type: "geography" (default, WGS84/4326) or "geometry" (any SRID)
Extracts SRID from geoarrow CRS metadata (PROJJSON or "EPSG:NNNN" format) for GEOMETRY columns
Unit tests for type mapping and SRID extraction
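
As a rough illustration of the SRID extraction described above, here is a minimal sketch. It is not the driver's extractSRIDFromMeta; the helper name sridFromGeoArrowMeta, the convention of returning 0 when no SRID can be determined, and the restriction to the EPSG authority are assumptions made for this example. It takes the geoarrow extension metadata as a JSON string whose "crs" member is either an "EPSG:NNNN" string or a PROJJSON object carrying an "id" with "authority" and "code".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// sridFromGeoArrowMeta is a hypothetical helper (not the driver's code).
// It returns the EPSG code found in geoarrow extension metadata,
// or 0 when the metadata is empty, invalid, or has no usable CRS.
func sridFromGeoArrowMeta(extMeta string) int {
	if strings.TrimSpace(extMeta) == "" {
		return 0
	}
	var meta struct {
		CRS json.RawMessage `json:"crs"`
	}
	if err := json.Unmarshal([]byte(extMeta), &meta); err != nil || len(meta.CRS) == 0 {
		return 0
	}

	// Case 1: the CRS is a plain string such as "EPSG:4326".
	// A JSON null also lands here and yields an empty string.
	var s string
	if err := json.Unmarshal(meta.CRS, &s); err == nil {
		auth, code, ok := strings.Cut(s, ":")
		if !ok || !strings.EqualFold(auth, "EPSG") {
			return 0
		}
		n, err := strconv.Atoi(code)
		if err != nil || n <= 0 {
			return 0
		}
		return n
	}

	// Case 2: the CRS is a PROJJSON object with an "id" member.
	var pj struct {
		ID struct {
			Authority string `json:"authority"`
			Code      any    `json:"code"` // PROJJSON allows a number or a string
		} `json:"id"`
	}
	if err := json.Unmarshal(meta.CRS, &pj); err != nil {
		return 0
	}
	if !strings.EqualFold(pj.ID.Authority, "EPSG") {
		return 0
	}
	switch c := pj.ID.Code.(type) {
	case float64:
		return int(c)
	case string:
		if n, err := strconv.Atoi(c); err == nil && n > 0 {
			return n
		}
	}
	return 0
}

func main() {
	for _, m := range []string{
		`{"crs":"EPSG:3857"}`,
		`{"crs":{"id":{"authority":"EPSG","code":4326}}}`,
		`{"crs":null}`,
	} {
		fmt.Printf("%s -> %d\n", m, sridFromGeoArrowMeta(m))
	}
}
```

Treating every failure mode (empty, null, invalid JSON, unknown authority) as "no SRID" keeps the caller simple; whether the real driver reports errors instead is not stated in this PR.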

How it works

Detect geoarrow columns from Arrow extension metadata
Create table with native GEOGRAPHY/GEOMETRY columns
COPY INTO with transform: TO_GEOGRAPHY($1:"geom"::BINARY, true) converts WKB to GEOGRAPHY inline during COPY — no post-processing needed
For GEOMETRY columns, SRID is applied via ST_SETSRID if present in geoarrow metadata
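
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's implementation: the field struct stands in for an Arrow field, the function names, table name, and stage name are hypothetical, only geoarrow.wkb is handled, a single SRID is applied to all geometry columns, and column names are assumed to contain no double quotes. The TO_GEOGRAPHY expression is the one quoted in this PR; the exact TO_GEOMETRY arguments and the pass-through form for non-geo columns are guesses.

```go
package main

import (
	"fmt"
	"strings"
)

// field is a hypothetical stand-in for an Arrow field:
// a column name plus its string key/value metadata.
type field struct {
	Name     string
	Metadata map[string]string
}

// isGeoArrowWKB reports whether the field metadata marks the column
// as geoarrow.wkb. Checking metadata rather than a Go extension type
// also works for fields that arrived through the C Data Interface.
func isGeoArrowWKB(f field) bool {
	return f.Metadata["ARROW:extension:name"] == "geoarrow.wkb"
}

// copySelectExpr returns the expression for one column in the COPY
// subquery. geoType is "geography" or "geometry"; srid is applied
// only for geometry columns and only when it is positive.
func copySelectExpr(f field, geoType string, srid int) string {
	col := fmt.Sprintf(`$1:"%s"`, f.Name)
	if !isGeoArrowWKB(f) {
		return col // assumption: non-geo columns pass through unchanged
	}
	if geoType == "geometry" {
		expr := fmt.Sprintf("TO_GEOMETRY(%s::BINARY)", col)
		if srid > 0 {
			expr = fmt.Sprintf("ST_SETSRID(%s, %d)", expr, srid)
		}
		return expr
	}
	return fmt.Sprintf("TO_GEOGRAPHY(%s::BINARY, true)", col)
}

// buildCopy assembles a COPY INTO statement whose subquery applies
// the per-column expressions. Table and stage names are placeholders.
func buildCopy(table, stage string, fields []field, geoType string, srid int) string {
	exprs := make([]string, len(fields))
	for i, f := range fields {
		exprs[i] = copySelectExpr(f, geoType, srid)
	}
	return fmt.Sprintf(
		"COPY INTO %s FROM (SELECT %s FROM @%s) FILE_FORMAT = (TYPE = PARQUET)",
		table, strings.Join(exprs, ", "), stage)
}

func main() {
	fields := []field{
		{Name: "id"},
		{Name: "geom", Metadata: map[string]string{"ARROW:extension:name": "geoarrow.wkb"}},
	}
	fmt.Println(buildCopy("my_table", "my_stage", fields, "geography", 0))
	fmt.Println(buildCopy("my_table", "my_stage", fields, "geometry", 3857))
}
```

Because the conversion lives in the SELECT list of the COPY subquery, the geospatial column is populated in the same statement that loads the Parquet data, which is what removes the separate rewrite step.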

Why COPY transform?

Snowflake's COPY INTO from Parquet cannot load WKB directly into GEOGRAPHY/GEOMETRY columns — only CSV and JSON/AVRO support direct geospatial loading from stages (docs).

The initial approach used a post-COPY CTAS pattern (rename → CREATE TABLE AS SELECT with conversion → drop staging). This PR replaces that with a COPY transform that applies TO_GEOGRAPHY/TO_GEOMETRY in the SELECT clause of the COPY subquery, eliminating 3 SQL round-trips and a full table rewrite.

The original CTAS path is preserved as fallback for schemas without geo columns.

Benchmark results

COPY transform vs CTAS approach (50K random points, 10 runs attempted per approach; N is the number of runs that completed, and the median is taken over those):

                       Median      P25      P75      Min      Max   Rows/sec   N
─────────────────────────────────────────────────────────────────────────────────
CTAS (old approach)     8.13s    8.11s    8.26s    7.99s    8.85s      6,150   7
COPY transform (this)   6.11s    5.81s    6.15s    5.34s    7.44s      8,183  10

Speedup (median): 1.33x  (6,150 → 8,183 rows/sec)

The COPY transform also had zero transient failures vs 3/10 for the CTAS path (fewer SQL round-trips = fewer timeout opportunities).

End-to-end with real-world data (Czech Republic OSM Geofabrik):

Dataset	Rows	Throughput	Geometry type
POIs	465,280	38,119 rows/sec	Point
Roads	1,885,651	56,804 rows/sec	LineString
Buildings	5,014,886	68,611 rows/sec	Polygon

At scale (500K rows, single runs):

Dataset	CTAS (old)	COPY transform	Speedup
500K points	29,499 rows/sec	39,339 rows/sec	1.33x
500K polygons	25,189 rows/sec	28,074 rows/sec	1.11x

Export (not in this PR)

Export/read-path geoarrow support is in a separate PR (#100). Detecting GEOGRAPHY/GEOMETRY columns on the read path is non-trivial because:

With GEOGRAPHY_OUTPUT_FORMAT=EWKB, srcMeta.Type becomes "binary" (type info lost)
With default GeoJSON format, srcMeta.Type is "object" (same as VARIANT/OBJECT)

Context

This is part of a broader effort to add GeoArrow support across ADBC drivers. Previously opened as apache/arrow-adbc#4114, moved here per maintainer request.

Test plan

Unit tests for toSnowflakeType with geoarrow extension types
Unit tests for extractSRIDFromMeta (PROJJSON, simple EPSG string, null, empty, invalid)
Existing TestIngestBatchedParquetWithFileLimit still passes
End-to-end tested against real Snowflake with points, lines, and polygons
Verified GEOGRAPHY column type created in Snowflake via INFORMATION_SCHEMA
Benchmarked COPY transform vs CTAS approach (10 iterations, median comparison)

Add transparent geometry import via geoarrow.wkb/wkt extension types. The driver detects geoarrow columns in Arrow metadata and converts them to Snowflake GEOGRAPHY or GEOMETRY using a COPY transform with inline TO_GEOGRAPHY/TO_GEOMETRY conversion.

How it works:

1. Detect geoarrow.wkb/wkb_view/wkt/wkt_view from Arrow extension types or ARROW:extension:name field metadata (C Data Interface)
2. Create table with native GEOGRAPHY/GEOMETRY columns
3. COPY INTO with transform: TO_GEOGRAPHY($1:"geom"::BINARY, true) converts WKB to GEOGRAPHY inline during COPY, with no post-processing

Statement option: adbc.snowflake.statement.ingest_geo_type = "geography" (default) | "geometry"

GEOGRAPHY is always WGS84 (SRID 4326). GEOMETRY supports any SRID, extracted from geoarrow CRS metadata (PROJJSON or "EPSG:NNNN").

The COPY transform approach is ~1.33x faster than the alternative rename+CTAS+drop pattern because it eliminates 3 SQL round-trips and a full table rewrite:

50K points (median, 10 runs): 8.13s → 6.11s (6,150 → 8,183 rows/sec)
500K points: 16.95s → 12.71s (29,499 → 39,339 rows/sec)
500K polygons: 19.85s → 17.81s (25,189 → 28,074 rows/sec)

jatorre requested review from lidavidm and zeroshade as code owners March 18, 2026 07:33

jatorre had a problem deploying to Snowflake CI March 19, 2026 15:18 — with GitHub Actions Error

jatorre force-pushed the geoarrow-support branch from 06d7ea0 to 242f3e5 Compare April 12, 2026 05:05

jatorre temporarily deployed to Snowflake CI April 13, 2026 21:22 — with GitHub Actions Inactive

jatorre had a problem deploying to Snowflake CI April 13, 2026 21:22 — with GitHub Actions Failure

jatorre temporarily deployed to Snowflake CI April 13, 2026 21:22 — with GitHub Actions Inactive

jatorre deployed to Snowflake CI April 13, 2026 21:22 — with GitHub Actions Active

jatorre wants to merge 1 commit into adbc-drivers:main from jatorre:geoarrow-support