feat(snowflake): GeoArrow support for bulk ingestion (GEOGRAPHY/GEOMETRY)#4114

Closed
jatorre wants to merge 1 commit into apache:main from jatorre:snowflake-geoarrow-import

Conversation


@jatorre jatorre commented Mar 17, 2026

Summary

Adds geospatial column support to the Snowflake ADBC driver's bulk ingestion path. When Arrow columns carry geoarrow.wkb or geoarrow.wkt extension metadata, the driver automatically creates GEOGRAPHY or GEOMETRY columns in Snowflake and converts the data.

  • Detects geoarrow columns from ARROW:extension:name field metadata (handles the C Data Interface path, where Go-level extension types are stripped)
  • New statement option adbc.snowflake.statement.ingest_geo_type: "geography" (default, WGS84/4326) or "geometry" (any SRID)
  • Extracts SRID from geoarrow CRS metadata (PROJJSON or "EPSG:NNNN" format) for GEOMETRY columns
  • Unit tests for type mapping and SRID extraction

How it works

  1. Bulk ingest loads data as BINARY via the existing Parquet → PUT → COPY INTO pipeline (unchanged)
  2. After COPY, geoarrow columns are detected and converted via CTAS with TO_GEOGRAPHY/TO_GEOMETRY
  3. For GEOMETRY columns, SRID is applied via ST_SETSRID if present in geoarrow metadata

Why CTAS instead of direct COPY INTO GEOGRAPHY?

Snowflake's COPY INTO from Parquet cannot load WKB directly into GEOGRAPHY/GEOMETRY columns — only CSV and JSON/AVRO support direct geospatial loading from stages (docs). The CTAS workaround (rename → CTAS with conversion → drop staging) adds minimal overhead at scale.

A future optimization could use COPY transforms (SELECT ... FROM @stage) to convert inline.

Benchmark results

Tested with Czech Republic OSM Geofabrik data (real-world geometries):

| Dataset | Rows | Throughput | Geometry type |
| --- | --- | --- | --- |
| POIs | 465,280 | 38,119 rows/sec | Point |
| Roads | 1,885,651 | 56,804 rows/sec | LineString |
| Buildings | 5,014,886 | 68,611 rows/sec | Polygon |

This is a 4.4x improvement over the previous approach (WKT string + staging table + server-side TRY_TO_GEOGRAPHY, ~8,600 rows/sec) and approaches SnowSQL staging performance (79K rows/sec) without needing the SnowSQL CLI.

Context

This is part of a broader effort to add GeoArrow support across ADBC drivers for seamless geospatial data transfer between DuckDB and cloud data warehouses. Related work:

Test plan

  • Unit tests for toSnowflakeType with geoarrow extension types
  • Unit tests for extractSRIDFromMeta (PROJJSON, simple EPSG string, null, empty, invalid)
  • Existing TestIngestBatchedParquetWithFileLimit still passes
  • End-to-end tested against real Snowflake with points, lines, and polygons
  • Verified GEOGRAPHY column type created in Snowflake via INFORMATION_SCHEMA
  • Integration test with GEOMETRY type + custom SRID (not yet tested against Snowflake)

🤖 Generated with Claude Code

feat(snowflake): GeoArrow support for bulk ingestion (GEOGRAPHY/GEOMETRY)

Detect geoarrow.wkb/geoarrow.wkt columns during adbc_insert and create
GEOGRAPHY or GEOMETRY columns in Snowflake, with automatic WKB→geo
conversion and SRID support.

How it works:
1. Bulk ingest loads data as BINARY via existing Parquet→PUT→COPY INTO
2. After COPY, geoarrow columns are detected from Arrow field metadata
   (ARROW:extension:name) and converted via CTAS with TO_GEOGRAPHY or
   TO_GEOMETRY. SRID is extracted from geoarrow CRS metadata (PROJJSON
   or "EPSG:NNNN") and applied via ST_SETSRID for GEOMETRY columns.

The CTAS post-processing is needed because Snowflake's COPY INTO from
Parquet cannot load WKB directly into GEOGRAPHY/GEOMETRY columns — only
CSV and JSON/AVRO support direct geospatial loading from stages. See:
https://docs.snowflake.com/en/sql-reference/data-types-geospatial#loading-geospatial-data-from-stages

New statement option:
- adbc.snowflake.statement.ingest_geo_type: "geography" (default) or
  "geometry". GEOGRAPHY is WGS84/SRID 4326; GEOMETRY supports any SRID.

Benchmarked with Czech Republic OSM Geofabrik data against Snowflake:
- Points (465K):    38,119 rows/sec
- LineStrings (1.9M): 56,804 rows/sec
- Polygons (5M):    68,611 rows/sec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jatorre jatorre requested a review from zeroshade as a code owner March 17, 2026 22:34

jatorre commented Mar 18, 2026

Moving to adbc-drivers/snowflake per maintainer request. Will re-open there.
