
S3 Source Connector #24

@kmacrow

Description


✨ Enhancement Request

Summary:
Add an S3 source connector that syncs data stored in structured file formats from S3.


Problem / Use Case:
I have an S3 data lake and want to use its tables as single-tenant or multi-tenant data sources for models in Pontoon.


Proposed Solution:

  • Start with support for traditional Hive partitioning and compressed Parquet files, e.g. s3://my-bucket/<namespace>/<schema>/<table>/<tenant-id=abc>/date=2025-01-01/
  • Add support for additional formats: JSON/NDJSON, ORC, Avro
  • Add support for reading transactional table formats: Iceberg, Delta, Hudi, S3 Tables
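The first two bullets imply that the connector must read partition values from object keys and choose a reader per file. The following is a minimal Python sketch of that step under stated assumptions, not Pontoon's implementation. The function names (`object_key`, `parse_hive_partitions`, `detect_format`, `objects_for_tenant`), the suffix-to-format table, and the `tenant-id` partition key are hypothetical. The sketch only inspects key strings and does not call S3 or any Parquet/ORC/Avro library.

```python
from typing import Optional
from urllib.parse import unquote

# Hypothetical suffix-to-format table (assumption, not an existing Pontoon API).
FORMAT_BY_SUFFIX = {
    ".parquet": "parquet",
    ".orc": "orc",
    ".avro": "avro",
    ".ndjson": "ndjson",
    ".jsonl": "ndjson",
    ".json": "json",
}

# Whole-file compression suffixes that typically wrap text formats such as NDJSON.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".zst")


def object_key(uri: str) -> str:
    """Return the key part of an s3://bucket/key URI.

    Splits on '/' instead of using URL parsing, because S3 keys may contain '?' or '#'.
    """
    _, _, rest = uri.partition("://")
    _, _, key = rest.partition("/")
    return key


def parse_hive_partitions(uri: str) -> dict[str, str]:
    """Return the key=value (Hive-style) directory segments of an S3 object URI."""
    key = object_key(uri)
    segments = key.strip("/").split("/")
    if not key.endswith("/"):
        segments = segments[:-1]  # last segment is the file name, not a partition
    partitions: dict[str, str] = {}
    for segment in segments:
        name, sep, value = segment.partition("=")
        if sep and name:
            # Hive percent-encodes special characters in partition values.
            partitions[name] = unquote(value)
    return partitions


def detect_format(uri: str) -> Optional[str]:
    """Guess the file format from the object name, ignoring whole-file compression."""
    name = object_key(uri).rsplit("/", 1)[-1].lower()
    for compression in COMPRESSION_SUFFIXES:
        if name.endswith(compression):
            name = name[: -len(compression)]
            break
    for suffix, fmt in FORMAT_BY_SUFFIX.items():
        if name.endswith(suffix):
            return fmt
    return None


def objects_for_tenant(uris: list[str], tenant_key: str, tenant_id: str) -> list[str]:
    """Keep only objects whose partition path belongs to a single tenant."""
    return [u for u in uris if parse_hive_partitions(u).get(tenant_key) == tenant_id]


if __name__ == "__main__":
    keys = [
        "s3://my-bucket/sales/public/orders/tenant-id=abc/date=2025-01-01/part-0000.snappy.parquet",
        "s3://my-bucket/sales/public/orders/tenant-id=xyz/date=2025-01-01/part-0000.snappy.parquet",
        "s3://my-bucket/sales/public/events/tenant-id=abc/date=2025-01-01/events.ndjson.gz",
    ]
    for key in objects_for_tenant(keys, "tenant-id", "abc"):
        print(detect_format(key), parse_hive_partitions(key))
```

Filtering by a tenant partition key before reading any file is one way to serve the multi-tenant case: each tenant's sync only lists and reads its own prefix. Transactional formats (Iceberg, Delta, Hudi, S3 Tables) keep their own metadata about which files are live, so a key-scanning approach like this would likely not apply to them.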

Alternatives Considered:

  • No viable workarounds/alternatives right now

Impact / Importance:

  • High impact

Additional Context (optional):

  • Part of a series of enhancements on supporting object connectors as sources and destinations
