docs: RFC - Keboola without a backend - File-based architecture on S3 + DuckDB (TCRD-17) #373

Closed
ZdenekSrotyr wants to merge 1 commit into main from devin/1773841936-tcrd-17-file-based-architecture-s3-duckdb

@ZdenekSrotyr
Contributor

Linear issue: TCRD-17

Changes:

  • Adds an RFC document (in Czech) exploring a warehouse-less Keboola architecture using S3 (Parquet) + DuckDB as an alternative to Snowflake/BigQuery backends for cost-sensitive and small-data customers
  • Covers: current architecture analysis, proposed "File Backend" storage type, DuckDB transformation flow, metadata catalog options (custom DB vs. DuckLake vs. Iceberg), cost comparisons, performance breakpoints vs. warehouse, PoC pipeline design, risks, and a phased roadmap

Key areas for review:

  1. Repo placement — Document lives in a new rfc/ directory. Confirm this is appropriate for developers-docs (which is typically public-facing), or whether it belongs in an internal repo instead.
  2. Cost estimates (Section 9) — Based on public Snowflake pricing ($2/credit) and rough S3 cost calculations. Should be validated against actual Keboola customer spend data.
  3. DuckDB breakpoint claims (Section 7) — States ~100-200 GB as the crossover point vs. warehouse. These are extrapolated from public benchmarks, not Keboola-specific workloads.
  4. DuckLake recommendation (Section 6) — Recommends DuckLake for production metadata catalog. DuckLake is a relatively new project (2025) — warrants discussion on maturity risk.
  5. Storage API compatibility assumptions (Section 5) — Claims extractors/writers need no changes due to Common Interface abstraction. Needs validation from someone familiar with Storage API internals.
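
As a rough aid for reviewing items 2 and 3, the arithmetic behind them can be sketched as below. This is only an illustration, not the RFC's cost model: the $2/credit price and the ~100-200 GB crossover band come from this description, while the S3 price per GB-month (assumed to be S3 Standard list pricing), the function names, and any workload figures are hypothetical. The storage estimate ignores request, egress, and DuckDB compute costs.

```python
# Back-of-the-envelope helpers for the review items above.
# All names here are hypothetical and not taken from the RFC itself.

SNOWFLAKE_USD_PER_CREDIT = 2.00   # stated in this PR description (public pricing)
S3_USD_PER_GB_MONTH = 0.023       # ASSUMPTION: S3 Standard list price, one region
CROSSOVER_GB = (100, 200)         # stated in this PR description (extrapolated estimate)


def monthly_warehouse_cost(credits_per_month: float) -> float:
    """Warehouse compute cost in USD for a given monthly credit consumption."""
    return credits_per_month * SNOWFLAKE_USD_PER_CREDIT


def monthly_s3_storage_cost(stored_gb: float) -> float:
    """S3 storage cost in USD; excludes requests, egress, and compute."""
    return stored_gb * S3_USD_PER_GB_MONTH


def suggested_backend(dataset_gb: float) -> str:
    """Classify a dataset size against the claimed crossover band."""
    low, high = CROSSOVER_GB
    if dataset_gb < low:
        return "file"        # below the band: S3 + DuckDB is the candidate
    if dataset_gb > high:
        return "warehouse"   # above the band: a warehouse is the candidate
    return "benchmark"       # inside the band: measure on real workloads


if __name__ == "__main__":
    # Hypothetical customer: 100 credits/month, 50 GB of stored data.
    print(monthly_warehouse_cost(100))
    print(round(monthly_s3_storage_cost(50), 2))
    print(suggested_backend(50))
```

Because the crossover band is extrapolated from public benchmarks, the "benchmark" result is deliberately treated as undecided rather than rounded to either side.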

This is a research/exploration RFC, not a production code change. No functional impact on existing systems.

Link to Devin session: https://app.devin.ai/sessions/546413c56692420f8ab1ea9e3548a57f
Requested by: @ZdenekSrotyr

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
@linear

linear bot commented Mar 18, 2026

@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@devin-ai-integration devin-ai-integration bot deleted the devin/1773841936-tcrd-17-file-based-architecture-s3-duckdb branch March 18, 2026 14:00