
R2 Data Catalog Iceberg table corrupted (50408) after Pipeline sink metadata commits silently stall #12774

@MaorKer

Description


What versions & operating system are you using?

  • wrangler 4.59.2
  • Windows 11 / Node.js v22
  • Pipeline created via wrangler pipelines CLI

Please provide a link to a minimal reproduction

No minimal repro repository — the issue requires sustained concurrent stream.send() calls over time. Full reproduction steps with a new pipeline are below.

Describe the Bug

R2 Data Catalog Iceberg table becomes permanently unqueryable (50408: Corrupted Catalog) after the Pipeline's sink-side metadata commit process stalls while compaction deletes the data files referenced by the last committed metadata.

SHOW TABLES and DESCRIBE work (schema metadata is intact), but any SELECT — including COUNT(*) — fails with 50408: Corrupted Catalog. Note: COUNT(*) is a supported R2 SQL aggregation; this is not an unsupported-SQL error.

I understand that stream.send() resolves when records are confirmed as ingested into the stream, not when they are committed to the sink/catalog. The bug is that sink-side metadata commits silently stopped advancing, while the Pipeline continued generating Parquet data files and Avro manifest-lists. Compaction then deleted the old data files that the stale metadata still referenced, bricking the table.

Per the Pipelines documentation: "Sinks provide exactly-once delivery guarantees." The observed behavior — orphaned manifests, stale metadata, and compaction deleting referenced files — is the opposite of that guarantee.

Environment

I can provide Account ID and resource IDs to Cloudflare staff privately if needed.

| Resource | Value |
|---|---|
| Sink format | Parquet (zstd compression) |
| Roll interval | 300 seconds (default) |
| Schema | Fixed (no schema evolution) |

Maintenance & Lifecycle Configuration

| Setting | Value |
|---|---|
| Compaction | Enabled (target_size_mb: 128) |
| Snapshot expiration | Disabled (min_snapshots_to_keep: 100, max_snapshot_age: 7d) |
| R2 bucket lifecycle rules | None (only default multipart abort rule) |
| Manual R2 object deletion | None — I never manually deleted any __r2_data_catalog/ objects |

Compaction being enabled is central to the failure: it deletes old small Parquet files after merging them into larger ones. When compaction ran but the metadata pointer had already stalled, the referenced data files were removed while the metadata still pointed at them.

What works

SHOW TABLES IN <namespace>
-- Returns table list correctly

DESCRIBE <namespace>.<table>
-- Returns full schema correctly

What fails

SELECT COUNT(*) FROM <namespace>.<table>
-- ERROR: 50408: Corrupted Catalog

SELECT * FROM <namespace>.<table> LIMIT 1
-- ERROR: 50408: Corrupted Catalog

Observed Iceberg State

I inspected the raw R2 objects under __r2_data_catalog/ and found the following:

Phase 1 — Working (Jan 28-29, 2026)

The Pipeline wrote 20 Iceberg metadata JSON files (00000 through 00019), each containing an incrementally growing snapshot list. The last metadata commit was at 2026-01-29T03:20:28Z (sequence number 19, 19 snapshots). Each snapshot references a manifest-list .avro file. All working correctly.

Phase 2 — Metadata commits stop, data continues (Feb 1-4)

The Pipeline continued writing 20 new manifest-list .avro files to the metadata/ directory (timestamps 2026-02-01T13:35Z through 2026-02-04T16:22Z), but no new metadata JSON files were committed after 00019. These orphaned manifest files are not referenced by any metadata snapshot.

Zero overlap between metadata references and on-disk manifests

The current metadata (00019) references 19 manifest-list avro files (from Jan 28-29). I checked which of these exist in R2:

| Category | Count | Status |
|---|---|---|
| Manifest-list files referenced by metadata 00019 | 19 | ALL 19 are MISSING from R2 |
| Manifest-list files on disk in R2 | 20 | ALL 20 are ORPHANED (not referenced by any metadata) |
| Overlap | 0 | Referenced set and on-disk set are completely disjoint |

The metadata points to 19 manifest-list files that no longer exist. They were likely deleted by compaction. Meanwhile, 20 newer manifest-list files sit in R2 but no metadata JSON references them.

The data/manifest generation phase succeeded but the metadata commit phase stalled, and compaction cleaned up the files that the stale metadata referenced.
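The overlap check above can be reproduced with a short script. This is a sketch, not the exact tooling I used: it assumes you have already downloaded the latest metadata JSON and a listing of the `metadata/` directory, and the field names follow the Iceberg v2 table metadata format (`snapshots` array with a `manifest-list` pointer per snapshot). All concrete paths below are stand-ins.

```typescript
// Sketch: compare manifest-lists referenced by the last committed Iceberg
// metadata against the manifest-lists actually present in R2.

interface Snapshot {
  "snapshot-id": number;
  "manifest-list": string; // full path to the snap-*.avro manifest-list
}

interface TableMetadata {
  snapshots: Snapshot[];
}

function basename(path: string): string {
  return path.split("/").pop() ?? path;
}

// Returns referenced-but-missing and present-but-orphaned manifest-lists.
function diffManifests(metadata: TableMetadata, onDisk: string[]) {
  const referenced = new Set(metadata.snapshots.map(s => basename(s["manifest-list"])));
  const present = new Set(onDisk.map(basename));
  const missing = [...referenced].filter(f => !present.has(f));
  const orphaned = [...present].filter(f => !referenced.has(f));
  return { missing, orphaned };
}

// Example with stand-in names:
const meta: TableMetadata = {
  snapshots: [{ "snapshot-id": 1, "manifest-list": "metadata/snap-1.avro" }],
};
const { missing, orphaned } = diffManifests(meta, ["metadata/snap-2.avro"]);
console.log(missing, orphaned); // snap-1.avro is missing, snap-2.avro is orphaned
```

In my table's case, `missing` had 19 entries and `orphaned` had 20, with no overlap.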

Timeline

Jan 28 22:23 UTC  — First metadata commit (00000, table created)
Jan 29 03:20 UTC  — Last metadata commit (00019, sequence 19)
                     ~~~ 82 hour gap — no metadata commits ~~~
Feb 01 13:35 UTC  — First orphaned manifest-list avro (no metadata points to it)
Feb 04 16:22 UTC  — Last orphaned manifest-list avro (Pipeline stops writing entirely)

R2 File Inventory

__r2_data_catalog/.../<table-uuid>/
  metadata/
    00000-...gz.metadata.json   (830 B)    -- Jan 28 (first)
    00001-...gz.metadata.json   (1119 B)
    ...
    00019-...gz.metadata.json   (3443 B)   -- Jan 29 (LAST committed metadata)
    snap-...-....avro           (2335 B)   -- Feb 01 (orphan, not in any metadata)
    snap-...-....avro           (2339 B)
    ...
    snap-...-....avro           (4459 B)   -- Feb 04 (orphan)
  data/
    *.parquet                              -- 20 files, 7.4 MB total
  • 20 metadata JSON files (Jan 28-29, sequentially numbered, all valid Iceberg v2 format)
  • 19 manifest-list avro files referenced by metadata — all missing from R2 (deleted by compaction)
  • 20 manifest-list avro files on disk — all orphaned (written Feb 1-4, not referenced by any metadata)
  • 20 Parquet data files (7.4 MB total, still present)

Workload That Triggered the Issue

Write pattern

A Cloudflare Worker with multiple cron triggers calls stream.send() on the same Pipeline stream:

| Cron | Schedule | What it does |
|---|---|---|
| */5 * * * * | Every 5 min | Drains outbox → stream.send(rows) |
| */15 * * * * | Every 15 min | Same outbox → stream.send(rows) |
| */30 * * * * | Every 30 min | Same outbox → stream.send(rows) |

At the 15-minute mark, two triggers fire simultaneously. At the 30-minute mark, all three fire simultaneously — resulting in concurrent stream.send() calls within a single 300s roll interval.
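The overlap pattern is easy to verify by counting, for each minute of the hour, how many of the three schedules fire. A minimal sketch using the cron expressions from the table above (all three are simple step schedules, so "fires at minute m" reduces to a modulo check):

```typescript
// Count how many of the */5, */15, */30 cron schedules fire at a given minute.
const steps = [5, 15, 30];

function concurrentTriggers(minute: number): number {
  return steps.filter(step => minute % step === 0).length;
}

for (let m = 0; m < 60; m += 5) {
  console.log(`minute ${m}: ${concurrentTriggers(m)} trigger(s)`);
}
// minutes 15 and 45 → 2 concurrent invocations; minutes 0 and 30 → 3
```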

Batch sizes and throughput (well within documented limits)

| Metric | Value |
|---|---|
| Typical row size | ~1.2 KB JSON object |
| Batch size per stream.send() | 200 rows (~243 KB) |
| Max stream.send() calls per cron invocation | 3-10 (depending on outbox depth) |
| Peak concurrent writers | 3 (cron triggers) |
| Peak aggregate throughput | ~0.08 MB/s |
| Documented stream ingest limit | 1 MB/request, 5 MB/s/stream |

I was at <2% of the documented throughput limit. The issue is not volume — it's concurrency of stream.send() callers within a single roll interval.

Hypotheses

Most likely: Concurrent stream.send() calls caused repeated Iceberg metadata commit conflicts.

Iceberg uses optimistic concurrency for metadata commits: read current metadata → write new metadata → atomic swap. When multiple manifest-lists are generated concurrently within one roll interval, the commit coordinator must serialize them. If two commits race and one fails, the failed commit's manifest-list becomes an orphan.

If the commit coordinator entered a permanent failure loop (e.g., always conflicting because it retries against stale state), metadata would stop advancing while the data/manifest generation phase continues unaware. Eventually compaction runs, sees the old data files are no longer needed (from its perspective), and deletes them — but the stale metadata still points to them.
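To make the suspected failure mode concrete, here is a toy model (my reconstruction only — I have no visibility into Cloudflare's actual commit coordinator). Under optimistic concurrency, a committer that re-reads the current version on each retry makes progress after a conflict; one that keeps retrying against the snapshot it read once, up front, conflicts forever:

```typescript
// Toy model of Iceberg-style optimistic concurrency: a commit succeeds only
// if the base version it was built on is still the current version (CAS).
class Catalog {
  version = 0;
  tryCommit(baseVersion: number): boolean {
    if (baseVersion !== this.version) return false; // conflict
    this.version += 1;
    return true;
  }
}

// Correct retry: re-read the current version before each attempt.
function commitWithRefresh(catalog: Catalog, maxRetries: number): boolean {
  for (let i = 0; i < maxRetries; i++) {
    const base = catalog.version; // refresh before retrying
    if (catalog.tryCommit(base)) return true;
  }
  return false;
}

// Buggy retry: keep retrying against a state snapshot read once, up front.
function commitAgainstStale(catalog: Catalog, staleBase: number, maxRetries: number): boolean {
  for (let i = 0; i < maxRetries; i++) {
    if (catalog.tryCommit(staleBase)) return true; // never succeeds once stale
  }
  return false;
}

const catalog = new Catalog();
const staleBase = catalog.version;  // writer A reads version 0
commitWithRefresh(catalog, 3);      // writer B commits first → version 1
console.log(commitAgainstStale(catalog, staleBase, 100)); // false: permanent conflict loop
```

If the real coordinator has a bug of the second shape, metadata would stop advancing exactly as observed, with no bound on how long the loop persists.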

Alternative: Compaction ran before metadata advanced (ordering bug).

Compaction should only delete files that are no longer referenced by the current metadata. But if the compaction process reads a different metadata pointer than R2 SQL does, or if there's a race between compaction advancing the metadata and the main commit path, this could also explain the divergence.

Recovery

This table cannot be recovered through normal means:

  • The metadata points to deleted manifest-list files → R2 SQL cannot read any data
  • The data is still in R2 (Parquet files + orphaned manifest-lists) but unreachable through the catalog
  • Per Cloudflare docs: "Sinks cannot be created for existing Iceberg tables" — so I cannot recreate the sink on the same table name as a workaround
  • To restore service, I would need to create an entirely new table with a new name and backfill all data from my source database

Steps to Reproduce

  1. Create a Pipeline with an R2 Data Catalog sink (Parquet format, default 300s roll interval)
  2. Enable compaction on the catalog (default setting)
  3. Set up a Worker with multiple cron triggers (e.g., */5 * * * *, */15 * * * *, */30 * * * *)
  4. Have each cron trigger call stream.send(rows) with batches of ~200 rows (~243 KB per call)
  5. At the 15-minute and 30-minute boundaries, multiple cron triggers will fire concurrently, calling stream.send() in parallel on the same stream
  6. After hours to days: inspect R2 objects under __r2_data_catalog/ — metadata JSON files stop being committed while manifest-list avros and Parquet files continue to accumulate
  7. Query the table with R2 SQL → 50408: Corrupted Catalog

Expected Behavior

  1. Sink metadata commits must not silently stall. If a metadata commit fails (e.g., due to an optimistic concurrency conflict), the Pipeline should retry or re-queue the commit — not permanently stop advancing metadata.
  2. Compaction must not delete files referenced by the current metadata. If the metadata commit is stalled, compaction should either be blocked or should not delete files that the last committed metadata still references.
  3. Commit failures must be observable. An error metric, log entry, or alert should be emitted when metadata commits fail, so users can detect the problem before data accumulates in orphaned manifests. The pipelinesUserErrorsAdaptiveGroups GraphQL dataset shows zero errors for my pipeline during the failure window — the failure was completely silent.
  4. Table recovery should be possible. Given that the data and newer manifests still exist in R2, it should be possible to repair the catalog by pointing metadata at the latest valid manifest-list, or by providing a "rebuild catalog from manifest files" API.
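Expectation 2 amounts to a simple invariant that compaction could enforce. A minimal sketch under my assumptions about the internals — `currentlyReferenced` is hypothetical and would have to come from re-reading the latest committed metadata immediately before deletion, not from a cached pointer:

```typescript
// Sketch of a compaction safety guard: never delete a file that the latest
// committed metadata still references, even if the commit that was supposed
// to supersede that metadata has stalled.
function safeToDelete(candidates: string[], currentlyReferenced: Set<string>): string[] {
  return candidates.filter(f => !currentlyReferenced.has(f));
}

const referenced = new Set(["data/a.parquet", "metadata/snap-19.avro"]);
const deletable = safeToDelete(
  ["data/a.parquet", "data/old-small-file.parquet"],
  referenced,
);
console.log(deletable); // only the unreferenced file survives the filter
```

With this guard in place, a stalled commit would leave extra small files around (a cost problem) instead of deleting referenced ones (a corruption problem).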

Questions for the Cloudflare Team

  1. Can you repair this catalog? The orphaned manifest-list avros and Parquet data files are still in R2. The data is recoverable if metadata is re-pointed.
  2. Is concurrent stream.send() from multiple cron triggers on the same Worker a supported pattern? If not, this constraint should be documented — it's a natural pattern for Workers with multiple schedules.
  3. Why did the pipelinesUserErrorsAdaptiveGroups metric show zero errors? The metadata commit failures were completely invisible. Are commit-level failures surfaced in any observable metric or log?
