What versions & operating system are you using?
- wrangler 4.59.2
- Windows 11 / Node.js v22
- Pipeline created via the `wrangler pipelines` CLI
Please provide a link to a minimal reproduction
No minimal repro repository — the issue requires sustained concurrent stream.send() calls over time. Full reproduction steps with a new pipeline are below.
Describe the Bug
R2 Data Catalog Iceberg table becomes permanently unqueryable (50408: Corrupted Catalog) after the Pipeline's sink-side metadata commit process stalls while compaction deletes the data files referenced by the last committed metadata.
SHOW TABLES and DESCRIBE work (schema metadata is intact), but any SELECT — including COUNT(*) — fails with 50408: Corrupted Catalog. Note: COUNT(*) is a supported R2 SQL aggregation; this is not an unsupported-SQL error.
I understand that stream.send() resolves when records are confirmed as ingested into the stream, not when they are committed to the sink/catalog. The bug is that sink-side metadata commits silently stopped advancing, while the Pipeline continued generating Parquet data files and Avro manifest-lists. Compaction then deleted the old data files that the stale metadata still referenced, bricking the table.
Per the Pipelines documentation: "Sinks provide exactly-once delivery guarantees." The observed behavior — orphaned manifests, stale metadata, and compaction deleting referenced files — is the opposite of that guarantee.
Environment
I can provide Account ID and resource IDs to Cloudflare staff privately if needed.
| Resource | Value |
| --- | --- |
| Sink format | Parquet (zstd compression) |
| Roll interval | 300 seconds (default) |
| Schema | Fixed (no schema evolution) |
Maintenance & Lifecycle Configuration
| Setting | Value |
| --- | --- |
| Compaction | Enabled (target_size_mb: 128) |
| Snapshot expiration | Disabled (min_snapshots_to_keep: 100, max_snapshot_age: 7d) |
| R2 bucket lifecycle rules | None (only the default multipart abort rule) |
| Manual R2 object deletion | None — I never manually deleted any `__r2_data_catalog/` objects |
Compaction being enabled is central to the failure: it deletes old small Parquet files after merging them into larger ones. When compaction ran but the metadata pointer had already stalled, the referenced data files were removed while the metadata still pointed at them.
What works
```sql
SHOW TABLES IN <namespace>
-- Returns table list correctly

DESCRIBE <namespace>.<table>
-- Returns full schema correctly
```
What fails
```sql
SELECT COUNT(*) FROM <namespace>.<table>
-- ERROR: 50408: Corrupted Catalog

SELECT * FROM <namespace>.<table> LIMIT 1
-- ERROR: 50408: Corrupted Catalog
```
Observed Iceberg State
I inspected the raw R2 objects under __r2_data_catalog/ and found the following:
Phase 1 — Working (Jan 28-29, 2026)
The Pipeline wrote 20 Iceberg metadata JSON files (00000 through 00019), each containing an incrementally growing snapshot list. The last metadata commit was at 2026-01-29T03:20:28Z (sequence number 19, 19 snapshots). Each snapshot references a manifest-list .avro file. All working correctly.
Phase 2 — Metadata commits stop, data continues (Feb 1-4)
The Pipeline continued writing 20 new manifest-list .avro files to the metadata/ directory (timestamps 2026-02-01T13:35Z through 2026-02-04T16:22Z), but no new metadata JSON files were committed after 00019. These orphaned manifest files are not referenced by any metadata snapshot.
Zero overlap between metadata references and on-disk manifests
The current metadata (00019) references 19 manifest-list avro files (from Jan 28-29). I checked which of these exist in R2:
| Category | Count | Status |
| --- | --- | --- |
| Manifest-list files referenced by metadata 00019 | 19 | ALL 19 are MISSING from R2 |
| Manifest-list files on disk in R2 | 20 | ALL 20 are ORPHANED (not referenced by any metadata) |
| Overlap | 0 | Referenced set and on-disk set are completely disjoint |
The metadata points to 19 manifest-list files that no longer exist. They were likely deleted by compaction. Meanwhile, 20 newer manifest-list files sit in R2 but no metadata JSON references them.
The data/manifest generation phase succeeded but the metadata commit phase stalled, and compaction cleaned up the files that the stale metadata referenced.
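The disjointness is easy to check mechanically. A minimal sketch of the set comparison I ran against the R2 listing (file names here are placeholders, since the real `snap-*.avro` names are elided above):

```python
# Placeholder names; the real manifest-list files are snap-<id>-<uuid>.avro.
referenced = {f"snap-jan-{i}.avro" for i in range(19)}  # listed in metadata 00019
on_disk = {f"snap-feb-{i}.avro" for i in range(20)}     # actually present under metadata/

missing = referenced - on_disk    # referenced but deleted (by compaction)
orphaned = on_disk - referenced   # written but never committed
overlap = referenced & on_disk    # should be non-empty on a healthy table

print(len(missing), len(orphaned), len(overlap))  # 19 20 0
```

On a healthy table the overlap would contain every manifest-list the current metadata references; here it is empty.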
Timeline
- Jan 28 22:23 UTC — First metadata commit (00000, table created)
- Jan 29 03:20 UTC — Last metadata commit (00019, sequence 19)
- (~82-hour gap — no metadata commits)
- Feb 01 13:35 UTC — First orphaned manifest-list avro (no metadata points to it)
- Feb 04 16:22 UTC — Last orphaned manifest-list avro (Pipeline stops writing entirely)
R2 File Inventory
```text
__r2_data_catalog/.../<table-uuid>/
  metadata/
    00000-...gz.metadata.json  (830 B)   -- Jan 28 (first)
    00001-...gz.metadata.json  (1119 B)
    ...
    00019-...gz.metadata.json  (3443 B)  -- Jan 29 (LAST committed metadata)
    snap-...-....avro          (2335 B)  -- Feb 01 (orphan, not in any metadata)
    snap-...-....avro          (2339 B)
    ...
    snap-...-....avro          (4459 B)  -- Feb 04 (orphan)
  data/
    *.parquet                  -- 20 files, 7.4 MB total
```
- 20 metadata JSON files (Jan 28-29, sequentially numbered, all valid Iceberg v2 format)
- 19 manifest-list avro files referenced by metadata — all missing from R2 (deleted by compaction)
- 20 manifest-list avro files on disk — all orphaned (written Feb 1-4, not referenced by any metadata)
- 20 Parquet data files (7.4 MB total, still present)
Workload That Triggered the Issue
Write pattern
A Cloudflare Worker with multiple cron triggers calls stream.send() on the same Pipeline stream:
| Cron | Schedule | What it does |
| --- | --- | --- |
| `*/5 * * * *` | Every 5 min | Drains outbox → `stream.send(rows)` |
| `*/15 * * * *` | Every 15 min | Same outbox → `stream.send(rows)` |
| `*/30 * * * *` | Every 30 min | Same outbox → `stream.send(rows)` |
At the 15-minute mark, two triggers fire simultaneously. At the 30-minute mark, all three fire simultaneously — resulting in concurrent stream.send() calls within a single 300s roll interval.
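The coincidence pattern can be verified with a quick sketch (using the schedules listed above):

```python
# For each minute of the hour, which of the three cron schedules fire?
fires = {m: [step for step in (5, 15, 30) if m % step == 0] for m in range(60)}
concurrent = sorted(m for m, steps in fires.items() if len(steps) > 1)

print(concurrent)  # [0, 15, 30, 45] -> minutes with overlapping triggers
print(fires[30])   # [5, 15, 30]     -> all three fire at :30 (and at :00)
```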
Batch sizes and throughput (well within documented limits)
| Metric | Value |
| --- | --- |
| Typical row size | ~1.2 KB JSON object |
| Batch size per `stream.send()` | 200 rows (~243 KB) |
| Max `stream.send()` calls per cron invocation | 3-10 (depending on outbox depth) |
| Peak concurrent writers | 3 (cron triggers) |
| Peak aggregate throughput | ~0.08 MB/s |
| Documented stream ingest limit | 1 MB/request, 5 MB/s/stream |
I was at <2% of the documented throughput limit. The issue is not volume — it's concurrency of stream.send() callers within a single roll interval.
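The headroom math, for reference (values from the table above; 240 KB is the raw row payload, ~243 KB with envelope overhead):

```python
row_kb = 1.2                  # typical JSON row size
batch_kb = 200 * row_kb       # row payload per stream.send() call
limit_mb_s = 5.0              # documented per-stream ingest limit
peak_mb_s = 0.08              # observed peak aggregate throughput

print(batch_kb)                       # 240.0 (KB)
print(100 * peak_mb_s / limit_mb_s)   # 1.6 (% of the documented limit)
```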
Hypotheses
Most likely: Concurrent stream.send() calls caused repeated Iceberg metadata commit conflicts.
Iceberg uses optimistic concurrency for metadata commits: read current metadata → write new metadata → atomic swap. When multiple manifest-lists are generated concurrently within one roll interval, the commit coordinator must serialize them. If two commits race and one fails, the failed commit's manifest-list becomes an orphan.
If the commit coordinator entered a permanent failure loop (e.g., always conflicting because it retries against stale state), metadata would stop advancing while the data/manifest generation phase continues unaware. Eventually compaction runs, sees the old data files are no longer needed (from its perspective), and deletes them — but the stale metadata still points to them.
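This failure mode is easy to model. A toy sketch of optimistic commit (my own model of the hypothesis, not Cloudflare's implementation): a committer that retries against a stale base version conflicts forever, and every failed attempt strands one manifest-list:

```python
class CatalogPointer:
    """Toy model of an Iceberg catalog's atomic metadata pointer."""
    def __init__(self, version: int):
        self.version = version

    def compare_and_swap(self, expected: int, new: int) -> bool:
        if self.version != expected:
            return False          # optimistic-concurrency conflict
        self.version = new
        return True

def stalled_committer(catalog: CatalogPointer, stale_base: int, attempts: int):
    """Retries a commit without refreshing its base version (the hypothesized bug)."""
    orphans = []
    for i in range(attempts):
        manifest = f"snap-attempt-{i}.avro"   # manifest-list is written before the swap
        if catalog.compare_and_swap(stale_base, stale_base + 1):
            return orphans                    # commit landed; nothing orphaned
        orphans.append(manifest)              # conflict: this manifest is now an orphan
    return orphans

catalog = CatalogPointer(version=19)          # last committed metadata: 00019
orphans = stalled_committer(catalog, stale_base=18, attempts=5)

print(catalog.version, len(orphans))          # 19 5 -> pointer never advances
```

A correct committer would refresh its base version after each conflict; the observed 20 orphaned manifest-lists with zero new metadata commits matches the stale-retry variant.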
Alternative: Compaction ran before metadata advanced (ordering bug).
Compaction should only delete files that are no longer referenced by the current metadata. But if the compaction process reads a different metadata pointer than R2 SQL does, or if there's a race between compaction advancing the metadata and the main commit path, this could also explain the divergence.
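Under either hypothesis, the invariant that should have held (sketched here with hypothetical file names) is that compaction may delete only files unreachable from the current committed metadata:

```python
def deletable_after_compaction(rewritten: set, committed_refs: set) -> set:
    # A file may be deleted only if the current committed metadata
    # no longer references it, even when compaction has rewritten it.
    return rewritten - committed_refs

committed_refs = {"a.parquet", "b.parquet"}   # referenced by the last committed metadata
rewritten = {"a.parquet", "c.parquet"}        # inputs compaction merged into larger files

print(deletable_after_compaction(rewritten, committed_refs))  # {'c.parquet'}
```

In my table, every file compaction deleted was still referenced by metadata 00019, so this invariant was violated.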
Recovery
This table cannot be recovered through normal means:
- The metadata points to deleted manifest-list files → R2 SQL cannot read any data
- The data is still in R2 (Parquet files + orphaned manifest-lists) but unreachable through the catalog
- Per Cloudflare docs: "Sinks cannot be created for existing Iceberg tables" — so I cannot recreate the sink on the same table name as a workaround
- To restore service, I would need to create an entirely new table with a new name and backfill all data from my source database
Steps to Reproduce
- Create a Pipeline with an R2 Data Catalog sink (Parquet format, default 300s roll interval)
- Enable compaction on the catalog (default setting)
- Set up a Worker with multiple cron triggers (e.g., `*/5 * * * *`, `*/15 * * * *`, `*/30 * * * *`)
- Have each cron trigger call `stream.send(rows)` with batches of ~200 rows (~243 KB per call)
- At the 15-minute and 30-minute boundaries, multiple cron triggers will fire concurrently, calling `stream.send()` in parallel on the same stream
- After hours to days: inspect R2 objects under `__r2_data_catalog/` — metadata JSON files stop being committed while manifest-list avros and Parquet files continue to accumulate
- Query the table with R2 SQL → `50408: Corrupted Catalog`
Expected Behavior
- Sink metadata commits must not silently stall. If a metadata commit fails (e.g., due to an optimistic concurrency conflict), the Pipeline should retry or re-queue the commit — not permanently stop advancing metadata.
- Compaction must not delete files referenced by the current metadata. If the metadata commit is stalled, compaction should either be blocked or should not delete files that the last committed metadata still references.
- Commit failures must be observable. An error metric, log entry, or alert should be emitted when metadata commits fail, so users can detect the problem before data accumulates in orphaned manifests. The `pipelinesUserErrorsAdaptiveGroups` GraphQL dataset shows zero errors for my pipeline during the failure window — the failure was completely silent.
- Table recovery should be possible. Given that the data and newer manifests still exist in R2, it should be possible to repair the catalog by pointing metadata at the latest valid manifest-list, or by providing a "rebuild catalog from manifest files" API.
Questions for the Cloudflare Team
- Can you repair this catalog? The orphaned manifest-list avros and Parquet data files are still in R2. The data is recoverable if metadata is re-pointed.
- Is concurrent `stream.send()` from multiple cron triggers on the same Worker a supported pattern? If not, this constraint should be documented — it's a natural pattern for Workers with multiple schedules.
- Why did the `pipelinesUserErrorsAdaptiveGroups` metric show zero errors? The metadata commit failures were completely invisible. Are commit-level failures surfaced in any observable metric or log?