
R2 Data Catalog Iceberg table corrupted (50408) after Pipeline sink metadata commits silently stall #12774

@MaorKer

Description


What versions & operating system are you using?

  • wrangler 4.59.2
  • Windows 11 / Node.js v22
  • Pipeline created via wrangler pipelines CLI

Please provide a link to a minimal reproduction

No minimal repro repository — the issue requires sustained concurrent stream.send() calls over time. Full reproduction steps with a new pipeline are below.

Describe the Bug

R2 Data Catalog Iceberg table becomes permanently unqueryable (50408: Corrupted Catalog) after the Pipeline's sink-side metadata commit process stalls while compaction deletes the data files referenced by the last committed metadata.

SHOW TABLES and DESCRIBE work (schema metadata is intact), but any SELECT — including COUNT(*) — fails with 50408: Corrupted Catalog. Note: COUNT(*) is a supported R2 SQL aggregation; this is not an unsupported-SQL error.

I understand that stream.send() resolves when records are confirmed as ingested into the stream, not when they are committed to the sink/catalog. The bug is that sink-side metadata commits silently stopped advancing, while the Pipeline continued generating Parquet data files and Avro manifest-lists. Compaction then deleted the old data files that the stale metadata still referenced, bricking the table.

Per the Pipelines documentation: "Sinks provide exactly-once delivery guarantees." The observed behavior — orphaned manifests, stale metadata, and compaction deleting referenced files — is the opposite of that guarantee.

Environment

I can provide Account ID and resource IDs to Cloudflare staff privately if needed.

| Resource | Value |
|---|---|
| Sink format | Parquet (zstd compression) |
| Roll interval | 300 seconds (default) |
| Schema | Fixed (no schema evolution) |

Maintenance & Lifecycle Configuration

| Setting | Value |
|---|---|
| Compaction | Enabled (target_size_mb: 128) |
| Snapshot expiration | Disabled (min_snapshots_to_keep: 100, max_snapshot_age: 7d) |
| R2 bucket lifecycle rules | None (only default multipart abort rule) |
| Manual R2 object deletion | None — I never manually deleted any __r2_data_catalog/ objects |

Compaction being enabled is central to the failure: it deletes old small Parquet files after merging them into larger ones. When compaction ran but the metadata pointer had already stalled, the referenced data files were removed while the metadata still pointed at them.

What works

SHOW TABLES IN <namespace>
-- Returns table list correctly

DESCRIBE <namespace>.<table>
-- Returns full schema correctly

What fails

SELECT COUNT(*) FROM <namespace>.<table>
-- ERROR: 50408: Corrupted Catalog

SELECT * FROM <namespace>.<table> LIMIT 1
-- ERROR: 50408: Corrupted Catalog

Observed Iceberg State

I inspected the raw R2 objects under __r2_data_catalog/ and found the following:

Phase 1 — Working (Jan 28-29, 2026)

The Pipeline wrote 20 Iceberg metadata JSON files (00000 through 00019), each containing an incrementally growing snapshot list. The last metadata commit was at 2026-01-29T03:20:28Z (sequence number 19, 19 snapshots). Each snapshot references a manifest-list .avro file. All working correctly.

Phase 2 — Metadata commits stop, data continues (Feb 1-4)

The Pipeline continued writing 20 new manifest-list .avro files to the metadata/ directory (timestamps 2026-02-01T13:35Z through 2026-02-04T16:22Z), but no new metadata JSON files were committed after 00019. These orphaned manifest files are not referenced by any metadata snapshot.

Zero overlap between metadata references and on-disk manifests

The current metadata (00019) references 19 manifest-list avro files (from Jan 28-29). I checked which of these exist in R2:

| Category | Count | Status |
|---|---|---|
| Manifest-list files referenced by metadata 00019 | 19 | ALL 19 are MISSING from R2 |
| Manifest-list files on disk in R2 | 20 | ALL 20 are ORPHANED (not referenced by any metadata) |
| Overlap | 0 | Referenced set and on-disk set are completely disjoint |

The metadata points to 19 manifest-list files that no longer exist. They were likely deleted by compaction. Meanwhile, 20 newer manifest-list files sit in R2 but no metadata JSON references them.

The data/manifest generation phase succeeded but the metadata commit phase stalled, and compaction cleaned up the files that the stale metadata referenced.
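The overlap check above can be reproduced with a short script. This is a sketch, not the exact tooling I used: it assumes you have already downloaded the latest metadata JSON and a listing of the `metadata/` directory, and the field names follow the Iceberg v2 table metadata format (`snapshots` array with a `manifest-list` pointer per snapshot). All concrete paths below are stand-ins.

```typescript
// Sketch: compare manifest-lists referenced by the last committed Iceberg
// metadata against the manifest-lists actually present in R2.

interface Snapshot {
  "snapshot-id": number;
  "manifest-list": string; // full path to the snap-*.avro manifest-list
}

interface TableMetadata {
  snapshots: Snapshot[];
}

function basename(path: string): string {
  return path.split("/").pop() ?? path;
}

// Returns referenced-but-missing and present-but-orphaned manifest-lists.
function diffManifests(metadata: TableMetadata, onDisk: string[]) {
  const referenced = new Set(metadata.snapshots.map(s => basename(s["manifest-list"])));
  const present = new Set(onDisk.map(basename));
  const missing = [...referenced].filter(f => !present.has(f));
  const orphaned = [...present].filter(f => !referenced.has(f));
  return { missing, orphaned };
}

// Example with stand-in names:
const meta: TableMetadata = {
  snapshots: [{ "snapshot-id": 1, "manifest-list": "metadata/snap-1.avro" }],
};
const { missing, orphaned } = diffManifests(meta, ["metadata/snap-2.avro"]);
console.log(missing, orphaned); // snap-1.avro is missing, snap-2.avro is orphaned
```

In my table's case, `missing` had 19 entries and `orphaned` had 20, with no overlap.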

Timeline

Jan 28 22:23 UTC  — First metadata commit (00000, table created)
Jan 29 03:20 UTC  — Last metadata commit (00019, sequence 19)
                     ~~~ 82 hour gap — no metadata commits ~~~
Feb 01 13:35 UTC  — First orphaned manifest-list avro (no metadata points to it)
Feb 04 16:22 UTC  — Last orphaned manifest-list avro (Pipeline stops writing entirely)

R2 File Inventory

__r2_data_catalog/.../<table-uuid>/
  metadata/
    00000-...gz.metadata.json   (830 B)    -- Jan 28 (first)
    00001-...gz.metadata.json   (1119 B)
    ...
    00019-...gz.metadata.json   (3443 B)   -- Jan 29 (LAST committed metadata)
    snap-...-....avro           (2335 B)   -- Feb 01 (orphan, not in any metadata)
    snap-...-....avro           (2339 B)
    ...
    snap-...-....avro           (4459 B)   -- Feb 04 (orphan)
  data/
    *.parquet                              -- 20 files, 7.4 MB total
  • 20 metadata JSON files (Jan 28-29, sequentially numbered, all valid Iceberg v2 format)
  • 19 manifest-list avro files referenced by metadata — all missing from R2 (deleted by compaction)
  • 20 manifest-list avro files on disk — all orphaned (written Feb 1-4, not referenced by any metadata)
  • 20 Parquet data files (7.4 MB total, still present)

Workload That Triggered the Issue

Write pattern

A Cloudflare Worker with multiple cron triggers calls stream.send() on the same Pipeline stream:

| Cron | Schedule | What it does |
|---|---|---|
| */5 * * * * | Every 5 min | Drains outbox → stream.send(rows) |
| */15 * * * * | Every 15 min | Same outbox → stream.send(rows) |
| */30 * * * * | Every 30 min | Same outbox → stream.send(rows) |

At the 15-minute mark, two triggers fire simultaneously. At the 30-minute mark, all three fire simultaneously — resulting in concurrent stream.send() calls within a single 300s roll interval.
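The overlap pattern is easy to verify by counting, for each minute of the hour, how many of the three schedules fire. A minimal sketch using the cron expressions from the table above (all three are simple step schedules, so "fires at minute m" reduces to a modulo check):

```typescript
// Count how many of the */5, */15, */30 cron schedules fire at a given minute.
const steps = [5, 15, 30];

function concurrentTriggers(minute: number): number {
  return steps.filter(step => minute % step === 0).length;
}

for (let m = 0; m < 60; m += 5) {
  console.log(`minute ${m}: ${concurrentTriggers(m)} trigger(s)`);
}
// minutes 15 and 45 → 2 concurrent invocations; minutes 0 and 30 → 3
```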

Batch sizes and throughput (well within documented limits)

| Metric | Value |
|---|---|
| Typical row size | ~1.2 KB JSON object |
| Batch size per stream.send() | 200 rows (~243 KB) |
| Max stream.send() calls per cron invocation | 3-10 (depending on outbox depth) |
| Peak concurrent writers | 3 (cron triggers) |
| Peak aggregate throughput | ~0.08 MB/s |
| Documented stream ingest limit | 1 MB/request, 5 MB/s/stream |

I was at <2% of the documented throughput limit. The issue is not volume — it's concurrency of stream.send() callers within a single roll interval.

Hypotheses

Most likely: Concurrent stream.send() calls caused repeated Iceberg metadata commit conflicts.

Iceberg uses optimistic concurrency for metadata commits: read current metadata → write new metadata → atomic swap. When multiple manifest-lists are generated concurrently within one roll interval, the commit coordinator must serialize them. If two commits race and one fails, the failed commit's manifest-list becomes an orphan.

If the commit coordinator entered a permanent failure loop (e.g., always conflicting because it retries against stale state), metadata would stop advancing while the data/manifest generation phase continues unaware. Eventually compaction runs, sees the old data files are no longer needed (from its perspective), and deletes them — but the stale metadata still points to them.
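To make the suspected failure mode concrete, here is a toy model (my reconstruction only — I have no visibility into Cloudflare's actual commit coordinator). Under optimistic concurrency, a committer that re-reads the current version on each retry makes progress after a conflict; one that keeps retrying against the snapshot it read once, up front, conflicts forever:

```typescript
// Toy model of Iceberg-style optimistic concurrency: a commit succeeds only
// if the base version it was built on is still the current version (CAS).
class Catalog {
  version = 0;
  tryCommit(baseVersion: number): boolean {
    if (baseVersion !== this.version) return false; // conflict
    this.version += 1;
    return true;
  }
}

// Correct retry: re-read the current version before each attempt.
function commitWithRefresh(catalog: Catalog, maxRetries: number): boolean {
  for (let i = 0; i < maxRetries; i++) {
    const base = catalog.version; // refresh before retrying
    if (catalog.tryCommit(base)) return true;
  }
  return false;
}

// Buggy retry: keep retrying against a state snapshot read once, up front.
function commitAgainstStale(catalog: Catalog, staleBase: number, maxRetries: number): boolean {
  for (let i = 0; i < maxRetries; i++) {
    if (catalog.tryCommit(staleBase)) return true; // never succeeds once stale
  }
  return false;
}

const catalog = new Catalog();
const staleBase = catalog.version;  // writer A reads version 0
commitWithRefresh(catalog, 3);      // writer B commits first → version 1
console.log(commitAgainstStale(catalog, staleBase, 100)); // false: permanent conflict loop
```

If the real coordinator has a bug of the second shape, metadata would stop advancing exactly as observed, with no bound on how long the loop persists.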

Alternative: Compaction ran before metadata advanced (ordering bug).

Compaction should only delete files that are no longer referenced by the current metadata. But if the compaction process reads a different metadata pointer than R2 SQL does, or if there's a race between compaction advancing the metadata and the main commit path, this could also explain the divergence.

Recovery

This table cannot be recovered through normal means:

  • The metadata points to deleted manifest-list files → R2 SQL cannot read any data
  • The data is still in R2 (Parquet files + orphaned manifest-lists) but unreachable through the catalog
  • Per Cloudflare docs: "Sinks cannot be created for existing Iceberg tables" — so I cannot recreate the sink on the same table name as a workaround
  • To restore service, I would need to create an entirely new table with a new name and backfill all data from my source database

Steps to Reproduce

  1. Create a Pipeline with an R2 Data Catalog sink (Parquet format, default 300s roll interval)
  2. Enable compaction on the catalog (default setting)
  3. Set up a Worker with multiple cron triggers (e.g., */5 * * * *, */15 * * * *, */30 * * * *)
  4. Have each cron trigger call stream.send(rows) with batches of ~200 rows (~243 KB per call)
  5. At the 15-minute and 30-minute boundaries, multiple cron triggers will fire concurrently, calling stream.send() in parallel on the same stream
  6. After hours to days: inspect R2 objects under __r2_data_catalog/ — metadata JSON files stop being committed while manifest-list avros and Parquet files continue to accumulate
  7. Query the table with R2 SQL → 50408: Corrupted Catalog

Expected Behavior

  1. Sink metadata commits must not silently stall. If a metadata commit fails (e.g., due to an optimistic concurrency conflict), the Pipeline should retry or re-queue the commit — not permanently stop advancing metadata.
  2. Compaction must not delete files referenced by the current metadata. If the metadata commit is stalled, compaction should either be blocked or should not delete files that the last committed metadata still references.
  3. Commit failures must be observable. An error metric, log entry, or alert should be emitted when metadata commits fail, so users can detect the problem before data accumulates in orphaned manifests. The pipelinesUserErrorsAdaptiveGroups GraphQL dataset shows zero errors for my pipeline during the failure window — the failure was completely silent.
  4. Table recovery should be possible. Given that the data and newer manifests still exist in R2, it should be possible to repair the catalog by pointing metadata at the latest valid manifest-list, or by providing a "rebuild catalog from manifest files" API.
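Expectation 2 amounts to a simple invariant that compaction could enforce. A minimal sketch under my assumptions about the internals — `currentlyReferenced` is hypothetical and would have to come from re-reading the latest committed metadata immediately before deletion, not from a cached pointer:

```typescript
// Sketch of a compaction safety guard: never delete a file that the latest
// committed metadata still references, even if the commit that was supposed
// to supersede that metadata has stalled.
function safeToDelete(candidates: string[], currentlyReferenced: Set<string>): string[] {
  return candidates.filter(f => !currentlyReferenced.has(f));
}

const referenced = new Set(["data/a.parquet", "metadata/snap-19.avro"]);
const deletable = safeToDelete(
  ["data/a.parquet", "data/old-small-file.parquet"],
  referenced,
);
console.log(deletable); // only the unreferenced file survives the filter
```

With this guard in place, a stalled commit would leave extra small files around (a cost problem) instead of deleting referenced ones (a corruption problem).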

Questions for the Cloudflare Team

  1. Can you repair this catalog? The orphaned manifest-list avros and Parquet data files are still in R2. The data is recoverable if metadata is re-pointed.
  2. Is concurrent stream.send() from multiple cron triggers on the same Worker a supported pattern? If not, this constraint should be documented — it's a natural pattern for Workers with multiple schedules.
  3. Why did the pipelinesUserErrorsAdaptiveGroups metric show zero errors? The metadata commit failures were completely invisible. Are commit-level failures surfaced in any observable metric or log?
