
[Pipelines] Add Bluesky firehose fan-out example and document multi-statement SQL #31698

Open
Marcinthecloud wants to merge 5 commits into production from pipelines-demux-example

Conversation

@Marcinthecloud

@Marcinthecloud Marcinthecloud commented Jun 25, 2026

Contributor

Summary

  • Documents multi-statement pipeline SQL (multiple ;-separated INSERT … SELECT statements in one pipeline), a supported but previously undocumented capability.
  • Adds a new /pipelines/examples/ section with a fan-out tutorial: consume the public Bluesky Jetstream firehose in a Durable Object, ingest to one stream, and route events into five R2 Data Catalog tables by type.
  • Updates select-statements.mdx and manage-pipelines.mdx to describe routing one stream to multiple tables.
  • Cross-links the Examples section from the Pipelines overview and getting-started pages.

Documentation checklist

  • Is there a changelog entry (guidelines)? If you don't add one for something awesome and new (however small) — how will our customers find out? Changelogs are automatically posted to RSS feeds, the Discord, and X.
  • The change adheres to the documentation style guide.
  • If a larger change, such as adding a new page, an issue has been opened in relation to any incorrect or out-of-date information that this PR fixes.

@github-actions

Contributor

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

| Pattern | Owners |
| --- | --- |
| /src/content/docs/pipelines/ | @Marcinthecloud, @cmackenzie1, @oliy, @garvit-gupta, @sejoker, @jonesphillip, @elithrar, @cloudflare/product-owners |

@ask-bonk ask-bonk Bot added the content:new (Request for new/missing content) and triage (Waiting for a docs team review) labels Jun 25, 2026

@ask-bonk ask-bonk Bot left a comment

Contributor

3 issues need fixing before merge: description too long, TypeScript example missing indentation, and a missing changelog entry.

MEDIUM: bluesky-firehose-fanout.mdx line 5: description exceeds 160 characters. Shorten to 50–160 characters per the style guide.

MEDIUM: bluesky-firehose-fanout.mdx lines 214–319: TypeScript example has missing/inconsistent indentation and mixed tabs and spaces. Format the entire block so it is safe to copy-paste.

MEDIUM: Missing changelog entry. The PR documents a new capability (multi-statement SQL) and adds a new example page. Add a changelog entry in src/content/changelog/pipelines/.

Posted 5 inline suggestions for the above plus minor style fixes.

Comment thread src/content/docs/pipelines/examples/bluesky-firehose-fanout.mdx Outdated
Comment thread src/content/docs/pipelines/examples/bluesky-firehose-fanout.mdx
Comment thread src/content/docs/pipelines/examples/bluesky-firehose-fanout.mdx
Comment thread src/content/docs/pipelines/examples/bluesky-firehose-fanout.mdx
Comment thread src/content/docs/pipelines/examples/bluesky-firehose-fanout.mdx
Comment thread src/content/docs/pipelines/pipelines/manage-pipelines.mdx Outdated
Comment thread src/content/docs/pipelines/sql-reference/select-statements.mdx Outdated
@ask-bonk

ask-bonk Bot commented Jun 25, 2026

Contributor

I have completed the review of PR #31698.

Labels applied: content:new, triage

Summary of changes

  • Documents multi-statement pipeline SQL (semicolon-separated INSERT … SELECT statements in a single pipeline).
  • Adds a new /pipelines/examples/ section with a Bluesky Jetstream fan-out tutorial that routes one stream into five R2 Data Catalog Iceberg tables.
  • Updates manage-pipelines.mdx and select-statements.mdx with routing guidance and cross-links the new example.
  • Adds navigation links from the Pipelines overview and getting-started pages.

Issues flagged
I posted a GitHub review with 7 inline suggestions and flagged 3 items that need maintainer attention:

  • MEDIUM: bluesky-firehose-fanout.mdx line 5: description exceeds 160 characters (style guide limit: 50–160 characters).
  • MEDIUM: bluesky-firehose-fanout.mdx lines 214–319: The TypeScript example is missing indentation for top-level declarations (interface, constants, functions) and mixes tabs/spaces in the class body. This makes the example unsafe to copy-paste.
  • MEDIUM: Missing changelog entry. The PR introduces new functionality (multi-statement SQL) and a new page; per the PR's own checklist and repo conventions, a changelog entry in src/content/changelog/pipelines/ should be added.

Other suggestions posted

  • Removed unused Steps import in the new example.
  • Replaced two instances of "see" with "refer to" in manage-pipelines.mdx and select-statements.mdx per terminology rules.


@Marcinthecloud

Contributor Author

@ask-bonk can you re-review the latest please?

@Marcinthecloud Marcinthecloud marked this pull request as ready for review June 25, 2026 18:49
@cloudflare-docs-bot

cloudflare-docs-bot Bot commented Jun 25, 2026

Contributor

Review

⚠️ 2 warnings, 💡 3 suggestions found in commit e81dadb.

Code Review

This code review is in beta and may not always be helpful — use your judgment.

Warnings (2)
| File | Issue |
| --- | --- |
| pipelines/examples/bluesky-firehose-fanout.mdx line 183 | Incorrect step reference: Step 5 tells the reader to replace `<STREAM_ID>` with the stream ID from step 3, but the stream is not created until step 4. Step 3 only creates the R2 bucket and enables the catalog. Fix: Change the reference from 'step 3' to 'step 4'. |
| pipelines/examples/bluesky-firehose-fanout.mdx line 306 | Unserialized flush with shared state: `onMessage` fires `void this.flush()` without awaiting or locking. `flush()` mutates `this.buf`, `this.lastFlush`, and the persisted cursor while another flush may already be awaiting `send(batch)`. Overlapping flushes can reorder batches, duplicate rows, or advance the stored cursor beyond a failed batch. Fix: Serialize flushes, for example by chaining them on a private promise or using an explicit lock, so only one flush runs at a time. |
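
The promise-chaining fix suggested for line 306 could look roughly like the sketch below. It is not the PR's code: the `SerializedBuffer` class, the injected `send` function, and the requeue-on-failure behavior are assumptions made for illustration. Each call to `flush()` is queued behind the previous one, so only one `send` is in flight at a time and a failed batch is put back ahead of newer events.

```typescript
// Hypothetical sender: delivers one batch to the pipeline stream.
type Send<T> = (batch: T[]) => Promise<void>;

class SerializedBuffer<T> {
  private buf: T[] = [];
  private readonly send: Send<T>;
  // Tail of the flush chain. Every flush waits for the one before it.
  private chain: Promise<void> = Promise.resolve();

  constructor(send: Send<T>) {
    this.send = send;
  }

  push(event: T): void {
    this.buf.push(event);
  }

  // Queue a flush behind any flush that is already in flight.
  flush(): Promise<void> {
    const run = this.chain.then(() => this.flushOnce());
    // Swallow the error on the chain only, so later flushes still run;
    // the caller still sees the rejection on `run`.
    this.chain = run.catch(() => {});
    return run;
  }

  private async flushOnce(): Promise<void> {
    if (this.buf.length === 0) return;
    const batch = this.buf;
    this.buf = [];
    try {
      await this.send(batch);
      // A real consumer would persist its cursor here, after the send succeeds.
    } catch (err) {
      // Requeue the failed batch ahead of newer events to preserve order.
      this.buf = batch.concat(this.buf);
      throw err;
    }
  }
}
```

A message handler that calls `flush()` without awaiting it should still attach a `.catch(...)`, because the returned promise rejects when `send` fails.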

Style Guide Review

Suggestions (3)
| File | Issue |
| --- | --- |
| pipelines/examples/bluesky-firehose-fanout.mdx line 46 | Bullet list with fewer than three items: the Prerequisites section contains a bullet list with a single item ('An R2 API token...'). Fix: Rewrite as prose instead of a one-item bullet list. |
| pipelines/examples/bluesky-firehose-fanout.mdx line 395 | Avoid 'Learn more about' before a link: the line begins with 'To learn more about the SQL used here, refer to...'. Fix: Use 'refer to SELECT statements and Manage pipelines' directly. |
| pipelines/sql-reference/select-statements.mdx line 19 | `see [link]` should be `refer to [link]`: the line uses `See [Multiple statements](#multiple-statements)`. Fix: Change to `refer to [Multiple statements](#multiple-statements)`. |
Commands

Only codeowners can run commands. Post a comment with the command to trigger it.

| Command | Description |
| --- | --- |
| /review | Runs a review now. Incremental if a prior review exists, full if not. |
| /full-review | Re-reviews the entire PR diff from scratch, ignoring incremental history. Useful after a rebase, when you want a fresh review, or if the bot gets out of sync and reports issues that no longer exist. |
| /ignore-review-limit | Permanently lifts the 2-review automatic limit for this PR. Future pushes will trigger reviews as normal. |

{ "name": "event_type", "type": "string", "required": true },
{ "name": "did", "type": "string", "required": false },
{ "name": "operation", "type": "string", "required": false },
{ "name": "time_us", "type": "int64", "required": false },

this can be a timestamp field instead of int64
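
If `time_us` does become a timestamp field, the producer may need to convert the value before sending it. The sketch below assumes `time_us` holds microseconds since the Unix epoch and that the timestamp field accepts an RFC 3339 string; both are assumptions to check against the schema documentation. The helper name `timeUsToIso` is made up for illustration.

```typescript
// Convert a microsecond Unix timestamp (the assumed meaning of `time_us`)
// into an RFC 3339 / ISO 8601 string with millisecond precision.
function timeUsToIso(timeUs: number): string {
  if (!Number.isSafeInteger(timeUs) || timeUs < 0) {
    throw new RangeError(`invalid time_us: ${timeUs}`);
  }
  // Date only has millisecond resolution, so sub-millisecond digits are dropped.
  const ms = Math.floor(timeUs / 1000);
  return new Date(ms).toISOString();
}
```

For example, `timeUsToIso(1700000000000000)` returns `"2023-11-14T22:13:20.000Z"`.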

WHERE event_type = 'view_product';
```

Each `INSERT` statement requires its own sink, and each sink writes to its own table. You cannot point two statements at the same sink.

This isn't actually true: you can insert into the same sink multiple times, and the pipeline will compile to a diamond DAG.


#### Route one stream to multiple tables

A single pipeline can run multiple `INSERT` statements, separated by semicolons. Each statement reads from the same stream and writes to a different sink, so you can route ("fan out") events from one stream into several tables based on their content.

Maybe worth pointing out that you are only charged once for transformations, not per-statement, so this is cheaper than having multiple pipelines
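
To make the routing pattern from the quoted section concrete, here is a small sketch that generates the semicolon-separated `INSERT … SELECT` statements for one pipeline. The stream name, sink names, and helper names (`buildFanOutSql`, `sqlLiteral`, `Route`) are hypothetical, and the `INSERT INTO <sink> SELECT * FROM <stream> WHERE …` shape is inferred from the fragments quoted in this thread rather than copied from the PR.

```typescript
// One routing rule: rows whose event_type equals `eventType` go to `sink`.
interface Route {
  sink: string;
  eventType: string;
}

// Double single quotes so the value is safe inside a SQL string literal.
function sqlLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build one INSERT statement per route, all reading from the same stream.
// Stream and sink names are inserted as-is, so they must be trusted identifiers.
function buildFanOutSql(stream: string, routes: Route[]): string {
  return routes
    .map(
      (r) =>
        `INSERT INTO ${r.sink}\n` +
        `SELECT * FROM ${stream}\n` +
        `WHERE event_type = ${sqlLiteral(r.eventType)};`
    )
    .join("\n\n");
}
```

Calling `buildFanOutSql("events_stream", [{ sink: "views_sink", eventType: "view_product" }, { sink: "purchases_sink", eventType: "purchase" }])` yields two statements separated by a blank line, which can then be supplied as the SQL for a single pipeline.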


Labels

content:new (Request for new/missing content), product:pipelines, size/m, triage (Waiting for a docs team review)

7 participants