[Pipelines] Add Bluesky firehose fan-out example and document multi-statement SQL #31698
Marcinthecloud wants to merge 5 commits into
This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:
3 issues need fixing before merge: description too long, TypeScript example missing indentation, and a missing changelog entry.
MEDIUM — bluesky-firehose-fanout.mdx line 5: description exceeds 160 characters. Shorten to 50–160 characters per the style guide.
MEDIUM — bluesky-firehose-fanout.mdx lines 214–319: TypeScript example has missing/inconsistent indentation and mixed tabs and spaces. Format the entire block so it is safe to copy-paste.
MEDIUM — Missing changelog entry. The PR documents a new capability (multi-statement SQL) and adds a new example page. Add a changelog entry in src/content/changelog/pipelines/.
Posted 5 inline suggestions for the above plus minor style fixes.
I have completed the review of PR #31698.
Labels applied:
Summary of changes
Issues flagged
Other suggestions posted
…rt, fix code indentation, use refer to
@ask-bonk can you re-review the latest please?
Review
Code Review
This code review is in beta and may not always be helpful; use your judgment.
Warnings (2)
Style Guide Review
Suggestions (3)
{ "name": "event_type", "type": "string", "required": true },
{ "name": "did", "type": "string", "required": false },
{ "name": "operation", "type": "string", "required": false },
{ "name": "time_us", "type": "int64", "required": false },
`time_us` can be a timestamp field instead of `int64`.
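For context on this suggestion, the sketch below shows how a microsecond epoch value maps to a wall-clock timestamp. It assumes `time_us` holds Unix epoch microseconds, which the field name suggests but this thread does not confirm. `timeUsToIso` is a hypothetical helper, not part of the PR.

```typescript
// Hypothetical helper, not from the PR.
// Assumes timeUs is a Unix epoch value in microseconds.
function timeUsToIso(timeUs: number): string {
  // Date works in milliseconds, so sub-millisecond precision is dropped here.
  const millis = Math.floor(timeUs / 1000);
  return new Date(millis).toISOString();
}

console.log(timeUsToIso(1700000000000000)); // 2023-11-14T22:13:20.000Z
```

A native timestamp column would let queries treat the value as a time directly rather than converting an integer each time.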
WHERE event_type = 'view_product';
| ``` | ||
Each `INSERT` statement requires its own sink, and each sink writes to its own table. You cannot point two statements at the same sink.
This isn't actually true: you can insert into the same sink multiple times, and this will compile to a diamond DAG.
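A minimal sketch of what that could look like, with the SQL held in TypeScript strings so the shape can be checked. The names `events_stream` and `events_sink` and the `'purchase'` value are hypothetical, and the `INSERT INTO … SELECT … FROM` form is assumed from the PR summary rather than confirmed in this thread.

```typescript
// Hypothetical names: events_stream (source) and events_sink (shared target).
const statements: string[] = [
  "INSERT INTO events_sink SELECT * FROM events_stream WHERE event_type = 'view_product'",
  "INSERT INTO events_sink SELECT * FROM events_stream WHERE event_type = 'purchase'",
];

// Join into one semicolon-separated script.
const sameSinkSql: string = statements.join(";\n") + ";";

// Extract the sink name that follows INSERT INTO in a statement.
function sinkOf(stmt: string): string {
  const match = /INSERT INTO (\w+)/.exec(stmt);
  return match ? match[1] : "";
}

// De-duplicate sink names: two statements, one distinct sink.
const distinctSinks: string[] = statements
  .map(sinkOf)
  .filter((sink, i, all) => all.indexOf(sink) === i);

console.log(sameSinkSql);
console.log(distinctSinks);
```

Both statements read from one stream and write to one sink, which is the split-then-merge ("diamond") shape described in the comment.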
#### Route one stream to multiple tables
A single pipeline can run multiple `INSERT` statements, separated by semicolons. Each statement reads from the same stream and writes to a different sink, so you can route ("fan out") events from one stream into several tables based on their content.
It may be worth pointing out that you are only charged once for transformations, not per statement, so this is cheaper than running multiple pipelines.
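As an illustration of the fan-out described in the quoted paragraph, here is a sketch that generates one semicolon-separated script from a list of routes. `Route`, `buildFanOutSql`, the stream and sink names, and the event-type values are all hypothetical; the statement form is assumed from the PR summary and is not copied from the PR.

```typescript
// Hypothetical route description: which event_type goes to which sink.
interface Route {
  sink: string;
  eventType: string;
}

// Build one script with an INSERT statement per route.
// Assumes stream and sink identifiers are trusted; only the literal is escaped.
function buildFanOutSql(stream: string, routes: Route[]): string {
  return routes
    .map((route) => {
      // Double any single quotes so the value stays a valid SQL string literal.
      const literal = route.eventType.replace(/'/g, "''");
      return `INSERT INTO ${route.sink} SELECT * FROM ${stream} WHERE event_type = '${literal}';`;
    })
    .join("\n");
}

const fanOutSql: string = buildFanOutSql("events_stream", [
  { sink: "posts_sink", eventType: "post" },
  { sink: "likes_sink", eventType: "like" },
]);
console.log(fanOutSql);
```

Every statement reads from the same stream, so all routes live in a single pipeline.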
Summary
- … `;`-separated `INSERT … SELECT` statements in one pipeline), a supported but previously undocumented capability.
- … `/pipelines/examples/` section with a fan-out tutorial: consume the public Bluesky Jetstream firehose in a Durable Object, ingest to one stream, and route events into five R2 Data Catalog tables by type.
- … `select-statements.mdx` and `manage-pipelines.mdx` to describe routing one stream to multiple tables.

Documentation checklist