[docs] Add Real-Time User Profile quickstart tutorial by Prajwal-banakar · Pull Request #2669 · apache/fluss

Prajwal-banakar · 2026-02-12T11:26:14Z

Purpose

Linked issue: close #2659

The purpose of this change is to add a new quickstart tutorial, "Real-Time User Profile," to the Apache Fluss documentation. This tutorial demonstrates a realistic, production-grade business scenario by combining the Auto-Increment Column and Aggregation Merge Engine features. It specifically addresses the need for guidance on mapping high-cardinality string identifiers to compact integers for efficient real-time analytics.

Brief change log

This pull request introduces a comprehensive tutorial located at website/docs/quickstartUuser-Profile.md. Key changes include:

Realistic Use Case: Developed a scenario focused on identity mapping (Email to UID) and real-time metric aggregation (Total Clicks and Unique Visitors).

Feature Integration: Showcases the synergy between FIP-16 (Auto-Increment) for dictionary management and FIP-21 (Aggregation Merge Engine) for storage-level pre-aggregation.

Technical Optimization: Implemented the maintainer's recommendation to use INT for the generated uid column to maximize storage efficiency and performance for RoaringBitmap (rbm64) operations.

Reliability Section: Added documentation on Undo Recovery to explain how Fluss ensures exactly-once accuracy for aggregations during Flink failovers.

Visual Guidance: Included an architectural diagram to illustrate the data flow from raw event ingestion to the final pre-aggregated profile storage.

Tests

Documentation Build: Verified that the documentation builds correctly using the local Docusaurus environment and that the new page is correctly linked in the sidebar.

SQL Verification: Manually verified the Flink SQL syntax against the Apache Fluss 0.9 connector specifications.

API and Format

This change is documentation-only and does not affect the Java API or the underlying storage format.

Documentation

Yes, this change introduces a new documentation feature (a new quickstart tutorial) aimed at guiding users through the adoption of Fluss's advanced streaming storage capabilities.

Prajwal-banakar · 2026-02-12T11:33:37Z

Hi @wuchong PTAL!

wuchong · 2026-02-12T15:06:40Z

@polyzos @xx789633 could you help to review this?

wuchong

Hi @Prajwal-banakar, thank you for your contribution! However, quickstart documentation typically needs to be fully reproducible—readers should be able to follow it step by step and achieve the same results, just like in our existing quickstarts:
https://fluss.apache.org/docs/quickstart/flink/ and https://fluss.apache.org/docs/quickstart/lakehouse/.

Could you please enhance the guide by adding the environment setup (e.g., using Docker Compose), clear instructions on how to run the queries, and guidance on how to visualize or verify the results? This will greatly improve usability and consistency with our documentation standards.

xx789633 · 2026-02-12T16:42:51Z

Hi @Prajwal-banakar thanks for the pr. I have the same suggestion as @wuchong . We need to make the example fully reproducible. For example, to ingest the raw data for the source datastream, we can provide a csv file as sample data, just like this example: https://github.com/aliyun/alibabacloud-hologres-connectors/blob/master/hologres-connector-examples/hologres-connector-flink-examples/src/main/java/com/alibaba/hologres/connector/flink/example/FlinkRoaringBitmapAggJob.java

Prajwal-banakar · 2026-02-12T17:54:31Z

Hi @wuchong @xx789633 Thanks for quick feedback ! Enhanced the quick start guide, PTAL!

wuchong · 2026-02-13T01:57:42Z

Hi @Prajwal-banakar, please ensure the quickstart can run successfully. Additionally, the image appears to be AI-generated. I don’t object to AI-generated content in principle, but please make sure the text in the image contains no garbled characters and all content makes sense.

Prajwal-banakar · 2026-02-13T13:24:24Z

Hi @wuchong verified the guide locally it is working ! and fixed the diagram format,

wuchong · 2026-02-13T16:26:21Z

website/docs/quickstart/User-Profile.md

+    d.uid,
+    -- Convert INT to BYTES for rbm64. 
+    -- Note: In a real production job, you might use a UDF to ensure correct bitmap initialization.
+    CAST(CAST(d.uid AS STRING) AS BYTES),


I believe we need the to_rbm and from_rbm Flink UDFs to process the data correctly. Without these functions, the results would be meaningless and users would not understand the purpose of this feature.

However, shipping Flink UDFs falls outside the scope of the Fluss project. I will coordinate with members of the Flink community to contribute these UDFs and identify an appropriate location to open source and publish the UDF JARs. Once available, we can reference these functions in our documentation and examples.

That said, the Lunar New Year holiday is approaching in China, so we likely will not be able to start this work until March. Until then, we may need to suspend this PR. Thank you for your quick updating.

I believe we need the to_rbm and from_rbm Flink UDFs to process the data correctly. Without these functions, the results would be meaningless and users would not understand the purpose of this feature.

Exactly. we need such functions as RB_CARDINALITY and RB_OR_AGG for aggregating the result bitmap.

Prajwal-banakar · 2026-02-13T16:53:46Z

Hi @wuchong, @xx789633,
Thank you for the detailed explanation.
I understand that this work will be coordinated with the Flink community. I am happy to keep this PR open and suspended until those JARs are published. Once the UDFs are available, I will update the guide to include the proper function calls and verification steps.
Happy Lunar New Year to the team! I look forward to finalize this! Thanks.

Prajwal-banakar · 2026-03-06T12:42:15Z

Hi @wuchong @xx789633 i've created a temporary repo for flink udfs as we discussed in slack and updated the quick start guide, it is working smoothly. The repo is available at : https://github.com/Prajwal-banakar/flink-roaringbitmap
so once the original udfs are available from flink side, we will update the quick start guide or let me know any suggestions or ideas? PTAL!

platinumhamburg · 2026-03-25T07:30:37Z

Hi @Prajwal-banakar ,
Thank you for your outstanding work! Would it be possible to move the RoaringBitmap plugin you developed to the repository at https://github.com/flink-extended/flink-roaringbitmap? This repo will serve as a common UDF library that Fluss can reference in the future, enabling collaborative contributions from the community. We may also need to add new UDF functions, such as BITMAP_OR_AGG, to support roll-up aggregation examples. cc @wuchong

Prajwal-banakar · 2026-03-25T08:38:15Z

Hi @platinumhamburg, thank you for the kind words! I will move the RoaringBitmap UDFs to that repo and i'll open a PR their shortly. I also have an active [DISCUSS] thread on the mailing list proposing Native Bitmap Integration for Fluss https://lists.apache.org/thread/z9dwyg81cs3bt7yssb4n3vg17o767r5s and it is open for your input, so I am happy to contribute UDFs there as well and collaborate on expanding the library. I will update this PR once the JAR is published from the official repo.

Prajwal-banakar · 2026-03-25T08:50:22Z

Hi @platinumhamburg, I tried to fork https://github.com/flink-extended/flink-roaringbitmap
but GitHub doesn't allow forking an empty repository. Could you please add me as a
contributor so I can push the initial UDF implementation directly?
My GitHub username is Prajwal-banakar. Thanks!

platinumhamburg · 2026-03-26T10:10:11Z

Prajwal-banakar

Hi @Prajwal-banakar, sorry about that—the reason you couldn’t fork earlier was that the repository was empty. It’s now ready to go.

Also, could we add an rb_or_agg function in the initial implementation? This would help us include a roll-up query example in the quick start guide.

polyzos · 2026-03-30T06:23:05Z

Hello @Prajwal-banakar.. Thank you for this, it's really nice work. However I have the two following comments:

For the Fluss quickstarts I would like to focus only on functionality that is already supported on Fluss.. Quickstarts are meant for people that are new to the project and this comes with a certain level of unfamiliarity, thus extra things might create extra confusion.. This could be a great blog post though for our website though for those looking to deep dive.. WDYT
For the images, although a bit boilerplate I would prefer we use tools to generate them, becaue AI generated diagrams still mess up a few things, like the letters in this case, unless they can be fixed. Or maybe you can create one with something like excalidraw. WDYT?

Prajwal-banakar · 2026-03-30T06:41:56Z

Hi @polyzos, thank you for the review!

Regarding point 1: The rbm64 aggregation type is already officially supported in Fluss 0.9 (FIP-21). The companion UDFs (rb_build_agg, rb_cardinality) are now publishing in the official flink-extended/flink-roaringbitmap repo, which was set up by @wuchong and @platinumhamburg specifically to support this quickstart. So the dependency is intentional and endorsed by the maintainers.
That said, I'm happy to discuss whether a blog post format would be more appropriate @wuchong @platinumhamburg WDYT?

Regarding point 2: Completely agree. I'll replace the diagram with a clean one made in Excalidraw.

polyzos · 2026-03-30T06:50:48Z

@Prajwal-banakar correct and apologies I should have been more precise.

What I was thinking is to have the quickstart work with Fluss specific features in the quickstart to keep the complexity to the bare minimum for new users - i.e how they can use the auto-increment column along with the aggregation merge engine.

Then create an "extended" version in a blog post for those looking for more.. This approach will help as I mentioned:

keep the initial quickstart guide to the bare minimum, because Fluss already comes with a certain level of complexity for new users
allow us to have more resources around the topic as a follow-up

This is just a suggestion, both approaches I think are valuable and I'm happy with both.. Just cautious for new users., that's all.

Prajwal-banakar · 2026-03-31T15:25:11Z

Hi @polyzos, that makes complete sense and I appreciate the clarification!

I agree with this approach:

Simplified quickstart focuses only on Fluss-native features: auto-increment column + aggregation merge engine with a simple sum aggregation. No external UDFs, minimal complexity for new users.
Blog post the full end-to-end RoaringBitmap example with rb_build_agg, rb_cardinality, and unique visitor counting for those who want to go deeper.

I'll simplify this PR to the basic quickstart version and work on the blog post separately. @wuchong @platinumhamburg does this approach work for you as well?

Prajwal-banakar · 2026-04-07T11:22:59Z

Hi @wuchong @polyzos @platinumhamburg,

Quick update: the external RoaringBitmap UDF library has now been officially released as flink-roaringbitmap v0.1.0, so the artifact is available now.

That said, I agree with the latest direction for this PR keeping the Fluss quickstart focused on a minimal, Fluss-native path will make it much easier for new users to follow.

I’ll update this PR accordingly by simplifying it to focus on the auto-increment column + aggregation merge engine workflow, and @polyzos is working on a blog/deep-dive covering the extended bitmap use case.

Please let me know if there’s any specific structure or example you’d prefer for the simplified quickstart.

polyzos · 2026-04-08T07:40:00Z

@Prajwal-banakar looking forward to the update.. you can also take a look here
#3022

polyzos · 2026-04-08T08:14:48Z

website/docs/quickstart/User-Profile.md

+    uid   INT,
+    PRIMARY KEY (email) NOT ENFORCED
+) WITH (
+    'connector'             = 'fluss',


since we are already in the catalog with dont need to specify the connector for every CREATE TABLE statement

polyzos · 2026-04-08T08:15:24Z

website/docs/quickstart/User-Profile.md

+) WITH (
+    'connector'             = 'fluss',
+    'auto-increment.fields' = 'uid',
+    'bucket.num'            = '1'


bucket.num = 1 is the default for every table, so again we can remove this for every CREATE TABLE statement.

polyzos · 2026-04-08T08:18:08Z

can we also change the name from User-Profile.md to user_profile.md? or realtime_profiles_quickstart.md

Prajwal-banakar · 2026-04-08T11:09:10Z

Hi @polyzos, I’ve updated the PR and verified locally it is working fine.
Added a reference to your blog in the “What’s Next” section. At the moment, since your PR is not yet merged, the link is causing the CI to fail due to a broken route in the docs build.

Prajwal-banakar added 2 commits February 12, 2026 10:58

Add Real-Time User Profile quickstart tutorial

be13af5

Added Diagram

8881dbd

wuchong requested review from polyzos and xx789633 February 12, 2026 15:06

wuchong reviewed Feb 12, 2026

View reviewed changes

enhanced guide by adding the environment setup

ee09dbc

Prajwal-banakar added 2 commits February 13, 2026 11:16

enhanced guide by adding the environment setup

d481162

verified environment setup

0e8af82

improved

361a094

wuchong reviewed Feb 13, 2026

View reviewed changes

Fix quickstart:by temp repo

7e64c13

Prajwal-banakar requested a review from wuchong March 6, 2026 12:46

polyzos reviewed Apr 8, 2026

View reviewed changes

[docs] added Real-Time User Profile quickstart

c8ce79d

Conversation

Prajwal-banakar commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Brief change log

Tests

API and Format

Documentation

Uh oh!

Prajwal-banakar commented Feb 12, 2026

Uh oh!

wuchong commented Feb 12, 2026

Uh oh!

wuchong left a comment

Choose a reason for hiding this comment

Uh oh!

xx789633 commented Feb 12, 2026

Uh oh!

Prajwal-banakar commented Feb 12, 2026

Uh oh!

wuchong commented Feb 13, 2026

Uh oh!

Prajwal-banakar commented Feb 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wuchong Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

xx789633 Feb 13, 2026

Choose a reason for hiding this comment

Uh oh!

Prajwal-banakar commented Feb 13, 2026

Uh oh!

Prajwal-banakar commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

platinumhamburg commented Mar 25, 2026

Uh oh!

Prajwal-banakar commented Mar 25, 2026

Uh oh!

Prajwal-banakar commented Mar 25, 2026

Uh oh!

platinumhamburg commented Mar 26, 2026

Uh oh!

polyzos commented Mar 30, 2026

Uh oh!

Prajwal-banakar commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

polyzos commented Mar 30, 2026

Uh oh!

Prajwal-banakar commented Mar 31, 2026

Uh oh!

Prajwal-banakar commented Apr 7, 2026

Uh oh!

polyzos commented Apr 8, 2026

Uh oh!

polyzos Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

polyzos Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

polyzos commented Apr 8, 2026

Uh oh!

Prajwal-banakar commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Prajwal-banakar commented Feb 12, 2026 •

edited

Loading

Prajwal-banakar commented Feb 13, 2026 •

edited

Loading

Prajwal-banakar commented Mar 6, 2026 •

edited

Loading

Prajwal-banakar commented Mar 30, 2026 •

edited

Loading

Prajwal-banakar commented Apr 8, 2026 •

edited

Loading