refactor window aggregation and emit policies (#487)

chenziliang · web-flow · commit 7d3977fea790 · 2025-10-01T01:03:38.000-07:00
* refactor window aggregation and emit policies

* refactor window aggregation and emit policies
diff --git a/docs/global-aggregation.md b/docs/global-aggregation.md
@@ -0,0 +1,188 @@
+# Global Aggregation 
+
+## Overview
+
+In Timeplus, a global aggregation is an aggregation query that does not use a streaming window function such as tumble or hop. Unlike streaming window aggregation, global streaming aggregation doesn't slice
+the unbounded streaming data into windows according to timestamp; instead, it processes the unbounded streaming data as one single global window. Due to this property, Timeplus for now can't
+recycle in-memory aggregation states / results according to timestamp for global aggregation.
+
+```sql
+SELECT <column_name1>, <column_name2>, <aggr_function>
+FROM <stream_name>
+[WHERE clause]
+EMIT PERIODIC [<n><UNIT>]
+```
+
+`PERIODIC <n><UNIT>` tells Timeplus to emit the aggregation periodically. `UNIT` can be ms (millisecond), s (second), m (minute), h (hour), or d (day). `<n>` must be an integer greater than 0.
+
+Example:
+
+```sql
+SELECT device, count(*)
+FROM device_utils
+WHERE cpu_usage > 99
+GROUP BY device
+EMIT PERIODIC 5s
+```
+
+As with [Streaming Tail](/query-syntax#streaming-tailing), Timeplus continuously monitors new events in the stream `device_utils`, filters them, and then continuously performs **incremental** count aggregation. Whenever the specified interval elapses, Timeplus emits the current aggregation result to clients.
+
+## Emit Policies
+
+### EMIT PERIODIC {#emit_periodic}
+
+`PERIODIC <n><UNIT>` tells Timeplus to emit the aggregation periodically. `UNIT` can be ms (millisecond), s (second), m (minute), h (hour), or d (day). `<n>` must be an integer greater than 0.
+
+Example:
+
+```sql
+SELECT device, count(*)
+FROM device_utils
+WHERE cpu_usage > 99
+GROUP BY device
+EMIT PERIODIC 5s
+```
+
+For [Global Streaming Aggregation](#global) the default periodic emit interval is `2s`, i.e. 2 seconds.
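+
+As a minimal sketch of that default, assuming the same `device_utils` stream as in the example above, the following query omits the emit clause and should therefore behave like the same query with `EMIT PERIODIC 2s` appended:
+
+```sql
+-- No EMIT clause: the global aggregation falls back to the default 2s periodic emit
+SELECT device, count(*)
+FROM device_utils
+WHERE cpu_usage > 99
+GROUP BY device
+```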
+
+You can also apply `EMIT PERIODIC` in time windows, such as tumble/hop/session.
+
+When you run a tumble window aggregation, by default Timeplus will emit results when the window is closed. So `tumble(stream,5s)` will emit results every 5 seconds, unless there is no event in the window to progress the watermark.
+
+In some cases, you may want to get aggregation results even if the window is not closed, so that you can get timely alerts. For example, the following SQL runs a 5-second tumble window and, every 1 second, emits a row if the number of events in the window is over 300.
+
+```sql
+SELECT window_start, count() AS cnt
+FROM tumble(car_live_data, 5s)
+GROUP BY window_start
+HAVING cnt > 300
+EMIT PERIODIC 1s
+```
+
+### EMIT PERIODIC REPEAT {#emit_periodic_repeat}
+
+Starting from Timeplus Proton 1.6.2, you can optionally add `REPEAT` to the end of `EMIT PERIODIC <n><UNIT>`. For global aggregations, the aggregation result is emitted every 2 seconds by default, but if there are no new events since the last emit, no result is emitted. With `REPEAT` at the end of the emit policy, Timeplus emits results at the fixed interval even if there are no new events since the last emit. For example:
+```sql
+SELECT count() FROM t
+EMIT PERIODIC 3s REPEAT
+```
+
+### EMIT TIMEOUT
+
+You can apply `EMIT TIMEOUT` on global aggregation, e.g.
+```sql
+SELECT count() FROM t EMIT TIMEOUT 1s;
+```
+
+It can also be applied to window aggregations, in which case `EMIT AFTER WINDOW CLOSE` is automatically appended, e.g.
+```sql
+SELECT count() FROM tumble(t,5s) GROUP BY window_start EMIT TIMEOUT 1s;
+```
+ 
+### EMIT ON UPDATE {#emit_on_update}
+
+You can apply `EMIT ON UPDATE` in time windows, such as tumble/hop/session, with `GROUP BY` keys. For example:
+
+```sql
+SELECT
+  window_start, cid, count() AS cnt
+FROM
+  tumble(car_live_data, 5s)
+WHERE
+  cid IN ('c00033', 'c00022')
+GROUP BY
+  window_start, cid
+EMIT ON UPDATE
+```
+
+During the 5-second tumble window, even if the window is not closed, the results are emitted whenever the aggregation value (`cnt`) for the same `cid` changes.
+
+### EMIT ON UPDATE WITH BATCH {#emit_on_update_with_batch}
+
+You can combine `EMIT PERIODIC` and `EMIT ON UPDATE` by using `EMIT ON UPDATE WITH BATCH <n><UNIT>`. In this case, even if the window is not closed, Timeplus checks the intermediate aggregation result at the specified interval and emits rows if the result has changed.
+```sql
+SELECT
+  window_start, cid, count() AS cnt
+FROM
+  tumble(car_live_data, 5s)
+WHERE
+  cid IN ('c00033', 'c00022')
+GROUP BY
+  window_start, cid
+EMIT ON UPDATE WITH BATCH 2s
+```
+
+### EMIT AFTER KEY EXPIRE IDENTIFIED BY .. WITH MAXSPAN .. AND TIMEOUT .. {#emit_after_key_expire}
+
+The syntax is:
+```sql
+EMIT AFTER KEY EXPIRE [IDENTIFIED BY <col>] WITH [ONLY] MAXSPAN <interval> [AND TIMEOUT <interval>]
+```
+
+Note:
+* `EMIT AFTER KEY EXPIRE` emits results when keys expire. This emit policy should be applied to a global aggregation whose `GROUP BY` key acts as a primary key, usually an ID shared by multiple tracing events.
+* `IDENTIFIED BY col` specifies the column used to calculate the span of the trace; usually you can set `IDENTIFIED BY _tp_time`.
+* `MAXSPAN interval` identifies whether the span of the related events exceeds a certain interval, for example `MAXSPAN 500ms` to flag events with the same tracing ID whose span is over 0.5 seconds.
+* `ONLY`: if you add this keyword, only events whose span exceeds `MAXSPAN` are emitted and events within `MAXSPAN` are omitted, so that you can focus on the events that exceed the SLA.
+* `AND TIMEOUT interval` avoids waiting too long for late events. If there are no more events with the same key (e.g. tracing ID) after this interval, Timeplus closes the session for the key and emits results.
+
+It's required to use `SETTINGS default_hash_table='hybrid'` with this emit policy to avoid putting too much data in memory.
+
+Here is an example that reads the trace stream and shows only the events whose end-to-end latency is over 0.5 seconds.
+```sql
+WITH grouped AS(
+    SELECT
+        trace_id,
+        min(start_time) AS start_ts,
+        max(end_time) AS end_ts,
+        date_diff('ms', start_ts, end_ts) AS span_ms,
+        group_array(json_encode(span_id, parent_span_id, name, start_time, end_time, attributes)) AS trace_events
+    FROM otel_traces
+    GROUP BY trace_id
+    EMIT AFTER KEY EXPIRE IDENTIFIED BY end_time WITH MAXSPAN 500ms AND TIMEOUT 2s
+)
+SELECT json_encode(trace_id, start_ts, end_ts, span_ms, trace_events) AS event FROM grouped
+SETTINGS default_hash_table='hybrid', max_hot_keys=1000000, allow_independent_shard_processing=true;
+```
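+
+The `ONLY` keyword described in the notes above is not used in that query. A hedged variant, assuming the same `otel_traces` stream and columns and following the syntax template rather than a verified run, would keep just the traces whose span exceeds `MAXSPAN`:
+
+```sql
+-- Sketch only: same grouping as above, with ONLY added so that traces
+-- spanning 500ms or less are omitted from the output
+SELECT
+    trace_id,
+    min(start_time) AS start_ts,
+    max(end_time) AS end_ts,
+    date_diff('ms', start_ts, end_ts) AS span_ms
+FROM otel_traces
+GROUP BY trace_id
+EMIT AFTER KEY EXPIRE IDENTIFIED BY end_time WITH ONLY MAXSPAN 500ms AND TIMEOUT 2s
+SETTINGS default_hash_table='hybrid';
+```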
+
+### EMIT PER EVENT
+This emit policy allows you to emit results for every event in the stream, which can be useful for debugging or monitoring purposes.
+
+For example, if you create a random stream `market_data` and run:
+```sql
+select count() from market_data
+```
+You will get the count of all events in the stream every 2 seconds by default, for example 10, 20, 30, and so on.
+
+If you want to emit results for every event, you can use:
+```sql
+select count() from market_data emit per event
+```
+You will get the count of all events in the stream every time a new event is added to the stream, for example 1, 2, 3, 4, 5, and so on.
+
+This emit policy suits use cases where you want to see the result of your query for every event, in real time, as new events are added to the stream.
+
+For high throughput streams, you may want to use this emit policy with caution, as it can generate a lot of output and may impact the performance of your query.
+
+There are some limitations for this emit policy:
+
+It does not support parallel processing, so it may not be suitable for high throughput streams. If there are multiple partitions for the Kafka external stream or multiple shards for the Timeplus stream, this emit policy will not work.
+
+One workaround is to use `SHUFFLE BY` to shuffle the data into one partition or shard, but this may impact the performance of your query. For example, you can use:
+```sql
+select type, count() from github_events shuffle by type group by type emit per event;
+```
+
+Another possible workaround applies if the stream's sharding expression is based on id, for example:
+```sql
+create stream multi_shards_stream(id int, ...) settings shards=3, sharding_expr='weak_hash32(id)';
+```
+In this case, you can set `allow_independent_shard_processing=true` to process in parallel.
+
+```sql
+SELECT id, count() FROM multi_shards_stream GROUP BY id EMIT PER EVENT
+SETTINGS allow_independent_shard_processing=true;
+```
+
+The other limitation is that it does not support substream processing. For example, the following query will not work:
+```sql
+SELECT id, count() FROM single_shard_stream partition by id EMIT PER EVENT
+```
diff --git a/docs/hop-aggregation.md b/docs/hop-aggregation.md
@@ -0,0 +1,120 @@
+# Hop Window Aggregation {#hop_window}
+
+## Overview
+
+Like [Tumble](#tumble), Hop also slices the unbounded streaming data into smaller windows, and it has an additional sliding step.
+
+```sql
+SELECT <column_name1>, <column_name2>, <aggr_function>
+FROM hop(<stream_name>, [<timestamp_column>], <hop_slide_size>, [<hop_window_size>], [<time_zone>])
+[WHERE clause]
+GROUP BY [<window_start | window_end>], ...
+EMIT <window_emit_policy>
+SETTINGS <key1>=<value1>, <key2>=<value2>, ...
+```
+
+A hop window is a more generalized window than a tumble window. It has an additional
+parameter called `<hop_slide_size>`, which is the amount by which the window advances each time. There are 3 cases:
+
+1. `<hop_slide_size>` is less than `<hop_window_size>`. Hop windows overlap, meaning an event can fall into several hop windows.
+2. `<hop_slide_size>` is equal to `<hop_window_size>`. The hop window degenerates to a tumble window.
+3. `<hop_slide_size>` is greater than `<hop_window_size>`. Windows have gaps in between. This is usually not useful, hence not supported so far.
+
+Please note, at this point, you need to use the same time unit in `<hop_slide_size>` and `<hop_window_size>`, for example `hop(device_utils, 1s, 60s)` instead of `hop(device_utils, 1s, 1m)`.
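+
+As a sketch of the overlapping case (case 1) written with matching units, the following query aggregates over a 60-second window that advances every second. It reuses the `device_utils` stream and `cpu_usage` column from the other examples on this page; the `avg` aggregate and the `avg_cpu` alias are only illustrative choices.
+
+```sql
+-- 1s slide, 60s window, both expressed in seconds
+SELECT device, window_end, avg(cpu_usage) AS avg_cpu
+FROM hop(device_utils, 1s, 60s)
+GROUP BY device, window_end
+```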
+
+Here is a hop window example with a 2-second slide and a 5-second window.
+
+```
+["2020-01-01 00:00:00", "2020-01-01 00:00:05"]
+["2020-01-01 00:00:02", "2020-01-01 00:00:07"]
+["2020-01-01 00:00:04", "2020-01-01 00:00:09"]
+["2020-01-01 00:00:06", "2020-01-01 00:00:11"]
+...
+```
+
+Except that the hop window can have overlaps, other semantics are identical to the tumble window.
+
+```sql
+SELECT device, max(cpu_usage)
+FROM hop(device_utils, 2s, 5s)
+GROUP BY device, window_end
+EMIT AFTER WINDOW CLOSE;
+```
+
+The above example SQL continuously aggregates max cpu usage per device per hop window for stream `device_utils`. Every time a window is closed, Timeplus emits the aggregation results.
+ 
+## Emit Policies
+
+### EMIT AFTER WINDOW CLOSE {#emit_after}
+
+You can omit `EMIT AFTER WINDOW CLOSE`, since this is the default behavior for time window aggregations. For example:
+
+```sql
+SELECT device, max(cpu_usage)
+FROM tumble(device_utils, 5s)
+GROUP BY device, window_end
+```
+
+The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream `device_utils`. Every time a window is closed, Timeplus Proton emits the aggregation results. How does Timeplus determine that a window should be closed? This is done by the [Watermark](/understanding-watermark), which is an internal timestamp. It is guaranteed to increase monotonically per stream query.
+
+### EMIT AFTER WINDOW CLOSE WITH DELAY {#emit_after_with_delay}
+
+Example:
+
+```sql
+SELECT device, max(cpu_usage)
+FROM tumble(device_utils, 5s)
+GROUP BY device, window_end
+EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
+```
+
+The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream `device_utils`. Every time a window is closed, Timeplus Proton waits for another 2 seconds and then emits the aggregation results.
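+
+Since this page is about hop windows and their semantics are described above as otherwise identical to tumble windows, the same policy would presumably be written against `hop` as follows. This is a sketch that reuses the `hop(device_utils, 2s, 5s)` window from the overview and has not been verified against a running instance.
+
+```sql
+-- Each 5s hop window (sliding every 2s) is emitted 2 seconds after it closes
+SELECT device, max(cpu_usage)
+FROM hop(device_utils, 2s, 5s)
+GROUP BY device, window_end
+EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
+```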
+
+### EMIT ON UPDATE {#emit_on_update}
+
+You can apply `EMIT ON UPDATE` in time windows, such as tumble/hop/session, with `GROUP BY` keys. For example:
+
+```sql
+SELECT
+  window_start, cid, count() AS cnt
+FROM
+  tumble(car_live_data, 5s)
+WHERE
+  cid IN ('c00033', 'c00022')
+GROUP BY
+  window_start, cid
+EMIT ON UPDATE
+```
+
+During the 5-second tumble window, even if the window is not closed, the results are emitted whenever the aggregation value (`cnt`) for the same `cid` changes.
+
+### EMIT ON UPDATE WITH DELAY {#emit_on_update_with_delay}
+
+Adding `WITH DELAY` to `EMIT ON UPDATE` allows late events to be included in the window aggregation.
+
+```sql
+SELECT
+  window_start, cid, count() AS cnt
+FROM
+  tumble(car_live_data, 5s)
+WHERE
+  cid IN ('c00033', 'c00022')
+GROUP BY
+  window_start, cid
+EMIT ON UPDATE WITH DELAY 2s
+```
+
+### EMIT ON UPDATE WITH BATCH {#emit_on_update_with_batch}
+
+You can combine `EMIT PERIODIC` and `EMIT ON UPDATE` by using `EMIT ON UPDATE WITH BATCH <n><UNIT>`. In this case, even if the window is not closed, Timeplus checks the intermediate aggregation result at the specified interval and emits rows if the result has changed.
+```sql
+SELECT
+  window_start, cid, count() AS cnt
+FROM
+  tumble(car_live_data, 5s)
+WHERE
+  cid IN ('c00033', 'c00022')
+GROUP BY
+  window_start, cid
+EMIT ON UPDATE WITH BATCH 2s
+```
diff --git a/docs/jit.md b/docs/jit.md
@@ -1,4 +1,4 @@
-# Just-In-Time (JIT) compilation
+# Just-In-Time (JIT) Compilation
 
 Starting from Timeplus Enterprise 2.9, the JIT compilation is enabled by default. For example, if you need to run the following SQL multiple times:
 ```sql
diff --git a/docs/session-aggregation.md b/docs/session-aggregation.md
@@ -0,0 +1,3 @@
+# Session Window Aggregation {#session_window}
+
+This is similar to tumble and hop window. Please check the [session](/functions_for_streaming#session) function.
diff --git a/docs/tumble-aggregation.md b/docs/tumble-aggregation.md
diff --git a/sidebars.js b/sidebars.js
