Skip to content

Commit dcfa0e3

Browse files
authored
refine tumble window (#513)
1 parent 8f7f1bc commit dcfa0e3

File tree

2 files changed

+118
-76
lines changed

2 files changed

+118
-76
lines changed

docs/tumble-aggregation.md

Lines changed: 118 additions & 76 deletions
Original file line numberDiff line numberDiff line change
@@ -1,141 +1,183 @@
1-
# Tumble Window Aggregation {#tumble_window}
1+
# Tumble Window Aggregation
22

33
## Overview
4-
Tumble slices the unbounded data into different windows according to its parameters. Internally, Timeplus observes the data streaming and automatically decides when to close a sliced window and emit the final results for that window.
4+
5+
A **tumbling window** divides an unbounded data stream into **fixed-size, non-overlapping intervals** based on event time or processing time.
6+
7+
Each event belongs to **exactly one** window. This is useful for periodic aggregations such as per-minute averages, hourly counts, or daily summaries.
8+
9+
Unlike **hopping** or **session** windows, tumbling windows do not overlap — once a window closes, a new one immediately starts.
10+
This makes them simple, deterministic, and ideal for producing periodic reports or metrics.
11+
12+
## Syntax
513

614
```sql
7-
SELECT <column_name1>, <column_name2>, <aggr_function>
8-
FROM tumble(<stream_name>, [<timestamp_column>], <tumble_window_size>, [<time_zone>])
15+
SELECT <grouping-keys>, <aggr_functions>
16+
FROM tumble(<stream-name>, [<timestamp-column>], <window-size>])
917
[WHERE clause]
10-
GROUP BY [window_start | window_end], ...
11-
EMIT <window_emit_policy>
18+
GROUP BY [window_start | window_end], <other-group-keys> ...
19+
EMIT <emit-policy>
1220
SETTINGS <key1>=<value1>, <key2>=<value2>, ...
1321
```
1422

15-
Tumble window means a fixed non-overlapped time window. Here is one example for a 5 seconds tumble window:
23+
### Parameters
24+
25+
- `<stream-name>` : the source stream the tumble window applied to. **Required**
26+
- `<timestamp-column>` : the event timestamp column which is used to calculate window starts / ends and internal watermark. You can use `now()` or `now64(3)` to enable processing time tumble window. Default is `_tp_time` if absent. **Optional**
27+
- `<window-size>` : tumble window interval size. Supported interval units are listed below. **Required**
28+
- `s` : second
29+
- `m` : miniute
30+
- `h` : hour
31+
- `d` : day
32+
- `w` : week
33+
34+
### Example
1635

1736
```
18-
["2020-01-01 00:00:00", "2020-01-01 00:00:05]
19-
["2020-01-01 00:00:05", "2020-01-01 00:00:10]
20-
["2020-01-01 00:00:10", "2020-01-01 00:00:15]
21-
...
37+
CREATE STREAM device_metrics (
38+
device_id string,
39+
cpu_usage float,
40+
event_time datetime64(3)
41+
);
42+
43+
SELECT
44+
window_start,
45+
window_end,
46+
device_id,
47+
avg(cpu_usage) AS avg_cpu
48+
FROM tumble(device_metrics, event_time, 5s)
49+
GROUP BY
50+
window_start,
51+
device_id
52+
EMIT AFTER WINDOW CLOSE;
2253
```
2354

24-
`tumble` window in Timeplus is left closed and right open `[)` meaning it includes all events which have timestamps **greater or equal** to the **lower bound** of the window, but **less** than the **upper bound** of the window.
55+
**Explanation**:
56+
- Events are grouped into **5-second, non-overlapping windows** based on their `event_time`.
57+
- Each `device_id`’s events are aggregated independently within each window.
58+
- When a window closes, the system emits one aggregated result per device with the computed `avg_cpu`.
2559

26-
`tumble` in the above SQL spec is a table function whose core responsibility is assigning tumble window to each event in a streaming way. The `tumble` table function will generate 2 new columns: `window_start, window_end` which correspond to the low and high bounds of a tumble window.
60+
**Example timeline**:
2761

28-
`tumble` table function accepts 4 parameters: `<timestamp_column>` and `<time-zone>` are optional, the others are mandatory.
62+
| Window | Time Range | Events Included |
63+
| :----: | :------------------ | :----------------------------- |
64+
| W1 | 00:00:00 → 00:00:05 | Events in [00:00:00, 00:00:05) |
65+
| W2 | 00:00:05 → 00:00:10 | Events in [00:00:05, 00:00:10) |
66+
| W3 | 00:00:10 → 00:00:15 | Events in [00:00:10, 00:00:15) |
2967

30-
When the `<timestamp_column>` parameter is omitted from the query, the stream's default event timestamp column which is `_tp_time` will be used.
68+
![TumbleWindow](/img/tumble-window.png)
3169

32-
When the `<time_zone>` parameter is omitted the system's default timezone will be used. `<time_zone>` is a string type parameter, for example `UTC`.
70+
Each event falls into exactly one window, ensuring deterministic aggregation and predictable output intervals.
3371

34-
`<tumble_window_size>` is an interval parameter: `<n><UNIT>` where `<UNIT>` supports `s`, `m`, `h`, `d`, `w`.
35-
It doesn't yet support `M`, `q`, `y`. For example, `tumble(my_stream, 5s)`.
72+
## Emit Policies
3673

37-
More concrete examples:
74+
Emit policies define **when** Timeplus should output results from **time-windowed** aggregations such as **tumble**, **hop**, **session** and **global-windowed** aggregation.
3875

39-
```sql
40-
SELECT device, max(cpu_usage)
41-
FROM tumble(device_utils, 5s)
42-
GROUP BY device, window_end
43-
```
76+
These policies control whether results are emitted **only after the window closes**, **after a delay** to honor late events, or **incrementally during the window**.
77+
78+
### `EMIT AFTER WINDOW CLOSE`
4479

45-
The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream `devices_utils`. Every time a window is closed, Timeplus Proton emits the aggregation results.
80+
This is the **default behavior** for all time window aggregations. Timeplus emits the aggregated results once the window closes.
4681

47-
Let's change `tumble(stream, 5s)` to `tumble(stream, timestmap, 5s)` :
82+
**Example**:
4883

4984
```sql
50-
SELECT device, max(cpu_usage)
51-
FROM tumble(devices, timestamp, 5s)
52-
GROUP BY device, window_end
53-
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
85+
SELECT window_start, device, max(cpu_usage)
86+
FROM tumble(device_utils, 5s)
87+
GROUP BY window_start, device;
5488
```
5589

56-
Same as the above delayed tumble window aggregation, except in this query, user specifies a **specific time column** `timestamp` for tumble windowing.
90+
This query continuously computes the **maximum CPU usage** per device in every 5-second tumble window from the stream `device_utils`.
91+
Each time a window closes (as determined by the internal watermark), Timeplus emits the results once.
5792

58-
The example below is so called processing time processing which uses wall clock time to assign windows. Timeplus internally processes `now/now64` in a streaming way.
93+
:::info
94+
A watermark is an internal timestamp that advances monotonically per stream, determining when a window can be safely closed.
95+
:::
5996

60-
```sql
61-
SELECT device, max(cpu_usage)
62-
FROM tumble(devices, now64(3, 'UTC'), 5s)
63-
GROUP BY device, window_end
64-
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
65-
```
97+
### `EMIT AFTER WINDOW CLOSE WITH DELAY`
6698

67-
## Emit Policies
68-
69-
### EMIT AFTER WINDOW CLOSE {#emit_after}
99+
Adds a **delay period** before emitting window results, allowing **late events** to be included.
70100

71-
You can omit `EMIT AFTER WINDOW CLOSE`, since this is the default behavior for time window aggregations. For example:
101+
**Example**:
72102

73103
```sql
74-
SELECT device, max(cpu_usage)
104+
SELECT window_start, device, max(cpu_usage)
75105
FROM tumble(device_utils, 5s)
76-
GROUP BY device, window_end
106+
GROUP BY window_start, device
107+
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
77108
```
78109

79-
The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream `devices_utils`. Every time a window is closed, Timeplus Proton emits the aggregation results. How to determine the window should be closed? This is done by [Watermark](/understanding-watermark), which is an internal timestamp. It is guaranteed to be increased monotonically per stream query.
110+
This query aggregates the **maximum CPU usage** for each device per 5-second tumble window, then waits 2 additional seconds after the window closes before emitting results. This helps capture any late-arriving events that fall within the window period.
80111

81-
### EMIT AFTER WINDOW CLOSE WITH DELAY {#emit_after_with_delay}
112+
### `EMIT TIMEOUT`
82113

83-
Example:
114+
In some streaming scenarios, the **last tumble window** might remain open because no new events arrive to advance the watermark (i.e., the event time progress).
115+
The **`EMIT TIMEOUT`** clause helps forcefully close such idle windows after a specified period of inactivity.
116+
117+
**Example**:
84118

85119
```sql
86-
SELECT device, max(cpu_usage)
120+
SELECT window_start, device, max(cpu_usage)
87121
FROM tumble(device_utils, 5s)
88-
GROUP BY device, widnow_end
89-
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
122+
GROUP BY window_start, device
123+
EMIT TIMEOUT 3s;
90124
```
91125

92-
The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream `device_utils`. Every time a window is closed, Timeplus Proton waits for another 2 seconds and then emits the aggregation results.
126+
In this example:
127+
128+
- The query continuously aggregates the maximum CPU usage per device in **5-second tumble windows**.
129+
- If **no new events** arrive for **3 seconds**, Timeplus will **force-close** the most recent window and emit the final results.
130+
- Once the window is closed, the **internal watermark** (event time) advances as well.
131+
- Any **late events** that belong to this now-closed window will be discarded.
132+
133+
### `EMIT ON UPDATE`
93134

94-
### EMIT ON UPDATE {#emit_on_update}
135+
Emits **intermediate aggregation updates** whenever the results change within an open window.
136+
This is useful for near real-time visibility into evolving metrics.
95137

96-
You can apply `EMIT ON UPDATE` in time windows, such as tumble/hop/session, with `GROUP BY` keys. For example:
138+
**Example**:
97139

98140
```sql
99141
SELECT
100-
window_start, cid, count() AS cnt
142+
window_start, device, max(cpu_usage)
101143
FROM
102-
tumble(car_live_data, 5s)
103-
WHERE
104-
cid IN ('c00033', 'c00022')
144+
tumble(device_utils, 5s)
105145
GROUP BY
106-
window_start, cid
107-
EMIT ON UPDATE
146+
window_start, device
147+
EMIT ON UPDATE;
108148
```
109149

110-
During the 5 second tumble window, even the window is not closed, as long as the aggregation value(`cnt`) for the same `cid` is different , the results will be emitted.
150+
Here, during each 5-second window, the system emits updates whenever there are new events flowing into the open window, even before the window closes.
111151

112-
### EMIT ON UPDATE WITH DELAY {#emit_on_update_with_delay}
152+
### `EMIT ON UPDATE WITH BATCH`
113153

114-
Adding the `WITH DELAY` to `EMIT ON UPDATE` will allow late event for the window aggregation.
154+
Combines **periodic emission** with **update-based** triggers.
155+
Timeplus checks the intermediate aggregation results at regular intervals and emits them if they have changed which can significally improve the emit efficiency and throughput compared with `EMIT ON UPDATE`.
156+
157+
**Example**:
115158

116159
```sql
117160
SELECT
118-
window_start, cid, count() AS cnt
161+
window_start, device, max(cpu_usage)
119162
FROM
120-
tumble(car_live_data, 5s)
121-
WHERE
122-
cid IN ('c00033', 'c00022')
163+
tumble(device_utils, 5s)
123164
GROUP BY
124-
window_start, cid
125-
EMIT ON UPDATE WITH DELAY 2s
165+
window_start, device
166+
EMIT ON UPDATE WITH BATCH 1s;
126167
```
127168

128-
### EMIT ON UPDATE WITH BATCH {#emit_on_update_with_batch}
169+
### `EMIT ON UPDATE WITH DELAY`
170+
171+
Similar to **`EMIT ON UPDATE`**, but includes a delay to allow late events before emitting incremental updates.
172+
173+
**Example**:
129174

130-
You can combine `EMIT PERIODIC` and `EMIT ON UPDATE` together. In this case, even the window is not closed, Timeplus will check the intermediate aggregation result at the specified interval and emit rows if the result is changed.
131175
```sql
132176
SELECT
133-
window_start, cid, count() AS cnt
177+
window_start, device, max(cpu_usage)
134178
FROM
135-
tumble(car_live_data, 5s)
136-
WHERE
137-
cid IN ('c00033', 'c00022')
179+
tumble(device_utils, 5s)
138180
GROUP BY
139-
window_start, cid
140-
EMIT ON UPDATE WITH BATCH 2s
181+
window_start, device
182+
EMIT ON UPDATE WITH DELAY 2s;
141183
```

static/img/tumble-window.png

73.8 KB
Loading

0 commit comments

Comments
 (0)