**Global aggregation** refers to running an aggregation query **without using time-based windows** such as [tumble](/tumble-aggregation), [hop](/hop-aggregation), or [session](/session-aggregation).

Unlike windowed aggregations that slice unbounded streams into discrete windows, **global aggregation** treats the entire stream as **a single continuous window**.

With global aggregation:

- The query continuously updates aggregation results as new events arrive.
- There is **no concept of window close**, so late events are naturally handled without additional logic.
- It is ideal for tracking long-running (lifetime) metrics, such as total counts, averages, or unique users, across the entire history of a stream.
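As a minimal sketch of the idea, assuming a hypothetical stream `app_events` with `device` and `user_id` columns, a global aggregation that tracks lifetime unique users per device could look like:

```sql
-- Hypothetical stream: app_events(device string, user_id string).
-- No window function: the query keeps one running result per device
-- and updates it continuously as new events arrive.
SELECT device, count_distinct(user_id) AS lifetime_unique_users
FROM app_events
GROUP BY device
```

Because there is no window, the results never "close"; an emit policy controls how often the updated rows are pushed to the client.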
Global aggregation supports multiple **emit policies** that define **when intermediate results** are pushed out.
### `EMIT PERIODIC`

Emits aggregation results periodically **when new events arrive**.
This is the **default** emit policy for global aggregation, with a **default interval of 2 seconds**.

**Syntax**:

```sql
EMIT PERIODIC <n><UNIT>
```

**Parameters**:

- `<n>` — a positive integer (the interval length)
- `<UNIT>` — one of:
  - `ms` (milliseconds)
  - `s` (seconds)
  - `m` (minutes)
  - `h` (hours)
  - `d` (days)

**Example**:

```sql
SELECT device, count(*)
FROM device_utils
WHERE cpu_usage > 99
GROUP BY device
EMIT PERIODIC 5s;
```

This query emits updated results every 5 seconds if new events are received.

You can also apply `EMIT PERIODIC` to time windows such as tumble, hop, and session. By default, a tumble window aggregation emits results when the window closes, so `tumble(stream, 5s)` emits every 5 seconds, unless there is no event in the window to progress the watermark. In some cases you may want results before the window closes, for example to get timely alerts. The following SQL runs a 5-second tumble window and, every 1 second, emits a row if the number of events exceeds 300:

```sql
SELECT window_start, count() AS cnt
FROM tumble(car_live_data, 5s)
GROUP BY window_start
HAVING cnt > 300
EMIT PERIODIC 1s
```
### `EMIT PERIODIC REPEAT`

With `EMIT PERIODIC`, no results are emitted if there are **no new events** since the last emit.
Starting from Timeplus Proton 1.6.2, you can add the `REPEAT` modifier so that Timeplus **emits at a fixed interval**, even when no new data arrives.

**Example**:

```sql
SELECT device, count(*)
FROM device_utils
WHERE cpu_usage > 99
GROUP BY device
EMIT PERIODIC 5s REPEAT
```

If no new events appear, the last results are still emitted every 5 seconds.

### `EMIT TIMEOUT`

You can apply `EMIT TIMEOUT` to a global aggregation, e.g.

```sql
SELECT count() FROM t EMIT TIMEOUT 1s;
```

It can also be applied to window aggregations, in which case `EMIT AFTER WINDOW CLOSE` is automatically appended, e.g.

```sql
SELECT count() FROM tumble(t, 5s) GROUP BY window_start EMIT TIMEOUT 1s;
```
### `EMIT ON UPDATE`

Emits intermediate results **immediately** when new events change any aggregation value. This is useful for near real-time visibility into evolving metrics.

```sql
SELECT device, count(*)
FROM device_utils
WHERE cpu_usage > 99
GROUP BY device
EMIT ON UPDATE;
```

Each time new events with `cpu_usage > 99` arrive, updated counts are emitted.

You can also apply `EMIT ON UPDATE` to time windows such as tumble, hop, and session, together with `GROUP BY` keys. For example:

```sql
SELECT window_start, cid, count() AS cnt
FROM tumble(car_live_data, 5s)
WHERE cid IN ('c00033', 'c00022')
GROUP BY window_start, cid
EMIT ON UPDATE
```

During the 5-second tumble window, even before the window closes, results are emitted whenever the aggregation value (`cnt`) for the same `cid` changes.
### `EMIT ON UPDATE WITH BATCH`

Combines **periodic emission** with **update-based** triggers.
Timeplus checks the intermediate aggregation results at the specified interval and emits them only if they have changed, which can significantly improve emit efficiency and throughput compared with `EMIT ON UPDATE`.

```sql
SELECT device, count(*)
FROM device_utils
WHERE cpu_usage > 99
GROUP BY device
EMIT ON UPDATE WITH BATCH 1s;
```

This query checks for changes every second and emits results only when updates occur.

This policy also works with time windows. For example:

```sql
SELECT window_start, cid, count() AS cnt
FROM tumble(car_live_data, 5s)
WHERE cid IN ('c00033', 'c00022')
GROUP BY window_start, cid
EMIT ON UPDATE WITH BATCH 2s
```

Even before the window closes, Timeplus checks the intermediate results every 2 seconds and emits rows if they have changed.
### `EMIT AFTER KEY EXPIRE`

Designed for **OpenTelemetry trace analysis** and similar use cases where you need to track **key lifetimes** across high-cardinality datasets (e.g., trace spans).

This policy emits aggregation results once a key is considered **expired**.

**Syntax**:

```sql
EMIT AFTER KEY EXPIRE [IDENTIFIED BY <col>] WITH [ONLY] MAXSPAN <interval> [AND TIMEOUT <interval>]
```

**Parameters**:

* `EMIT AFTER KEY EXPIRE` - enables per-key lifetime tracking. Apply it to a global aggregation whose `GROUP BY` key is a primary key, typically an ID shared by multiple tracing events.
* `IDENTIFIED BY <col>` - the column used to compute the span duration (defaults to `_tp_time` if omitted).
* `MAXSPAN <interval>` - the maximum allowed span before emission. For example, `MAXSPAN 500ms` flags events that share a tracing ID but span more than 0.5 seconds.
* `ONLY` - emit only the keys whose span exceeds `MAXSPAN`, so you can focus on events over the SLA.
* `AND TIMEOUT <interval>` - forces emission after inactivity, to avoid waiting for late events indefinitely. If no more events with the same key arrive within this interval, Timeplus closes the session for that key and emits the results.

:::info
Currently this policy must be used with `SETTINGS default_hash_table='hybrid'` to prevent excessive memory usage.
:::

**Example**: read the trace streams and show only the traces with an end-to-end latency over 0.5 seconds.

```sql
WITH grouped AS
(
    SELECT
        trace_id,
        min(start_time) AS start_ts,
        max(end_time) AS end_ts,
        date_diff('ms', start_ts, end_ts) AS span_ms,
        group_array(json_encode(span_id, parent_span_id, name, start_time, end_time, attributes)) AS trace_events
    FROM otel_traces
    SHUFFLE BY trace_id
    GROUP BY trace_id
    EMIT AFTER KEY EXPIRE IDENTIFIED BY end_time WITH ONLY MAXSPAN 500ms AND TIMEOUT 2s
)
SELECT json_encode(trace_id, start_ts, end_ts, span_ms, trace_events) AS event
FROM grouped
SETTINGS
    default_hash_table='hybrid',
    max_hot_keys=1000000;
```

### `EMIT PER EVENT`

This policy emits results for every event in the stream, which can be useful for debugging or monitoring purposes.

For example, if you create a random stream `market_data` and run:

```sql
SELECT count() FROM market_data
```

you will get the count of all events in the stream every 2 seconds by default, such as 10, 20, 30, etc. To emit results for every event instead, run:

```sql
SELECT count() FROM market_data EMIT PER EVENT
```

and you will get the count each time a new event is added to the stream, such as 1, 2, 3, 4, 5, etc.

For high-throughput streams, use this policy with caution, as it can generate a lot of output and may impact query performance.

This policy also does not support parallel processing: if there are multiple partitions in a Kafka external stream, or multiple shards in a Timeplus stream, it will not work. One workaround is to use `SHUFFLE BY` to shuffle the data into one partition or shard, though this may impact performance. For example:

```sql
SELECT type, count() FROM github_events SHUFFLE BY type GROUP BY type EMIT PER EVENT;
```