Add perf event groups, multiplex-scaled reads, and group snapshots
Group multiple perf counters so they are scheduled as one atomic PMU unit
and read as a consistent same-time snapshot.
Grouping
- perf_options gains a high-level `group: leader_attachment` field; the
lower-level `group_fd: leader.perf_fd` form is kept for compatibility.
- Attaching a member opens it disabled, links the BPF program, then
disables, resets, and enables the whole group with PERF_IOC_FLAG_GROUP, so
adding a member restarts the group from zero.
- Detaching a group leader cascades to its live members instead of being
rejected.
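
As an illustration of the restart step described above, here is a minimal C sketch of a disable/reset/enable sequence issued through the leader fd. The function name `restart_perf_group` is hypothetical and not taken from this commit; the ioctl requests and `PERF_IOC_FLAG_GROUP` are the standard ones from `perf_event_open(2)`. The generated code may order or wrap these calls differently.

```c
#include <linux/perf_event.h>
#include <sys/ioctl.h>

/* Hypothetical helper: restart a whole perf group through its leader fd.
 * PERF_IOC_FLAG_GROUP makes each ioctl apply to the leader and its members.
 * Returns 0 on success, -1 if any ioctl fails (errno is left set). */
int restart_perf_group(int leader_fd)
{
    if (ioctl(leader_fd, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    /* Resetting zeroes the counts, which is why adding a member
     * restarts the group from zero. */
    if (ioctl(leader_fd, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    if (ioctl(leader_fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    return 0;
}
```

Resetting while the group is disabled keeps all members starting from the same point, so later group reads remain comparable.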
Reads
- read() now returns a PerfRead struct (raw, scaled, time_enabled,
time_running, count, values[16], ids[16]) instead of a bare i64.
- Each attachment records its own perf event id via PERF_EVENT_IOC_ID at
attach time. read() selects the entry matching that id from the group
buffer, fixing member reads that previously returned the leader's count.
- scaled corrects for PMU multiplexing as value * time_enabled /
time_running using a 128-bit intermediate; it equals raw when the event
was never multiplexed. time_running == 0 is reported as an error.
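
The id-based selection and the scaling rule above can be sketched in C roughly as follows. This is an illustration under assumptions, not the code generated by this commit: `struct group_read`, `scale_count`, and `select_and_scale` are hypothetical names, the buffer layout assumes `PERF_FORMAT_GROUP | PERF_FORMAT_ID` together with both total-time flags as documented in `perf_event_open(2)`, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>

#define GROUP_MAX 16 /* matches the 16 entries PerfRead can hold */

/* Assumed layout of a group read with PERF_FORMAT_GROUP | PERF_FORMAT_ID |
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct group_read {
    uint64_t nr;
    uint64_t time_enabled;
    uint64_t time_running;
    struct { uint64_t value; uint64_t id; } entries[GROUP_MAX];
};

/* Multiplex correction: value * enabled / running with a 128-bit
 * intermediate so the product cannot overflow. Returns -1 when
 * running == 0. A quotient wider than 64 bits is truncated here. */
int scale_count(uint64_t value, uint64_t enabled, uint64_t running,
                uint64_t *out)
{
    if (running == 0)
        return -1;
    if (enabled == running) { /* never multiplexed: scaled == raw */
        *out = value;
        return 0;
    }
    unsigned __int128 wide = (unsigned __int128)value * enabled;
    *out = (uint64_t)(wide / running);
    return 0;
}

/* Pick the entry whose id matches this attachment, then scale it.
 * Returns -1 if the id is absent or the timing is unusable. */
int select_and_scale(const struct group_read *buf, uint64_t my_id,
                     uint64_t *raw, uint64_t *scaled)
{
    uint64_t n = buf->nr < GROUP_MAX ? buf->nr : GROUP_MAX;
    for (uint64_t i = 0; i < n; i++) {
        if (buf->entries[i].id == my_id) {
            *raw = buf->entries[i].value;
            return scale_count(*raw, buf->time_enabled,
                               buf->time_running, scaled);
        }
    }
    return -1;
}
```

Matching on the id rather than on the position in the buffer is what keeps a member read from picking up the leader's count.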
Compile-time validation
- Statically visible perf groups are checked during type-checking: the
group must fit the target PMU counter limit (read from sysfs caps,
KERNELSCRIPT_PERF_GROUP_MAX_EVENTS override, or a default of 4), and the
member count is capped at the 16 entries PerfRead can hold. Software and
tracepoint events do not consume PMU slots.
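
The limit check above might look like the following C sketch. The real check runs inside the compiler's type-checker; the enum, the function names, and the handling of a malformed override (ignored here) are assumptions made for illustration only.

```c
#include <stdlib.h>

enum { GROUP_DEFAULT_LIMIT = 4, GROUP_HARD_CAP = 16 };

/* Hypothetical classification of a group member's perf_type. */
enum event_kind {
    EV_HARDWARE, EV_SOFTWARE, EV_TRACEPOINT,
    EV_HW_CACHE, EV_RAW, EV_BREAKPOINT, EV_DYNAMIC
};

/* Software and tracepoint events take no PMU slot; every other kind,
 * including a perf_type only known at run time, counts as one. */
int pmu_slots_needed(const enum event_kind *events, int n)
{
    int slots = 0;
    for (int i = 0; i < n; i++)
        if (events[i] != EV_SOFTWARE && events[i] != EV_TRACEPOINT)
            slots++;
    return slots;
}

/* detected <= 0 means no sysfs value was found; override is the raw text
 * of KERNELSCRIPT_PERF_GROUP_MAX_EVENTS, or NULL when it is unset. */
int effective_limit(int detected, const char *override)
{
    int limit = detected > 0 ? detected : GROUP_DEFAULT_LIMIT;
    if (override != NULL) {
        char *end;
        long v = strtol(override, &end, 10);
        /* Assumption: a non-numeric or non-positive override is ignored. */
        if (end != override && *end == '\0' && v > 0)
            limit = v > GROUP_HARD_CAP ? GROUP_HARD_CAP : (int)v;
    }
    return limit > GROUP_HARD_CAP ? GROUP_HARD_CAP : limit;
}

/* A group fits when it has at most 16 members and needs no more PMU
 * slots than the effective limit. Returns 1 if it fits, 0 otherwise. */
int group_fits(const enum event_kind *events, int n, int limit)
{
    return n <= GROUP_HARD_CAP && pmu_slots_needed(events, n) <= limit;
}
```

A caller would pass `getenv("KERNELSCRIPT_PERF_GROUP_MAX_EVENTS")` as the override, so an unset variable falls through to the detected or default limit.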
Codegen fixes
- ir_generator: array element load now dereferences the element pointer
(IRUnOp(IRDeref)) instead of yielding the pointer value.
- userspace_codegen: non-literal array initializers are declared then
memcpy'd, and a for-loop counter whose name is reused by a later variable
no longer produces duplicate function-scope C declarations.
- parser: call results are primary expressions, so field access on a call
such as read(att).scaled parses.
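
For context on the declare-then-memcpy change: C does not allow an array to be initialized from another array expression, so a copy has to be emitted as a declaration followed by `memcpy`. The sketch below shows the pattern only; `struct snapshot` and `sum_copy` are hypothetical and the actual generated code will differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a value that carries a fixed-size array. */
struct snapshot { uint64_t values[16]; };

uint64_t sum_copy(const struct snapshot *snap)
{
    /* `uint64_t copy[16] = snap->values;` is not valid C, so the array
     * is declared first and then filled with memcpy. */
    uint64_t copy[16];
    memcpy(copy, snap->values, sizeof(copy));

    uint64_t total = 0;
    for (size_t i = 0; i < 16; i++)
        total += copy[i];
    return total;
}
```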
Docs (README, SPEC, BUILTINS) and the perf_cache_miss / perf_page_fault
examples are updated; tests cover the group_fd and high-level group paths,
group restart/cascade, multiplex scaling, oversized-group rejection, and
the for-counter reuse regression.
BUILTINS.md (23 additions, 6 deletions)
@@ -98,7 +98,7 @@ fn main() -> i32 {
 - `flags`: Attachment flags (context-dependent)
 - Perf event form:
 - `handle`: Program handle returned from `load()`
-- `opts`: `perf_options` value; only `perf_type` and `perf_config` are required; all other fields have defaults
+- `opts`: `perf_options` value; only `perf_type` and `perf_config` are required; all other fields have defaults, including no group (`group` invalid and `group_fd=-1`)
 - `flags`: Must be `0` for perf attaches; nonzero values are rejected

 **Return Value:**
@@ -117,11 +117,23 @@ if (result != 0) {
 // pid=-1 (all procs), cpu=0, period=1_000_000, wakeup=1; perf attach flags must be 0
+Grouped events are scheduled as one atomic PMU unit. Separate events and separate groups may be multiplexed, but members inside one group cannot be independently multiplexed. Static groups that exceed the target PMU counter limit are rejected at compile time; override the detected/default limit with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS` when compiling for a different target. The effective limit is capped at 16 to match `PerfRead`.
+
 **Context-specific implementations:**
 - **eBPF:** Not available
 - **Userspace:** Uses `attach_bpf_program_by_fd` for standard targets and `ks_attach_perf_event` for perf events
-**Description:** Read the current hardware/software counter value from a perf attachment.
+**Description:** Read a perf attachment snapshot. The result includes this attachment's raw and scaled count, multiplex timing, and same-time group arrays.

 **Parameters:**
 - `handle`: Perf attachment returned from `attach(handle, perf_options, flags)`

 **Return Value:**
-- Returns the raw 64-bit counter value on success
-- Returns `-1` on invalid/stale attachment or read failure
+- `raw`: this event's unscaled counter value, or `-1` on invalid/stale attachment or read failure
+- `scaled`: this event's multiplex-corrected value, or `-1` on timing/read error
+- `time_enabled`: perf enabled time
+- `time_running`: perf running time
+- `count`: number of group entries returned; `1` for a standalone event
+- `values`: multiplex-scaled group values, capped at 16; `values[0] == scaled`
+- `ids`: perf event IDs for the returned values
 - Reads use the attachment's `perf_fd` directly; the internal token detects copied handles used after detach.
README.md (19 additions, 3 deletions)
@@ -294,7 +294,7 @@ fn main() -> i32 {

 ### Hardware Performance Counter Programs

-Use `@perf_event` to attach eBPF programs to hardware or software performance counters. `perf_options` keeps the kernel's tagged `perf_type + perf_config` model, so adding new perf event families does not require flattening everything into one enum. Only `perf_type` and `perf_config` are required; all other fields have sensible defaults. Perf attaches return a first-class attachment value, so if you need the current count in userspace, call `read(att)`:
+Use `@perf_event` to attach eBPF programs to hardware or software performance counters. `perf_options` keeps the kernel's tagged `perf_type + perf_config` model, so adding new perf event families does not require flattening everything into one enum. Only `perf_type` and `perf_config` are required; all other fields have sensible defaults. Perf attaches return a first-class attachment value, so if you need the current count in userspace, call `read(att).scaled`:

 ```kernelscript
 // eBPF program fires on every hardware branch-miss sample
+Adding a member restarts the whole group from zero. Detaching a leader cascades to any live members. A group competes for PMU counters as one atomic unit: different groups can be multiplexed over time, but members inside one group are not independently multiplexed. For statically visible groups, the compiler rejects groups that need more PMU counter slots than the target limit. The limit is read from known sysfs PMU caps when available, defaults to 4, can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`, and is capped at 16 to match `PerfRead`.
+
+`read(att)` returns a `PerfRead` snapshot with raw, multiplex-scaled, timing, and group fields. Use `read(att).scaled` for that attachment's counter value, `read(att).raw` for its unscaled value, and `read(att).values` / `read(att).ids` for a same-time group snapshot.
+- `group: leader_attachment` is the preferred way to join a perf group.
+- `group_fd >= 0` opens the new event as a member of that leader fd.
+- Group members are opened disabled, linked to the BPF program, then the leader is disabled, reset, and enabled with `PERF_IOC_FLAG_GROUP`.
+- Adding a member to an already running group restarts the whole group from zero.
+- A group is scheduled as an atomic PMU unit. Separate events and separate groups may be multiplexed; members inside one group are not independently multiplexed. If a statically visible group needs more PMU counter slots than the target limit, compilation fails.
+- The compile-time group limit uses known sysfs PMU caps when available, falls back to `4`, can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`, and is capped at the 16 entries exposed by `PerfRead`.
+- `perf_type_software` and `perf_type_tracepoint` do not consume PMU counter slots for this check; static hardware/raw/cache/breakpoint events consume one slot, and dynamic `perf_type` values are conservatively counted as one slot.
+- Detaching a member is allowed. Detaching a leader cascades to any live members.
+- Generated perf events always enable `PERF_FORMAT_GROUP | PERF_FORMAT_ID`, and `read(att)` returns up to 16 same-time group values plus perf IDs and timing fields. `raw` and `scaled` select the entry matching the attachment being read.
@@ -559,7 +591,8 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 - Returns a first-class `PerfAttachment` value for perf attaches so one program can hold multiple live counters
 - `PerfAttachment` carries `perf_fd` plus an internal generation token; `read(attachment)` avoids global attachment-list scans and rejects copied handles after detach