
Commit 4becf2e

Implement performance event group management and reading enhancements
- Introduced functions to manage performance event groups, including detection of maximum events and validation of static groups.
- Added support for new performance read functions: `read_raw`, `read_details`, and `read_group`, along with their corresponding structures and handling in the code generation.
- Enhanced the type checker to validate performance event group attachments and ensure no cycles exist in group leader relationships.
- Updated userspace code generation to track usage of new performance read functions and manage group attachments.
- Added tests for new functionality, including validation of oversized static performance event groups and code generation for new read functions.
1 parent 4649696 commit 4becf2e

9 files changed

Lines changed: 882 additions & 96 deletions


BUILTINS.md

Lines changed: 48 additions & 3 deletions
@@ -98,7 +98,7 @@ fn main() -> i32 {
   - `flags`: Attachment flags (context-dependent)
 - Perf event form:
   - `handle`: Program handle returned from `load()`
-  - `opts`: `perf_options` value — only `perf_type` and `perf_config` are required; all other fields have defaults, including `group_fd=-1`
+  - `opts`: `perf_options` value — only `perf_type` and `perf_config` are required; all other fields have defaults, including no group (`group` invalid and `group_fd=-1`)
   - `flags`: Must be `0` for perf attaches; nonzero values are rejected

 **Return Value:**
@@ -126,12 +126,14 @@ var cache = attach(perf_prog, perf_options { perf_type: perf_type_hardware, perf
 var branch = attach(perf_prog, perf_options {
     perf_type: perf_type_hardware,
     perf_config: branch_misses,
-    group_fd: cache.perf_fd,
+    group: cache,
 }, 0)
 detach(branch)
 detach(cache)
 ```

+Grouped events are scheduled as one atomic PMU unit. Separate events and separate groups may be multiplexed, but members inside one group cannot be independently multiplexed. Static groups that exceed the target PMU counter limit are rejected at compile time; override the detected/default limit with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS` when compiling for a different target.
+
 **Context-specific implementations:**
 - **eBPF:** Not available
 - **Userspace:** Uses `attach_bpf_program_by_fd` for standard targets and `ks_attach_perf_event` for perf events
@@ -183,7 +185,50 @@ detach(prog) // Clean up
 - Returns a scaled value when `time_running < time_enabled`
 - Returns `-1` on invalid/stale attachment or read failure
 - Reads use the attachment's `perf_fd` directly; the internal token detects copied handles used after detach.
-- Group snapshot reads are not supported yet; read grouped attachments individually.
+- Use `read_group(leader)` when you need a same-time group snapshot.
+
+---
+
+#### `read_raw(handle)`
+**Signature:** `read_raw(handle: PerfAttachment) -> i64`
+**Variadic:** No
+**Context:** Userspace only
+
+**Description:** Read the unscaled raw hardware/software counter value from a perf attachment.
+
+**Return Value:**
+- Returns the raw counter value
+- Returns `-1` on invalid/stale attachment or read failure
+
+---
+
+#### `read_details(handle)`
+**Signature:** `read_details(handle: PerfAttachment) -> PerfReadDetails`
+**Variadic:** No
+**Context:** Userspace only
+
+**Description:** Read raw, scaled, `time_enabled`, and `time_running` details for a perf attachment.
+
+**Return Value:**
+- `raw`: unscaled counter value
+- `scaled`: multiplex-corrected value, or `-1` on timing/read error
+- `time_enabled`: perf enabled time
+- `time_running`: perf running time
+
+---
+
+#### `read_group(leader)`
+**Signature:** `read_group(leader: PerfAttachment) -> PerfGroupRead`
+**Variadic:** No
+**Context:** Userspace only
+
+**Description:** Read a same-time snapshot from a perf event group leader. This enables `PERF_FORMAT_GROUP | PERF_FORMAT_ID` in generated perf events.
+
+**Return Value:**
+- `count`: number of entries returned, capped at 16
+- `values`: multiplex-scaled values from the snapshot
+- `ids`: perf event IDs for the returned values
+- `time_enabled` / `time_running`: timing fields used for scaling

 ---
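Reviewer sketch (not part of the commit): the `read_group(leader)` fields above line up with the kernel's group read layout. With `PERF_FORMAT_GROUP | PERF_FORMAT_ID` plus the two total-time flags, a `read()` on the leader fd yields `nr`, `time_enabled`, `time_running`, then `nr` pairs of `(value, id)`. The decoder below is a minimal illustration under stated assumptions: the struct name `perf_group_read`, the helper `parse_group_read`, and the choice of `-1` for entries when `time_running` is zero are hypothetical, and `unsigned __int128` assumes GCC or Clang on a 64-bit target. It does not claim to match the generated `ks_perf_attachment_read_group`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_GROUP_MAX 16 /* documented cap on returned entries */

/* Hypothetical C mirror of the PerfGroupRead struct from this commit. */
typedef struct {
    uint32_t count;
    int64_t  values[KS_GROUP_MAX];
    uint64_t ids[KS_GROUP_MAX];
    uint64_t time_enabled;
    uint64_t time_running;
} perf_group_read;

/*
 * Decode a leader read() buffer laid out as:
 *   nr, time_enabled, time_running, then nr pairs of (value, id).
 * `words` is the number of u64 words available in `buf`.
 * Returns 0 on success, -1 if the buffer is too short for `nr` entries.
 */
static int parse_group_read(const uint64_t *buf, size_t words, perf_group_read *out)
{
    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (words < 3)
        return -1;

    uint64_t nr = buf[0];
    if (nr > (words - 3) / 2)
        return -1;

    out->time_enabled = buf[1];
    out->time_running = buf[2];
    out->count = nr > KS_GROUP_MAX ? KS_GROUP_MAX : (uint32_t)nr;

    for (uint32_t i = 0; i < out->count; i++) {
        uint64_t raw = buf[3 + 2 * (size_t)i];
        out->ids[i] = buf[4 + 2 * (size_t)i];
        if (out->time_running == 0)
            out->values[i] = -1; /* assumed error marker, mirroring read() */
        else if (out->time_running >= out->time_enabled)
            out->values[i] = (int64_t)raw; /* no multiplexing: keep raw */
        else
            out->values[i] = (int64_t)(((unsigned __int128)raw * out->time_enabled)
                                       / out->time_running);
    }
    return 0;
}
```

Because all members share one `time_enabled` / `time_running` pair, every value in the snapshot is scaled by the same ratio, which is what makes the group read a same-time view.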

README.md

Lines changed: 7 additions & 4 deletions
@@ -306,7 +306,7 @@ fn on_branch_miss(ctx: *bpf_perf_event_data) -> i32 {
 fn main() -> i32 {
     var prog = load(on_branch_miss)

-    // Minimal form — defaults: pid=-1 (all procs), cpu=0, group_fd=-1,
+    // Minimal form — defaults: pid=-1 (all procs), cpu=0, no group,
     // period=1_000_000, wakeup=1; perf attach flags must be 0
     var att = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
     var count = read(att)
@@ -318,18 +318,21 @@ fn main() -> i32 {
 }
 ```

-Perf events can share a kernel scheduling group by passing the leader attachment's `perf_fd` as `group_fd`:
+Perf events can share a kernel scheduling group by passing the leader attachment directly with `group`.
+The lower-level `group_fd: cache.perf_fd` form is still supported for compatibility:

 ```kernelscript
 var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses }, 0)
 var branch = attach(prog, perf_options {
     perf_type: perf_type_hardware,
     perf_config: branch_misses,
-    group_fd: cache.perf_fd,
+    group: cache,
 }, 0)
 ```

-Adding a member restarts the whole group from zero. Detach members before detaching their leader. `read(att)` still reads one attachment at a time; it returns a multiplex-scaled count when the kernel reports `time_running < time_enabled`. Group snapshot reads are not part of this first-stage API.
+Adding a member restarts the whole group from zero. Detaching a leader cascades to any live members. A group competes for PMU counters as one atomic unit: different groups can be multiplexed over time, but members inside one group are not independently multiplexed. For statically visible groups, the compiler rejects groups that need more PMU counter slots than the target limit. The limit is read from known sysfs PMU caps when available, defaults to 4, and can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`.
+
+`read(att)` returns a multiplex-scaled count when the kernel reports `time_running < time_enabled`. Use `read_raw(att)` for the raw value, `read_details(att)` for raw/scaled/timing details, and `read_group(leader)` for a same-time group snapshot.

 **Available `perf_type` values:**
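Reviewer sketch (not part of the commit): the static group limit described in this README hunk can be illustrated with a small slot counter. SPEC.md in this same commit states that software and tracepoint events consume no PMU slot, while hardware, raw, cache, breakpoint, and dynamic `perf_type` values count as one. Everything named below (`ks_perf_type`, `slots_for`, `group_limit`, `group_fits`) is hypothetical. The sysfs probe is omitted, the leader is assumed to count toward the group, and the handling of malformed override values is a guess, since the commit does not show it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-ins for the perf_type_* categories; values are illustrative. */
typedef enum {
    KS_PERF_HARDWARE,
    KS_PERF_SOFTWARE,
    KS_PERF_TRACEPOINT,
    KS_PERF_HW_CACHE,
    KS_PERF_RAW,
    KS_PERF_BREAKPOINT,
    KS_PERF_DYNAMIC /* perf_type not known at compile time */
} ks_perf_type;

#define KS_DEFAULT_GROUP_MAX 4 /* documented fallback when no PMU caps are known */

/* Slots one event needs: software and tracepoint events take none. */
static unsigned slots_for(ks_perf_type t)
{
    return (t == KS_PERF_SOFTWARE || t == KS_PERF_TRACEPOINT) ? 0u : 1u;
}

/*
 * Limit to check against: the environment override when it parses as a
 * positive integer, otherwise the fallback. Ignoring malformed values is
 * an assumption made for this sketch.
 */
static unsigned group_limit(void)
{
    const char *s = getenv("KERNELSCRIPT_PERF_GROUP_MAX_EVENTS");
    if (s != NULL && *s != '\0') {
        char *end = NULL;
        unsigned long v = strtoul(s, &end, 10);
        if (*end == '\0' && v > 0 && v <= 1024)
            return (unsigned)v;
    }
    return KS_DEFAULT_GROUP_MAX;
}

/* Returns 1 when the whole group (leader included) fits within `limit`. */
static int group_fits(const ks_perf_type *events, size_t n, unsigned limit)
{
    assert(events != NULL || n == 0);
    unsigned used = 0;
    for (size_t i = 0; i < n; i++)
        used += slots_for(events[i]);
    return used <= limit;
}
```

A group of five hardware-class events fails `group_fits(..., 4)`, while a group of four hardware-class events plus any number of software or tracepoint events passes.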

SPEC.md

Lines changed: 19 additions & 5 deletions
@@ -461,7 +461,7 @@ fn main() -> i32 {
     var prog = load(my_handler)

     // Only perf_type + perf_config are required; all other fields use language-level defaults:
-    // pid=-1, cpu=0, group_fd=-1, period=1_000_000, wakeup=1, inherit/exclude_*=false
+    // pid=-1, cpu=0, no group, period=1_000_000, wakeup=1, inherit/exclude_*=false
     var misses = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)

     // Override specific fields as needed:
@@ -474,14 +474,16 @@ fn main() -> i32 {
     }, 0)

     // Put branch misses in cache's perf event group. Adding a member restarts
-    // the whole group from zero.
+    // the whole group from zero. The lower-level group_fd: cache.perf_fd form
+    // is still accepted.
     var branch = attach(prog, perf_options {
         perf_type: perf_type_hardware,
         perf_config: branch_misses,
-        group_fd: cache.perf_fd,
+        group: cache,
     }, 0)

     print("misses=%lld cache=%lld branch=%lld", read(misses), read(cache), read(branch))
+    var snapshot = read_group(cache)

     detach(branch)
     detach(cache) // IOC_DISABLE → bpf_link__destroy → close(perf_fd)
@@ -500,6 +502,7 @@ fn main() -> i32 {
 | `pid` | `i32` | `-1` | -1 = all processes; ≥0 = specific PID |
 | `cpu` | `i32` | `0` | ≥0 = specific CPU; -1 = any CPU (pid must be ≥0) |
 | `group_fd` | `i32` | `-1` | -1 = standalone event; ≥0 = perf group leader fd |
+| `group` | `PerfAttachment` | invalid attachment | Preferred high-level group leader attachment |
 | `period` | `u64` | `1000000` | Sample after this many events |
 | `wakeup` | `u32` | `1` | Wake userspace after N samples |
 | `inherit` | `bool` | `false` | Inherit to forked children |
@@ -550,6 +553,9 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 | `ks_attach_perf_event` | `PerfAttachment (int prog_fd, ks_perf_options, int flags)` | Full open-reset-attach-enable lifecycle |
 | `ks_read_perf_count` | `int64_t (int perf_fd)` | Reads current counter and applies multiplex scaling when needed |
 | `ks_perf_attachment_read` | `int64_t (PerfAttachment)` | Direct fd read through the attachment value with stale-handle detection |
+| `ks_perf_attachment_read_raw` | `int64_t (PerfAttachment)` | Direct raw counter read with stale-handle detection |
+| `ks_perf_attachment_read_details` | `PerfReadDetails (PerfAttachment)` | Returns raw, scaled, `time_enabled`, and `time_running` |
+| `ks_perf_attachment_read_group` | `PerfGroupRead (PerfAttachment)` | Reads a same-time group snapshot from a leader attachment |

 **Attach sequence for standalone events (compiler-generated, inside `ks_attach_perf_event`):**
 1. `ks_attr.attr.disabled = 1` — open counter without starting it
@@ -559,17 +565,24 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 5. `ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0)` **start counting**

 **Perf event groups:**
+- `group: leader_attachment` is the preferred way to join a perf group.
 - `group_fd >= 0` opens the new event as a member of that leader fd.
 - Group members are opened disabled, linked to the BPF program, then the leader is disabled, reset, and enabled with `PERF_IOC_FLAG_GROUP`.
 - Adding a member to an already running group restarts the whole group from zero.
-- Detaching a member is allowed. Detaching a leader while live members reference it is rejected; detach members first.
-- Group snapshot reads are not implemented yet; read each `PerfAttachment` separately.
+- A group is scheduled as an atomic PMU unit. Separate events and separate groups may be multiplexed; members inside one group are not independently multiplexed. If a statically visible group needs more PMU counter slots than the target limit, compilation fails.
+- The compile-time group limit uses known sysfs PMU caps when available, falls back to `4`, and can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`.
+- `perf_type_software` and `perf_type_tracepoint` do not consume PMU counter slots for this check; static hardware/raw/cache/breakpoint events consume one slot, and dynamic `perf_type` values are conservatively counted as one slot.
+- Detaching a member is allowed. Detaching a leader cascades to any live members.
+- `read_group(leader)` enables `PERF_FORMAT_GROUP | PERF_FORMAT_ID` and returns up to 16 same-time group values plus perf IDs and timing fields.

 **Counter reads:**
 - Generated perf events request `PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING`.
 - `read(att)` returns the raw value when `time_enabled == time_running`.
 - If multiplexing occurred, `read(att)` returns `value * time_enabled / time_running` using a 128-bit intermediate.
 - If `time_running == 0`, `read(att)` reports an error and returns `-1`.
+- `read_raw(att)` returns the unscaled raw counter.
+- `read_details(att)` returns raw, scaled, `time_enabled`, and `time_running`.
+- `read_group(leader)` returns a snapshot struct; group `values[]` are scaled using the snapshot timing fields.

 **Detach sequence (compiler-generated):**
 1. `ioctl(perf_fd, PERF_EVENT_IOC_DISABLE, 0)` — stop counting
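Reviewer sketch (not part of the commit): the "Counter reads" rules in this hunk reduce to a three-way case split on the timing fields. The function below builds a `PerfReadDetails`-like value from one counter sample. The names `perf_read_details` and `make_read_details` are hypothetical, `unsigned __int128` assumes GCC or Clang on a 64-bit target, and the clamp to `INT64_MAX` is an assumption that the commit neither confirms nor rules out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C mirror of the PerfReadDetails struct from this commit. */
typedef struct {
    int64_t  raw;
    int64_t  scaled;
    uint64_t time_enabled;
    uint64_t time_running;
} perf_read_details;

/* Apply the documented scaling rules to one (value, enabled, running) sample. */
static perf_read_details make_read_details(uint64_t value, uint64_t enabled, uint64_t running)
{
    perf_read_details d;
    d.raw = (int64_t)value;
    d.scaled = -1; /* documented error value when no scaling is possible */
    d.time_enabled = enabled;
    d.time_running = running;

    if (running == 0) {
        /* The event never ran, so there is no ratio to scale by. */
        return d;
    }
    if (running >= enabled) {
        /* Counter was on the PMU the whole time: scaled equals raw. */
        d.scaled = d.raw;
        return d;
    }

    /* Multiplexed: value * enabled / running, with a 128-bit product so the
     * multiplication cannot wrap before the division. */
    unsigned __int128 q = ((unsigned __int128)value * enabled) / running;
    d.scaled = q > (unsigned __int128)INT64_MAX ? INT64_MAX : (int64_t)q;
    return d;
}
```

In this sketch, `read(att)` corresponds to the `scaled` field and `read_raw(att)` to the `raw` field. The 128-bit product matters once `value * time_enabled` exceeds 64 bits, which long-running counters with nanosecond timing fields can reach.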
@@ -583,6 +596,7 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 - `PerfAttachment` carries `perf_fd` plus an internal generation token; `read(attachment)` avoids global attachment-list scans and rejects copied handles after detach
 - Exposes omitted `perf_options` fields as language-level defaults (partial struct literal)
 - Validates `pid ≥ -1`, `cpu ≥ -1`, `group_fd ≥ -1`, and rejects `pid == -1 && cpu == -1` at runtime
+- Treats `group` as valid only when it carries a live `PerfAttachment` generation token; otherwise `group_fd` controls grouping
 - Emits `PERF_FLAG_FD_CLOEXEC` for safe fd inheritance
 - BPF program section is `SEC("perf_event")`
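Reviewer sketch (not part of the commit): the precedence between `group` and `group_fd` stated in this hunk can be pictured as a small resolver. The field names follow the `PerfAttachment` defaults added in `src/stdlib.ml` (`perf_fd`, `link_id`, `prog_fd`, `generation`, with `generation = 0` as the invalid default). How the runtime decides that a token is live is not shown in the commit, so the `current_gen` parameter below is a hypothetical stand-in for whatever the runtime records for that fd, and `group_options`, `attachment_is_live`, and `effective_group_fd` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields as listed in the PerfAttachment defaults; C types are assumed. */
typedef struct {
    int      perf_fd;
    int      link_id;
    int      prog_fd;
    uint64_t generation; /* 0 = never attached (the documented default) */
} perf_attachment;

/* Only the two grouping-related perf_options fields, for brevity. */
typedef struct {
    int             group_fd; /* -1 = standalone event */
    perf_attachment group;    /* default: invalid attachment */
} group_options;

#define KS_INVALID_ATTACHMENT { -1, -1, -1, 0 }

/* Hypothetical liveness test: the handle's token must be nonzero and match
 * the generation currently recorded for its fd. */
static int attachment_is_live(perf_attachment a, uint64_t current_gen)
{
    return a.perf_fd >= 0 && a.generation != 0 && a.generation == current_gen;
}

/* A live `group` wins; otherwise fall back to the raw `group_fd`. */
static int effective_group_fd(const group_options *o, uint64_t current_gen)
{
    assert(o != NULL);
    if (attachment_is_live(o->group, current_gen))
        return o->group.perf_fd;
    return o->group_fd;
}
```

Under these assumptions, a stale copy of a leader handle degrades to the `group_fd` path, which defaults to `-1` and therefore opens a standalone event. Whether the real runtime falls back silently or reports an error in that case is not stated in the commit.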

examples/perf_cache_miss.ks

Lines changed: 5 additions & 3 deletions
@@ -11,17 +11,19 @@ fn on_cache_miss(ctx: *bpf_perf_event_data) -> i32 {
 fn main() -> i32 {
     var prog = load(on_cache_miss)

-    // Only perf_type + perf_config are required; pid, cpu, group_fd, period, wakeup and flag fields
+    // Only perf_type + perf_config are required; pid, cpu, group/group_fd, period, wakeup and flag fields
     // default to: pid=-1 (all procs), cpu=0, period=1_000_000, wakeup=1,
-    // group_fd=-1, inherit/exclude_kernel/exclude_user=false.
+    // no group, inherit/exclude_kernel/exclude_user=false.
     var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses, period: 10000000, inherit: true }, 0)
     // branch joins cache's perf event group. Adding a member restarts the whole group from zero.
-    var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true, group_fd: cache.perf_fd }, 0)
+    var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true, group: cache }, 0)
     print("Cache-miss and branch-miss perf_event demo attached")
     var cache_count = read(cache)
     print("Cache-miss count: %lld", cache_count)
     var branch_count = read(branch)
     print("Branch-miss count: %lld", branch_count)
+    var snapshot = read_group(cache)
+    print("Grouped snapshot entries: %u", snapshot.count)

     detach(branch)
     detach(cache)
src/stdlib.ml

Lines changed: 71 additions & 9 deletions
@@ -129,6 +129,13 @@ let validate_read_function arg_types _ast_context _pos =
   | _ ->
       (false, Some "read() currently requires a PerfAttachment")

+let validate_read_group_function arg_types _ast_context _pos =
+  match arg_types with
+  | [Struct "PerfAttachment"] | [UserType "PerfAttachment"] ->
+      (true, None)
+  | _ ->
+      (false, Some "read_group() requires a PerfAttachment group leader")
+
 (** Validation function for detach() - accepts program handles and perf attachments *)
 let validate_detach_function arg_types _ast_context _pos =
   match arg_types with
@@ -244,13 +251,46 @@ let builtin_functions = [
     name = "read";
     param_types = []; (* Custom validation handles attachment-aware overloads *)
     return_type = I64; (* Raw counter value, or -1 on error *)
-    description = "Read the current hardware/software counter value for a perf attachment";
+    description = "Read the multiplex-scaled hardware/software counter value for a perf attachment";
     is_variadic = false;
     ebpf_impl = ""; (* Not available in eBPF context *)
     userspace_impl = "ks_perf_attachment_read";
     kernel_impl = "";
     validate = Some validate_read_function;
   };
+  {
+    name = "read_raw";
+    param_types = [];
+    return_type = I64;
+    description = "Read the raw hardware/software counter value for a perf attachment";
+    is_variadic = false;
+    ebpf_impl = "";
+    userspace_impl = "ks_perf_attachment_read_raw";
+    kernel_impl = "";
+    validate = Some validate_read_function;
+  };
+  {
+    name = "read_details";
+    param_types = [];
+    return_type = Struct "PerfReadDetails";
+    description = "Read raw, scaled, time_enabled, and time_running for a perf attachment";
+    is_variadic = false;
+    ebpf_impl = "";
+    userspace_impl = "ks_perf_attachment_read_details";
+    kernel_impl = "";
+    validate = Some validate_read_function;
+  };
+  {
+    name = "read_group";
+    param_types = [];
+    return_type = Struct "PerfGroupRead";
+    description = "Read a same-time snapshot from a perf event group leader";
+    is_variadic = false;
+    ebpf_impl = "";
+    userspace_impl = "ks_perf_attachment_read_group";
+    kernel_impl = "";
+    validate = Some validate_read_group_function;
+  };
 ]

 (** Get built-in function definition by name *)
@@ -350,6 +390,7 @@ let builtin_types = [
     ("pid", I32);
     ("cpu", I32);
     ("group_fd", I32);
+    ("group", Struct "PerfAttachment");
     ("period", U64);
     ("wakeup", U32);
     ("inherit", Bool);
@@ -364,6 +405,21 @@ let builtin_types = [
     ("prog_fd", I32);
     ("generation", U64);
   ], builtin_pos));
+
+  TypeDef (StructDef ("PerfReadDetails", [
+    ("raw", I64);
+    ("scaled", I64);
+    ("time_enabled", U64);
+    ("time_running", U64);
+  ], builtin_pos));
+
+  TypeDef (StructDef ("PerfGroupRead", [
+    ("count", U32);
+    ("values", Array (I64, 16));
+    ("ids", Array (U64, 16));
+    ("time_enabled", U64);
+    ("time_running", U64);
+  ], builtin_pos));
 ]

 (** Default field values for structs that support partial initialisation.
@@ -373,14 +429,20 @@ let builtin_types = [
 let get_struct_field_defaults = function
   | "perf_options" ->
       Some [
-        ("pid", IntLit (Signed64 (-1L), None));
-        ("cpu", IntLit (Signed64 0L, None));
-        ("group_fd", IntLit (Signed64 (-1L), None));
-        ("period", IntLit (Unsigned64 1000000L, None));
-        ("wakeup", IntLit (Unsigned64 1L, None));
-        ("inherit", BoolLit false);
-        ("exclude_kernel", BoolLit false);
-        ("exclude_user", BoolLit false);
+        ("pid", Literal (IntLit (Signed64 (-1L), None)));
+        ("cpu", Literal (IntLit (Signed64 0L, None)));
+        ("group_fd", Literal (IntLit (Signed64 (-1L), None)));
+        ("group", StructLiteral ("PerfAttachment", [
+          ("perf_fd", make_expr (Literal (IntLit (Signed64 (-1L), None))) builtin_pos);
+          ("link_id", make_expr (Literal (IntLit (Signed64 (-1L), None))) builtin_pos);
+          ("prog_fd", make_expr (Literal (IntLit (Signed64 (-1L), None))) builtin_pos);
+          ("generation", make_expr (Literal (IntLit (Unsigned64 0L, None))) builtin_pos);
+        ]));
+        ("period", Literal (IntLit (Unsigned64 1000000L, None)));
+        ("wakeup", Literal (IntLit (Unsigned64 1L, None)));
+        ("inherit", Literal (BoolLit false));
+        ("exclude_kernel", Literal (BoolLit false));
+        ("exclude_user", Literal (BoolLit false));
       ]
   | _ -> None
