Commit 6f9e6e8

Add perf event groups, multiplex-scaled reads, and group snapshots

Group multiple perf counters so they are scheduled as one atomic PMU unit and read as a consistent same-time snapshot.

Grouping
- perf_options gains a high-level `group: leader_attachment` field; the lower-level `group_fd: leader.perf_fd` form is kept for compatibility.
- Attaching a member opens it disabled, links the BPF program, then disables, resets, and enables the whole group with PERF_IOC_FLAG_GROUP, so adding a member restarts the group from zero.
- Detaching a group leader cascades to its live members instead of being rejected.

Reads
- read() now returns a PerfRead struct (raw, scaled, time_enabled, time_running, count, values[16], ids[16]) instead of a bare i64.
- Each attachment records its own perf event id via PERF_EVENT_IOC_ID at attach time. read() selects the entry matching that id from the group buffer, fixing member reads that previously returned the leader's count.
- scaled corrects for PMU multiplexing as value * time_enabled / time_running using a 128-bit intermediate; it equals raw when the event was never multiplexed. time_running == 0 is reported as an error.

Compile-time validation
- Statically visible perf groups are checked during type-checking: the group must fit the target PMU counter limit (read from sysfs caps, the KERNELSCRIPT_PERF_GROUP_MAX_EVENTS override, or a default of 4), and the member count is capped at the 16 entries PerfRead can hold. Software and tracepoint events do not consume PMU slots.

Codegen fixes
- ir_generator: array element loads now dereference the element pointer (IRUnOp(IRDeref)) instead of yielding the pointer value.
- userspace_codegen: non-literal array initializers are declared first and then memcpy'd, and a for-loop counter reused by a later variable of the same name no longer produces duplicate function-scope C declarations.
- parser: call results are primary expressions, so field access on a call result, such as read(att).scaled, now parses.

Docs (README, SPEC, BUILTINS) and the perf_cache_miss / perf_page_fault examples are updated; tests cover the group_fd and high-level group paths, group restart/cascade, multiplex scaling, oversized-group rejection, and the for-counter reuse regression.
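
The multiplex scaling rule in the commit message (`value * time_enabled / time_running` with a 128-bit intermediate) can be sketched in C as follows. This is a minimal illustration, not the generated runtime: the helper name `ks_scale_count` is hypothetical, `unsigned __int128` is a GCC/Clang extension on 64-bit targets, and saturating at `INT64_MAX` is an added assumption.

```c
#include <stdint.h>

/* Hypothetical helper: multiplex-corrected count for one perf event,
 * following scaled = value * time_enabled / time_running. */
int64_t ks_scale_count(uint64_t value, uint64_t time_enabled,
                       uint64_t time_running)
{
    /* The event never got a PMU slot, so no estimate is possible. */
    if (time_running == 0)
        return -1;

    /* 128-bit product so value * time_enabled cannot wrap at 64 bits. */
    unsigned __int128 scaled =
        (unsigned __int128)value * time_enabled / time_running;

    /* Assumption: saturate instead of wrapping if the estimate exceeds INT64_MAX. */
    if (scaled > (unsigned __int128)INT64_MAX)
        return INT64_MAX;
    return (int64_t)scaled;
}
```

When `time_enabled == time_running` the factors cancel and the result is the raw value; an event that ran for half of its enabled time with a raw count of 100 is reported as 200.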
1 parent 888f9aa commit 6f9e6e8

15 files changed

Lines changed: 1249 additions & 119 deletions

BUILTINS.md

Lines changed: 23 additions & 6 deletions
@@ -98,7 +98,7 @@ fn main() -> i32 {
 - `flags`: Attachment flags (context-dependent)
 - Perf event form:
 - `handle`: Program handle returned from `load()`
-- `opts`: `perf_options` value — only `perf_type` and `perf_config` are required; all other fields have defaults
+- `opts`: `perf_options` value — only `perf_type` and `perf_config` are required; all other fields have defaults, including no group (`group` invalid and `group_fd=-1`)
 - `flags`: Must be `0` for perf attaches; nonzero values are rejected
 
 **Return Value:**
@@ -117,11 +117,23 @@ if (result != 0) {
 // pid=-1 (all procs), cpu=0, period=1_000_000, wakeup=1; perf attach flags must be 0
 var perf_prog = load(on_branch_miss)
 var perf_att = attach(perf_prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
-var count = read(perf_att)
+var count = read(perf_att).scaled
 detach(perf_att)
-detach(perf_prog)
+
+// Grouped perf events: branch joins cache's leader group. Adding a member restarts the group.
+var cache = attach(perf_prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses }, 0)
+var branch = attach(perf_prog, perf_options {
+perf_type: perf_type_hardware,
+perf_config: branch_misses,
+group: cache,
+}, 0)
+detach(branch)
+detach(cache)
+detach(perf_prog)
 ```
 
+Grouped events are scheduled as one atomic PMU unit. Separate events and separate groups may be multiplexed, but members inside one group cannot be independently multiplexed. Static groups that exceed the target PMU counter limit are rejected at compile time; override the detected/default limit with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS` when compiling for a different target. The effective limit is capped at 16 to match `PerfRead`.
+
 **Context-specific implementations:**
 - **eBPF:** Not available
 - **Userspace:** Uses `attach_bpf_program_by_fd` for standard targets and `ks_attach_perf_event` for perf events
@@ -159,18 +171,23 @@ detach(prog) // Clean up
 ---
 
 #### `read(handle)`
-**Signature:** `read(handle: PerfAttachment) -> i64`
+**Signature:** `read(handle: PerfAttachment) -> PerfRead`
 **Variadic:** No
 **Context:** Userspace only
 
-**Description:** Read the current hardware/software counter value from a perf attachment.
+**Description:** Read a perf attachment snapshot. The result includes this attachment's raw and scaled count, multiplex timing, and same-time group arrays.
 
 **Parameters:**
 - `handle`: Perf attachment returned from `attach(handle, perf_options, flags)`
 
 **Return Value:**
-- Returns the raw 64-bit counter value on success
-- Returns `-1` on invalid/stale attachment or read failure
+- `raw`: this event's unscaled counter value, or `-1` on invalid/stale attachment or read failure
+- `scaled`: this event's multiplex-corrected value, or `-1` on timing/read error
+- `time_enabled`: perf enabled time
+- `time_running`: perf running time
+- `count`: number of group entries returned; `1` for a standalone event
+- `values`: multiplex-scaled group values in group order (leader first), capped at 16; the entry whose `ids[i]` matches this attachment equals `scaled` (`values[0]` for a standalone event or a group leader)
+- `ids`: perf event IDs for the returned values
 - Reads use the attachment's `perf_fd` directly; the internal token detects copied handles used after detach.
 
 ---

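With `PERF_FORMAT_GROUP | PERF_FORMAT_ID | PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING`, `read(2)` on a perf fd returns `nr`, `time_enabled`, `time_running`, then `nr` `{value, id}` pairs, leader first (layout per `perf_event_open(2)`). The sketch below shows how a `PerfRead`-like snapshot could be filled from such a buffer, selecting `raw`/`scaled` by the attachment's own ID. `ks_perf_read_sketch` and `ks_parse_group_read` are hypothetical names, not the generated runtime's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_PERF_READ_MAX 16  /* PerfRead holds at most 16 group entries */

/* Hypothetical C mirror of the documented PerfRead fields. */
typedef struct {
    int64_t  raw;
    int64_t  scaled;
    uint64_t time_enabled;
    uint64_t time_running;
    uint32_t count;
    int64_t  values[KS_PERF_READ_MAX];
    uint64_t ids[KS_PERF_READ_MAX];
} ks_perf_read_sketch;

static int64_t scale(uint64_t v, uint64_t enabled, uint64_t running)
{
    if (running == 0)
        return -1;
    unsigned __int128 s = (unsigned __int128)v * enabled / running;
    return s > (unsigned __int128)INT64_MAX ? INT64_MAX : (int64_t)s;
}

/* buf/words: the u64 words returned by read(perf_fd, ...).
 * self_id: this attachment's ID, recorded with PERF_EVENT_IOC_ID.
 * Returns 0 on success, -1 on a short buffer, unknown ID, or time_running == 0. */
int ks_parse_group_read(const uint64_t *buf, size_t words, uint64_t self_id,
                        ks_perf_read_sketch *out)
{
    memset(out, 0, sizeof *out);
    out->raw = -1;
    out->scaled = -1;
    if (words < 3)
        return -1;

    uint64_t nr = buf[0];
    if (nr > (words - 3) / 2)
        return -1;                          /* truncated group buffer */
    out->time_enabled = buf[1];
    out->time_running = buf[2];

    int found = 0;
    for (uint64_t i = 0; i < nr; i++) {
        uint64_t value = buf[3 + 2 * i];
        uint64_t id = buf[4 + 2 * i];
        int64_t scaled = scale(value, out->time_enabled, out->time_running);
        if (i < KS_PERF_READ_MAX) {         /* keep at most 16 entries */
            out->values[i] = scaled;
            out->ids[i] = id;
            out->count++;
        }
        if (id == self_id) {                /* this attachment's own entry */
            out->raw = (int64_t)value;
            out->scaled = scaled;
            found = 1;
        }
    }
    return (found && out->time_running != 0) ? 0 : -1;
}
```

For a group member, `scaled` comes from that member's own entry, so it generally differs from `values[0]`, which belongs to the leader.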
README.md

Lines changed: 19 additions & 3 deletions
@@ -294,7 +294,7 @@ fn main() -> i32 {
 
 ### Hardware Performance Counter Programs
 
-Use `@perf_event` to attach eBPF programs to hardware or software performance counters. `perf_options` keeps the kernel's tagged `perf_type + perf_config` model, so adding new perf event families does not require flattening everything into one enum. Only `perf_type` and `perf_config` are required; all other fields have sensible defaults. Perf attaches return a first-class attachment value, so if you need the current count in userspace, call `read(att)`:
+Use `@perf_event` to attach eBPF programs to hardware or software performance counters. `perf_options` keeps the kernel's tagged `perf_type + perf_config` model, so adding new perf event families does not require flattening everything into one enum. Only `perf_type` and `perf_config` are required; all other fields have sensible defaults. Perf attaches return a first-class attachment value, so if you need the current count in userspace, call `read(att).scaled`:
 
 ```kernelscript
 // eBPF program fires on every hardware branch-miss sample
@@ -306,10 +306,10 @@ fn on_branch_miss(ctx: *bpf_perf_event_data) -> i32 {
 fn main() -> i32 {
 var prog = load(on_branch_miss)
 
-// Minimal form — defaults: pid=-1 (all procs), cpu=0,
+// Minimal form — defaults: pid=-1 (all procs), cpu=0, no group,
 // period=1_000_000, wakeup=1; perf attach flags must be 0
 var att = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
-var count = read(att)
+var count = read(att).scaled
 print("branch misses: %lld", count)
 
 detach(att) // disables counter, destroys BPF link, closes fd
@@ -318,6 +318,22 @@ fn main() -> i32 {
 }
 ```
 
+Perf events can share a kernel scheduling group by passing the leader attachment directly with `group`.
+The lower-level `group_fd: cache.perf_fd` form is still supported for compatibility:
+
+```kernelscript
+var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses }, 0)
+var branch = attach(prog, perf_options {
+perf_type: perf_type_hardware,
+perf_config: branch_misses,
+group: cache,
+}, 0)
+```
+
+Adding a member restarts the whole group from zero. Detaching a leader cascades to any live members. A group competes for PMU counters as one atomic unit: different groups can be multiplexed over time, but members inside one group are not independently multiplexed. For statically visible groups, the compiler rejects groups that need more PMU counter slots than the target limit. The limit is read from known sysfs PMU caps when available, defaults to 4, can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`, and is capped at 16 to match `PerfRead`.
+
+`read(att)` returns a `PerfRead` snapshot with raw, multiplex-scaled, timing, and group fields. Use `read(att).scaled` for that attachment's counter value, `read(att).raw` for its unscaled value, and `read(att).values` / `read(att).ids` for a same-time group snapshot.
+
 **Available `perf_type` values:**
 
 | Enum value | Hardware/software event |

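The README text above describes how the compile-time group limit is derived. A minimal C sketch of that check follows, assuming the precedence override > sysfs caps > default of 4 (the README lists the sources; the ordering here is an assumption) and the cap of 16. The sysfs lookup is left out and passed in as `sysfs_limit`; the enum and function names are hypothetical and do not come from the compiler's type checker.

```c
#include <stdlib.h>

#define KS_PERF_READ_MAX 16   /* entries PerfRead can hold */
#define KS_DEFAULT_LIMIT 4    /* documented fallback */

/* Hypothetical stand-ins for the static perf_type of each group member. */
enum ks_event_kind {
    KS_SOFTWARE, KS_TRACEPOINT,                   /* no PMU counter needed */
    KS_HARDWARE, KS_RAW, KS_HW_CACHE, KS_BREAKPOINT,
    KS_DYNAMIC                                    /* perf_type known only at runtime */
};

/* Software and tracepoint events cost 0 slots; everything else,
 * including a dynamic perf_type, is conservatively counted as 1. */
static int ks_pmu_slots(enum ks_event_kind kind)
{
    return (kind == KS_SOFTWARE || kind == KS_TRACEPOINT) ? 0 : 1;
}

/* env_override: KERNELSCRIPT_PERF_GROUP_MAX_EVENTS text, or NULL.
 * sysfs_limit: counter count from PMU caps, or 0 when unknown. */
static int ks_group_limit(const char *env_override, int sysfs_limit)
{
    int limit = sysfs_limit > 0 ? sysfs_limit : KS_DEFAULT_LIMIT;
    if (env_override != NULL) {
        char *end;
        long v = strtol(env_override, &end, 10);
        if (end != env_override && *end == '\0' && v > 0)
            limit = v > KS_PERF_READ_MAX ? KS_PERF_READ_MAX : (int)v;
    }
    return limit > KS_PERF_READ_MAX ? KS_PERF_READ_MAX : limit;
}

/* Returns 1 if the group is accepted, 0 if compilation should fail. */
int ks_group_fits(const enum ks_event_kind *members, int n,
                  const char *env_override, int sysfs_limit)
{
    if (n > KS_PERF_READ_MAX)
        return 0;
    int slots = 0;
    for (int i = 0; i < n; i++)
        slots += ks_pmu_slots(members[i]);
    return slots <= ks_group_limit(env_override, sysfs_limit);
}
```

A caller would pass `getenv("KERNELSCRIPT_PERF_GROUP_MAX_EVENTS")` as `env_override`, so an unset variable falls through to the sysfs value or the default.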
SPEC.md

Lines changed: 40 additions & 7 deletions
@@ -461,7 +461,7 @@ fn main() -> i32 {
 var prog = load(my_handler)
 
 // Only perf_type + perf_config are required; all other fields use language-level defaults:
-// pid=-1, cpu=0, period=1_000_000, wakeup=1, inherit/exclude_*=false
+// pid=-1, cpu=0, no group, period=1_000_000, wakeup=1, inherit/exclude_*=false
 var misses = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses }, 0)
 
 // Override specific fields as needed:
@@ -473,8 +473,19 @@ fn main() -> i32 {
 exclude_kernel: true,
 }, 0)
 
-print("misses=%lld cache=%lld", read(misses), read(cache))
+// Put branch misses in cache's perf event group. Adding a member restarts
+// the whole group from zero. The lower-level group_fd: cache.perf_fd form
+// is still accepted.
+var branch = attach(prog, perf_options {
+perf_type: perf_type_hardware,
+perf_config: branch_misses,
+group: cache,
+}, 0)
 
+print("misses=%lld cache=%lld branch=%lld", read(misses).scaled, read(cache).scaled, read(branch).scaled)
+var snapshot = read(cache)
+
+detach(branch)
 detach(cache) // IOC_DISABLE → bpf_link__destroy → close(perf_fd)
 detach(misses)
 detach(prog)
@@ -490,6 +501,8 @@ fn main() -> i32 {
 | `perf_config` | `u64` | *(required)* | `perf_event_attr.config` value for that type |
 | `pid` | `i32` | `-1` | -1 = all processes; ≥0 = specific PID |
 | `cpu` | `i32` | `0` | ≥0 = specific CPU; -1 = any CPU (pid must be ≥0) |
+| `group_fd` | `i32` | `-1` | -1 = standalone event; ≥0 = perf group leader fd |
+| `group` | `PerfAttachment` | invalid attachment | Preferred high-level group leader attachment |
 | `period` | `u64` | `1000000` | Sample after this many events |
 | `wakeup` | `u32` | `1` | Wake userspace after N samples |
 | `inherit` | `bool` | `false` | Inherit to forked children |
@@ -538,16 +551,35 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 |---|---|---|
 | `ks_open_perf_event` | `int (ks_perf_options)` | Calls `perf_event_open(2)`, returns fd |
 | `ks_attach_perf_event` | `PerfAttachment (int prog_fd, ks_perf_options, int flags)` | Full open-reset-attach-enable lifecycle |
-| `ks_read_perf_count` | `int64_t (int perf_fd)` | Reads current 64-bit counter via `read()` |
-| `ks_perf_attachment_read` | `int64_t (PerfAttachment)` | Direct fd read through the attachment value with stale-handle detection |
+| `ks_perf_attachment_read` | `PerfRead (PerfAttachment)` | Direct fd snapshot through the attachment value with stale-handle detection |
 
-**Attach sequence (compiler-generated, inside `ks_attach_perf_event`):**
+**Attach sequence for standalone events (compiler-generated, inside `ks_attach_perf_event`):**
 1. `ks_attr.attr.disabled = 1` — open counter without starting it
-2. `syscall(SYS_perf_event_open, ...)` → `perf_fd`
+2. `syscall(SYS_perf_event_open, ..., group_fd=-1, ...)` → `perf_fd`
 3. `ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0)` — zero the counter
 4. `bpf_program__attach_perf_event(prog, perf_fd)` — link BPF program
 5. `ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0)` → **start counting**
 
+**Perf event groups:**
+- `group: leader_attachment` is the preferred way to join a perf group.
+- `group_fd >= 0` opens the new event as a member of that leader fd.
+- Group members are opened disabled, linked to the BPF program, then the leader is disabled, reset, and enabled with `PERF_IOC_FLAG_GROUP`.
+- Adding a member to an already running group restarts the whole group from zero.
+- A group is scheduled as an atomic PMU unit. Separate events and separate groups may be multiplexed; members inside one group are not independently multiplexed. If a statically visible group needs more PMU counter slots than the target limit, compilation fails.
+- The compile-time group limit uses known sysfs PMU caps when available, falls back to `4`, can be overridden with `KERNELSCRIPT_PERF_GROUP_MAX_EVENTS`, and is capped at the 16 entries exposed by `PerfRead`.
+- `perf_type_software` and `perf_type_tracepoint` do not consume PMU counter slots for this check; static hardware/raw/cache/breakpoint events consume one slot, and dynamic `perf_type` values are conservatively counted as one slot.
+- Detaching a member is allowed. Detaching a leader cascades to any live members.
+- Generated perf events always enable `PERF_FORMAT_GROUP | PERF_FORMAT_ID`, and `read(att)` returns up to 16 same-time group values plus perf IDs and timing fields. `raw` and `scaled` select the entry matching the attachment being read.
+
+**Counter reads:**
+- Generated perf events request `PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_ID | PERF_FORMAT_GROUP`.
+- `read(att)` returns a `PerfRead` snapshot with `raw`, `scaled`, `time_enabled`, `time_running`, `count`, `values`, and `ids`.
+- `read(att).scaled` equals this attachment's raw value when `time_enabled == time_running`.
+- If multiplexing occurred, `read(att).scaled` is `value * time_enabled / time_running` using a 128-bit intermediate.
+- If `time_running == 0`, `read(att)` reports an error and returns `scaled == -1`.
+- `read(att).raw` returns this attachment's unscaled raw counter.
+- `read(att).values[]` contains multiplex-scaled group values using the snapshot timing fields; `count == 1` for standalone events.
+
 **Detach sequence (compiler-generated):**
 1. `ioctl(perf_fd, PERF_EVENT_IOC_DISABLE, 0)` — stop counting
 2. `bpf_link__destroy(link)` — unlink BPF program
@@ -559,7 +591,8 @@ For event families with a richer config space, such as `perf_type_hw_cache`, pro
 - Returns a first-class `PerfAttachment` value for perf attaches so one program can hold multiple live counters
 - `PerfAttachment` carries `perf_fd` plus an internal generation token; `read(attachment)` avoids global attachment-list scans and rejects copied handles after detach
 - Exposes omitted `perf_options` fields as language-level defaults (partial struct literal)
-- Validates `pid ≥ -1`, `cpu ≥ -1`, and rejects `pid == -1 && cpu == -1` at runtime
+- Validates `pid ≥ -1`, `cpu ≥ -1`, `group_fd ≥ -1`, and rejects `pid == -1 && cpu == -1` at runtime
+- Treats `group` as valid only when it carries a live `PerfAttachment` generation token; otherwise `group_fd` controls grouping
 - Emits `PERF_FLAG_FD_CLOEXEC` for safe fd inheritance
 - BPF program section is `SEC("perf_event")`
 

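The SPEC text above lists the group attach order: open the member disabled, link the BPF program, then disable, reset, and enable through the leader with `PERF_IOC_FLAG_GROUP`. A minimal C sketch of that sequence using the raw Linux interfaces follows. It is illustrative, not the generated `ks_attach_perf_event`: the function names are hypothetical, the libbpf link step is left as a comment, and error handling is reduced to closing the new fd.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Attr as the SPEC describes it: opened disabled, with the read_format
 * that a PerfRead snapshot needs. */
void ks_fill_attr(struct perf_event_attr *attr, uint32_t type,
                  uint64_t config, uint64_t period)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = type;
    attr->config = config;
    attr->sample_period = period;
    attr->disabled = 1;
    attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                        PERF_FORMAT_TOTAL_TIME_RUNNING |
                        PERF_FORMAT_ID | PERF_FORMAT_GROUP;
}

/* Open one member in leader_fd's group and restart the whole group.
 * Returns the member fd, or -1; *id_out receives the member's perf ID. */
int ks_open_group_member(struct perf_event_attr *attr, pid_t pid, int cpu,
                         int leader_fd, uint64_t *id_out)
{
    int fd = (int)syscall(SYS_perf_event_open, attr, pid, cpu, leader_fd,
                          PERF_FLAG_FD_CLOEXEC);
    if (fd < 0)
        return -1;

    /* Remember this event's own ID so reads can select its entry. */
    if (ioctl(fd, PERF_EVENT_IOC_ID, id_out) < 0)
        goto fail;

    /* A libbpf program would be linked to fd here, for example with
     * bpf_program__attach_perf_event(prog, fd). */

    /* Restart from zero; PERF_IOC_FLAG_GROUP applies each ioctl to every member. */
    if (ioctl(leader_fd, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP) < 0 ||
        ioctl(leader_fd, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) < 0 ||
        ioctl(leader_fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Because the reset goes through the leader with the group flag, every counter in the group, existing members included, starts again from zero, which matches the documented restart behavior.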
examples/perf_cache_miss.ks

Lines changed: 35 additions & 6 deletions
@@ -11,19 +11,48 @@ fn on_cache_miss(ctx: *bpf_perf_event_data) -> i32 {
 fn main() -> i32 {
 var prog = load(on_cache_miss)
 
-// Only perf_type + perf_config are required; pid, cpu, period, wakeup and flag fields
+// Only perf_type + perf_config are required; pid, cpu, group/group_fd, period, wakeup and flag fields
 // default to: pid=-1 (all procs), cpu=0, period=1_000_000, wakeup=1,
-// inherit/exclude_kernel/exclude_user=false.
+// no group, inherit/exclude_kernel/exclude_user=false.
 var cache = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: cache_misses, period: 10000000, inherit: true }, 0)
-var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true }, 0)
+// branch joins cache's perf event group. Adding a member restarts the whole group from zero.
+var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true, group: cache }, 0)
 print("Cache-miss and branch-miss perf_event demo attached")
-var cache_count = read(cache)
+var cache_count = read(cache).scaled
 print("Cache-miss count: %lld", cache_count)
-var branch_count = read(branch)
+var branch_count = read(branch).scaled
 print("Branch-miss count: %lld", branch_count)
+
+var prev = read(cache)
+// Simulate workload with cache misses and branch misses.
+var x = 0
+var i = 0
+for (i in 0..10000000) {
+if (i % 100 == 0) {
+x = x + 1
+} else {
+x = (x * 2) % 1000003 // keep x bounded so repeated doubling cannot overflow
+}
+}
+var cur = read(cache)
+var delta = cur.scaled - prev.scaled
+var dt_ns = cur.time_enabled - prev.time_enabled
+if (dt_ns > 0) {
+var per_sec = (delta * 1000000000) / dt_ns
+print("Cache misses/sec: %lld", per_sec)
+}
+
+var snapshot = read(cache)
+print("Grouped snapshot entries: %u", snapshot.count)
+
+var snapshot_index = 0
+while (snapshot_index < snapshot.count) {
+print("id=%llu value=%lld", snapshot.ids[snapshot_index], snapshot.values[snapshot_index])
+snapshot_index = snapshot_index + 1
+}
 
-detach(cache)
 detach(branch)
+detach(cache)
 detach(prog)
 print("Cache-miss and branch-miss perf_event demo detached")
 return 0

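The example above turns two `PerfRead` snapshots into a rate as `delta * 1000000000 / dt_ns`. In 64-bit signed arithmetic that product overflows once `delta` exceeds roughly 9.2 × 10^9 events. A C sketch of the same calculation with a 128-bit intermediate follows, assuming `scaled` and `time_enabled` come from two reads of the same attachment; `ks_rate_per_sec` is a hypothetical name, and treating a decreasing estimate as an error is an added assumption.

```c
#include <stdint.h>

/* Events per second between two snapshots of one attachment.
 * *_scaled: PerfRead.scaled values; *_enabled_ns: PerfRead.time_enabled.
 * Returns -1 if a read failed, the estimate went backwards, or no time elapsed. */
int64_t ks_rate_per_sec(int64_t prev_scaled, int64_t cur_scaled,
                        uint64_t prev_enabled_ns, uint64_t cur_enabled_ns)
{
    if (prev_scaled < 0 || cur_scaled < prev_scaled ||
        cur_enabled_ns <= prev_enabled_ns)
        return -1;

    uint64_t delta = (uint64_t)(cur_scaled - prev_scaled);
    uint64_t dt_ns = cur_enabled_ns - prev_enabled_ns;

    /* 128-bit product: delta * 1e9 would overflow 64 bits for large deltas. */
    unsigned __int128 rate = (unsigned __int128)delta * 1000000000u / dt_ns;
    return rate > (unsigned __int128)INT64_MAX ? INT64_MAX : (int64_t)rate;
}
```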
examples/perf_page_fault.ks

Lines changed: 12 additions & 6 deletions
@@ -14,20 +14,26 @@ fn main() -> i32 {
 // pid: 0 = current process, cpu: -1 = any CPU (standard per-process monitoring).
 // page_faults (PERF_COUNT_SW_PAGE_FAULTS) is the most reliable software event:
 // every heap/stack allocation triggers minor page faults, no scheduler dependency.
-var att = attach(prog, perf_options { perf_type: perf_type_software, perf_config: page_faults, pid: 0, cpu: -1, period: 1 }, 0)
-print("Page-fault perf_event demo attached")
+var page = attach(prog, perf_options { perf_type: perf_type_software, perf_config: page_faults, pid: 0, cpu: -1, period: 1 }, 0)
+// branch is a standalone hardware event; page_faults remains a separate software event.
+var branch = attach(prog, perf_options { perf_type: perf_type_hardware, perf_config: branch_misses, period: 10000000, inherit: true}, 0)
+
+print("perf_event demo attached")
 
 // Repeatedly increment a counter; stack/heap activity will generate page faults.
 var x: i64 = 0
 for (i in 0..10000000) {
 x = x + 1
 }
 
-var count = read(att)
-print("Page-fault count: %lld", count)
+var page_fault_count = read(page).scaled
+print("Page-fault count: %lld", page_fault_count)
+var branch_count = read(branch).scaled
+print("Branch-miss count: %lld", branch_count)
 
-detach(att)
-print("Page-fault perf_event demo detached")
+detach(page)
+detach(branch)
+print("perf_event demo detached")
 detach(prog)
 return 0
 }

src/ir_generator.ml

Lines changed: 2 additions & 2 deletions
@@ -877,7 +877,7 @@ let rec lower_expression ctx (expr : Ast.expr) =
 emit_variable_decl_val ctx ptr_val ptr_val.val_type (Some ptr_expr) expr.expr_pos;
 
 (* result = *ptr *)
-let load_expr = make_ir_expr (IRValue ptr_val) element_type expr.expr_pos in
+let load_expr = make_ir_expr (IRUnOp (IRDeref, ptr_val)) element_type expr.expr_pos in
 emit_variable_decl_val ctx result_val element_type (Some load_expr) expr.expr_pos);
 
 result_val)
@@ -3578,4 +3578,4 @@ let generate_ir ?(use_type_annotations=false) ast symbol_table source_name =
 with
 | exn ->
 Printf.eprintf "IR generation failed: %s\n" (Printexc.to_string exn);
-raise exn
+raise exn

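In C terms, the `ir_generator` fix changes what `result` receives for an array element read: the lowered code first computes the element's address into a temporary and must then load through it. A tiny sketch of the intended semantics follows; `load_element` is a hypothetical name and this is an analogy, not the emitted code.

```c
#include <stdint.h>

/* result = arr[idx], lowered the way the fixed IR describes it. */
int32_t load_element(const int32_t *arr, int idx)
{
    const int32_t *ptr = &arr[idx];  /* ptr_val: address of the element */
    int32_t result = *ptr;           /* IRUnOp (IRDeref, ptr_val): load the value */
    /* Before the fix, the lowering used ptr itself (IRValue ptr_val) as the result. */
    return result;
}
```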
src/parser.mly

Lines changed: 3 additions & 11 deletions
@@ -145,7 +145,6 @@
 %type <Ast.catch_pattern> catch_pattern
 %type <Ast.expr> expression
 %type <Ast.expr> primary_expression
-%type <Ast.expr> function_call
 %type <Ast.expr> array_access
 %type <Ast.expr> struct_literal
 %type <Ast.expr> match_expression
@@ -462,7 +461,6 @@ defer_statement:
 /* Expressions - Conservative approach with precedence declarations */
 expression:
 | primary_expression { $1 }
-| function_call { $1 }
 | array_access { $1 }
 | struct_literal { $1 }
 | match_expression { $1 }
@@ -492,16 +490,10 @@ primary_expression:
 | LPAREN expression RPAREN { $2 }
 | primary_expression DOT field_name { make_expr (FieldAccess ($1, $3)) (make_pos ()) }
 | primary_expression ARROW field_name { make_expr (ArrowAccess ($1, $3)) (make_pos ()) }
-| NEW bpf_type LPAREN RPAREN { make_expr (New $2) (make_pos ()) }
-| NEW bpf_type LPAREN expression RPAREN { make_expr (NewWithFlag ($2, $4)) (make_pos ()) }
-
-function_call:
-| IDENTIFIER LPAREN argument_list RPAREN
-{ make_expr (Call (make_expr (Identifier $1) (make_pos ()), $3)) (make_pos ()) }
 | primary_expression LPAREN argument_list RPAREN
 { make_expr (Call ($1, $3)) (make_pos ()) }
-
-
+| NEW bpf_type LPAREN RPAREN { make_expr (New $2) (make_pos ()) }
+| NEW bpf_type LPAREN expression RPAREN { make_expr (NewWithFlag ($2, $4)) (make_pos ()) }
 
 array_access:
 | expression LBRACKET expression RBRACKET { make_expr (ArrayAccess ($1, $3)) (make_pos ()) }
@@ -739,4 +731,4 @@ field_name:
 | IDENTIFIER { $1 }
 | TYPE { "type" }
 
-%%
+%%

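The commit message's `userspace_codegen` note (declare first, then `memcpy`, for non-literal array initializers) follows from a C rule: an array can only be initialized from a brace-enclosed list or a string literal, so `int32_t dst[4] = src;` does not compile. A minimal sketch of the resulting shape follows; the function name is hypothetical and only illustrates the pattern.

```c
#include <stdint.h>
#include <string.h>

/* C for an array variable initialized from another array value rather
 * than a literal: declare without an initializer, then copy the bytes. */
int32_t sum_after_copy(const int32_t src[4])
{
    int32_t dst[4];                  /* declaration only */
    memcpy(dst, src, sizeof dst);    /* copy all 4 elements (16 bytes) */

    int32_t total = 0;
    for (int i = 0; i < 4; i++)
        total += dst[i];
    return total;
}
```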