Trampolines by Bike · Pull Request #1777 · clasp-developers/clasp

Bike · 2026-05-27T14:05:48Z

Adds a facility for giving bytecodes native "trampolines" so that they are visible by name in external backtraces (e.g. perf, gdb). Supersedes #1765. drmeister has done the work here, just filing a PR so I can review it.

Arena-based trampolines (hand-coded x86_64), sampling profiler, flame graph generation, trampoline-aware backtraces, command-line extensions, and snapshot save/load support. Excludes bytecode interpreter changes (computed gotos, VMDynRecord dynamic binding stack), which remain on the interpreter-work branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Also demangle C++ function names

- New trampoline_aarch64.h paralleling trampoline_x86_64.h, with hand-coded bytecode (36 B) and GF (32 B) trampolines that use LDR from a literal pool. Shared CIE and per-kind FDEs with full DWARF CFI for unwinding. The same instructions are used on Linux arm64 and Apple Silicon; macOS W^X support still needs MAP_JIT in ExecutableArena.
- Wire the aarch64 templates into trampolineWork.cc via #elif __aarch64__.
- Demangle C++ symbols in sampling_profiler symbolicate_one via abi::__cxa_demangle for readable flame graph frames.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
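For readers unfamiliar with the "LDR from a literal pool" idiom, here is a minimal sketch of how such a stub is typically encoded. It is not the PR's 36-byte trampoline: it is a hypothetical 16-byte veneer (`ldr x16, #8; br x16; .quad target`) that jumps to a 64-bit absolute address stored right after the instructions, and the helper names `ldr_x_literal`, `br_x` and `make_veneer` are made up for illustration. The instruction encodings are the standard A64 ones.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A64 LDR (literal), 64-bit: loads Xt from PC + byte_offset.
// Encoding: 0x58000000 | (imm19 << 5) | Rt, with imm19 = byte_offset / 4.
constexpr uint32_t ldr_x_literal(unsigned rt, int32_t byte_offset) {
  return 0x58000000u |
         ((static_cast<uint32_t>(byte_offset / 4) & 0x7FFFFu) << 5) |
         (rt & 31u);
}

// A64 BR Xn: 0xD61F0000 | (Rn << 5).
constexpr uint32_t br_x(unsigned rn) { return 0xD61F0000u | ((rn & 31u) << 5); }

// Hypothetical 16-byte veneer:
//   +0  ldr x16, #8      ; load the literal stored at +8
//   +4  br  x16          ; jump to it
//   +8  .quad target     ; 8-byte absolute address
std::vector<uint8_t> make_veneer(uint64_t target) {
  const uint32_t insns[2] = {ldr_x_literal(16, 8), br_x(16)};
  std::vector<uint8_t> out(16);
  // A64 instructions are little-endian; these memcpys assume a little-endian host.
  std::memcpy(out.data(), insns, sizeof insns);
  std::memcpy(out.data() + 8, &target, sizeof target);
  return out;
}
```

A trampoline that has to show up in backtraces would set up a frame and call with `blr` rather than tail-jump; x16 (IP0) is used here because AAPCS64 reserves it as an intra-procedure-call scratch register.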

At profile start, snapshot /proc/self/maps (Linux) or the dyld image list (macOS) into a sorted executable-range cache. The SIGPROF handler binary-searches each saved_rip during the frame-pointer walk: if the address isn't in any executable mapping, the chain is broken (a frame compiled without -fno-omit-frame-pointer) and the walk stops cleanly instead of following garbage pointers.

New JIT/arena pages are registered dynamically via sampling_profiler_add_executable_range(), which is called from ExecutableArena::allocate(), so trampolines mmap'd during profiling are recognized immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
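The range check described in that commit can be sketched as a sorted vector searched with `std::upper_bound`. `ExecRange` and `ExecRangeCache` are hypothetical names, not the PR's types. This sketch also ignores the race between `add()` (called from the allocator) and a SIGPROF handler reading the table: `vector::insert` may reallocate, which the real implementation has to avoid while a handler could be reading.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One executable mapping [start, end), e.g. an r-x line from /proc/self/maps.
struct ExecRange {
  uintptr_t start;
  uintptr_t end;
};

// Hypothetical stand-in for the profiler's executable-range cache.
class ExecRangeCache {
 public:
  // Insert while keeping the ranges sorted by start address.
  void add(uintptr_t start, uintptr_t end) {
    const ExecRange r{start, end};
    auto pos = std::upper_bound(ranges_.begin(), ranges_.end(), r,
        [](const ExecRange& a, const ExecRange& b) { return a.start < b.start; });
    ranges_.insert(pos, r);
  }

  // Binary search: find the last range with start <= addr, then check addr < end.
  bool contains(uintptr_t addr) const {
    auto it = std::upper_bound(ranges_.begin(), ranges_.end(), addr,
        [](uintptr_t a, const ExecRange& r) { return a < r.start; });
    if (it == ranges_.begin()) return false;  // addr is below every range
    --it;
    return addr < it->end;
  }

 private:
  std::vector<ExecRange> ranges_;  // sorted, non-overlapping
};
```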

Wraps a body form with sampling profiler start/stop and writes a flame graph SVG on completion. The profiler is stopped and reset via unwind-protect on any exit path.

    (ext:with-flame-profile ("/tmp/my.svg" :rate 197 :title "test")
      (expensive-work))

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
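The macro's cleanup guarantee is what unwind-protect provides in Lisp; in C++, the runtime's language, the same shape is usually an RAII guard. A minimal sketch, assuming hypothetical `profiler_start`/`profiler_stop`/`profiler_reset`/`write_flame_graph` stand-ins (the PR's real entry points may be named and shaped differently): the SVG is written only on normal completion, while stop and reset run on every exit path, including an exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for the profiler entry points (names are assumptions).
static bool g_running = false;
static int g_resets = 0;
static std::string g_written;

void profiler_start(int /*rate_hz*/) { g_running = true; }
void profiler_stop() { g_running = false; }  // idempotent
void profiler_reset() { ++g_resets; }
void write_flame_graph(const std::string& path) { g_written = path; }

// RAII guard: the destructor runs on normal exit and during exception
// unwinding, like the cleanup forms of unwind-protect.
class FlameProfileScope {
 public:
  FlameProfileScope(std::string path, int rate_hz) : path_(std::move(path)) {
    profiler_start(rate_hz);
  }
  ~FlameProfileScope() {
    profiler_stop();
    profiler_reset();
  }
  FlameProfileScope(const FlameProfileScope&) = delete;
  FlameProfileScope& operator=(const FlameProfileScope&) = delete;

  // Normal completion only: stop sampling, then emit the SVG.
  void finish() {
    profiler_stop();
    write_flame_graph(path_);
  }

 private:
  std::string path_;
};

template <class Body>
void with_flame_profile(const std::string& path, int rate_hz, Body&& body) {
  FlameProfileScope scope(path, rate_hz);
  body();  // an exception here skips finish() but not the destructor
  scope.finish();
}
```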

The hash stored in the .dif file used to have to match the hash calculated from the clasp_gc.sif file. That's too restrictive.

Bike

mostly seems ok. some of the comments are false or misleading. I didn't read over the sampling profiler or flame graphs in great detail, but they're optional, so I'm not too concerned about those.

of course, tests are failing, so that's no good

Bike · 2026-05-27T14:28:23Z

+ * That profiler measures user-annotated regions; this one periodically
+ * snapshots whatever code is running.
+ *
+ * See Phase 4 / Phase 5 for post-mortem symbolication and flame-graph


What phases is this referring to? (Probably phases of a Claude plan, but that's not apparent from the code, so.)

Phase 4 and Phase 5 are mentioned here:
https://github.com/clasp-developers/clasp/blob/trampolines/src/core/sampling_profiler.cc#L557
They are symbolication and collapsed-stacks aggregation.
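Collapsed stacks are commonly written in the "stack count" text format that flamegraph.pl consumes: each line holds the root-to-leaf frame names joined with `;`, followed by the number of samples that hit that exact stack. A minimal sketch of that aggregation step, assuming samples arrive leaf-first as a frame-pointer walk yields them; `Sample`, `collapse` and `to_collapsed_text` are illustrative names, and whether the PR's Phase 5 uses exactly this format is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One sample: symbolicated frame names, innermost (leaf) first.
using Sample = std::vector<std::string>;

// Aggregate samples: key = root-to-leaf names joined with ';', value = hit count.
std::map<std::string, std::size_t> collapse(const std::vector<Sample>& samples) {
  std::map<std::string, std::size_t> counts;
  for (const Sample& s : samples) {
    std::string key;
    bool first = true;
    for (auto it = s.rbegin(); it != s.rend(); ++it) {  // reverse to root-first
      if (!first) key += ';';
      key += *it;
      first = false;
    }
    ++counts[key];
  }
  return counts;
}

// One "stack count" line per distinct stack, in the map's sorted order.
std::string to_collapsed_text(const std::map<std::string, std::size_t>& counts) {
  std::string out;
  for (const auto& entry : counts) {
    out += entry.first;
    out += ' ';
    out += std::to_string(entry.second);
    out += '\n';
  }
  return out;
}
```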

Bike · 2026-05-27T14:29:32Z

+// Populate the calling thread's stack bounds for later frame-walking.
+// Must be called from a non-signal context. sampling_profiler_start
+// calls this automatically for the calling thread; other threads that
+// should be fully profiled need to call ext:profile-register-thread


I don't think they do, since mpPackage.cc already does this for all threads?

I'm not sure what you are asking.

I added sampling_profiler_register_current_thread so the sampling_profiler has easy access to the thread stacks that we want to walk. It was easier than figuring out how to feed it our existing stack bounds. It's 40 bytes for each Lisp thread. I added calls for the main thread and all threads launched in mpPackage.cc.

I mean, the comment says that threads need to call ext:profile-register-thread. But they actually don't, since you set it up already in all thread launches in mpPackage.cc and for the main thread.
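To illustrate why the profiler wants per-thread stack bounds at all: a frame-pointer walk inside a signal handler can use them to refuse to dereference anything outside the interrupted thread's stack. The sketch assumes an x86_64-style frame layout (saved frame pointer at `[fp]`, return address at `[fp + 8]`) and mirrors only the `{lo, hi, valid}` shape of `ThreadStackBounds` from the diff; `register_current_thread` and `walk_frames` are hypothetical. On Linux, real bounds could come from `pthread_getattr_np` plus `pthread_attr_getstack`; here they are passed in explicitly so the walk can be exercised on a synthetic stack. Results go into a caller-provided buffer because a signal handler should not allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same {lo, hi, valid} shape as ThreadStackBounds in the diff; field names assumed.
struct ThreadStackBounds {
  uintptr_t lo;  // lowest address of the thread's stack
  uintptr_t hi;  // one past the highest address
  bool valid;
};

thread_local ThreadStackBounds t_stack_bounds{0, 0, false};

// Hypothetical registration hook, called outside signal context.
void register_current_thread(uintptr_t lo, uintptr_t hi) {
  t_stack_bounds = {lo, hi, true};
}

// Walk saved frame pointers, writing return addresses into out[0..max).
// Stops when fp leaves the registered stack or is misaligned, when a return
// address is not code, or when the chain fails to move toward the stack base
// (callers sit at higher addresses on x86_64 and aarch64).
template <class IsCode>
std::size_t walk_frames(uintptr_t fp, IsCode is_code, uintptr_t* out, std::size_t max) {
  const ThreadStackBounds b = t_stack_bounds;
  if (!b.valid) return 0;  // unregistered thread: don't touch memory
  std::size_t n = 0;
  while (n < max) {
    if (fp % alignof(uintptr_t) != 0 || fp < b.lo || fp >= b.hi ||
        b.hi - fp < 2 * sizeof(uintptr_t)) {
      break;
    }
    const uintptr_t* frame = reinterpret_cast<const uintptr_t*>(fp);
    const uintptr_t ret = frame[1];
    if (!is_code(ret)) break;  // e.g. a frame built without frame pointers
    out[n++] = ret;
    const uintptr_t next = frame[0];
    if (next <= fp) break;
    fp = next;
  }
  return n;
}
```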

Bike · 2026-05-27T14:53:59Z

+// name participates in the same "wrapper:name" -> unique-name substitution
+// (the suffix "_end" survives unchanged), so each trampoline gets its own
+// matching end marker symbol.
+__attribute__((used, noinline)) void WRAPPER_END_MARKER() asm("wrapper:name_end");


since we're hardcoding the instructions, none of this should be necessary, and I don't see it being used

I removed it.

Bike · 2026-05-27T14:54:46Z


-return_type bytecode_call(uint64_t pc, void *closure, uint64_t nargs, void **args);
+// Indirect through a global function pointer rather than calling bytecode_call
+// directly. Each compiled trampoline's call topology is now identical


"now identical" - Claude seems prone to putting progress reports in comments, but they'd be more appropriate in commits

The file has been removed.

Bike · 2026-05-28T20:52:16Z

+ *     the preceding CIE (= cie_size + 4).
+ *   - FDE's PC range = code_size.
+ * Because every slot has the same layout, the bytes are identical across
+ * slots and a single memcpy is all that's required.


phrased misleadingly - each copy needs to be patched with some addresses, so in the end they will have different bytes. And while there's only one explicit memcpy call, the template is copied into a std::vector first, which probably does the equivalent of a memcpy.

I updated the comment. The trampolines use an 8-byte absolute address of bytecode_vm, so they don't need to be individually fixed up.
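The reply's point can be checked with a small model. If every field that refers to something else inside the slot is relative (the FDE's CIE pointer, and a PC-relative `pc_begin`), and the only absolute value in the code is the same for every slot, then one template can be copied verbatim into each slot. The sketch assumes a hypothetical `[CIE][FDE][code]` slot layout and a CIE whose `R` augmentation selects `DW_EH_PE_pcrel | DW_EH_PE_sdata4` (0x1b); the PR's actual layout may differ, and the CIE bytes in the usage are placeholders rather than a valid CIE. `build_slot` and `SlotLayout` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in little-endian order (the byte order of x86_64 and aarch64).
static void put_u32(std::vector<uint8_t>& v, uint32_t x) {
  for (int i = 0; i < 4; ++i) v.push_back(static_cast<uint8_t>(x >> (8 * i)));
}

struct SlotLayout {
  std::size_t fde_offset;   // start of the FDE within the slot
  std::size_t code_offset;  // start of the trampoline code within the slot
};

// Hypothetical slot layout [CIE][FDE][code]. Assumes `cie` is a complete,
// 4-byte-aligned CIE whose 'R' augmentation selects DW_EH_PE_pcrel|sdata4.
std::vector<uint8_t> build_slot(const std::vector<uint8_t>& cie,
                                const std::vector<uint8_t>& cfa_insns,
                                const std::vector<uint8_t>& code,
                                SlotLayout* layout) {
  std::vector<uint8_t> slot(cie);
  const std::size_t fde = slot.size();
  // FDE contents after its length word: CIE pointer, pc_begin, pc_range (4 bytes
  // each), a 1-byte augmentation length, the CFA instructions, then padding.
  std::size_t body = 4 + 4 + 4 + 1 + cfa_insns.size();
  body = (body + 3) & ~std::size_t{3};
  const std::size_t code_off = fde + 4 + body;

  put_u32(slot, static_cast<uint32_t>(body));                  // length
  put_u32(slot, static_cast<uint32_t>(fde + 4));               // CIE pointer = cie_size + 4
  put_u32(slot, static_cast<uint32_t>(code_off - (fde + 8)));  // pc_begin, relative to this field
  put_u32(slot, static_cast<uint32_t>(code.size()));           // pc_range = code_size
  slot.push_back(0);                                           // augmentation data length (uleb128 0)
  slot.insert(slot.end(), cfa_insns.begin(), cfa_insns.end());
  slot.resize(code_off, 0);                                    // pad with DW_CFA_nop (0x00)
  slot.insert(slot.end(), code.begin(), code.end());
  if (layout != nullptr) *layout = SlotLayout{fde, code_off};
  return slot;
}
```

Because nothing in the slot encodes its own absolute position, copying it to any offset in the arena leaves `pc_begin` pointing at that copy's code.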

Bike · 2026-05-28T21:25:22Z

+  bool install_template(const uint8_t* tramp_bytes, size_t tramp_size,
+                        const uint8_t* cie_bytes,   size_t cie_len,
+                        const uint8_t* fde_bytes,   size_t fde_len) {
+    std::lock_guard<std::mutex> g(_init_lock);


Checking if _initialized is true first (presumably with acquire) should be faster than grabbing a lock every time we compile anything.

I think this is only called once at startup. Not every time we compile anything.
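For reference, the pattern suggested above is ordinary double-checked initialization: an acquire load on the fast path, then a re-check under the lock. Given the reply that installation happens once at startup, the lock cost is likely negligible either way; this sketch only shows the shape, with a hypothetical `TemplateInstaller` standing in for the arena. `std::call_once` with a `std::once_flag` is the standard-library alternative.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the arena's template installation state.
class TemplateInstaller {
 public:
  // Returns true only for the call that actually performed the installation.
  bool install_once() {
    // Fast path: once installed, a single acquire load and no lock.
    if (_initialized.load(std::memory_order_acquire)) return false;
    std::lock_guard<std::mutex> g(_init_lock);
    // Re-check under the lock; another thread may have finished first.
    if (_initialized.load(std::memory_order_relaxed)) return false;
    ++_install_count;  // ... copy template bytes, register CFI ...
    // Release pairs with the acquire above, publishing the installed state.
    _initialized.store(true, std::memory_order_release);
    return true;
  }

  int install_count() const { return _install_count; }

 private:
  std::atomic<bool> _initialized{false};
  std::mutex _init_lock;
  int _install_count = 0;
};
```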

Bike · 2026-05-28T21:32:37Z

+
+(defun join (list sep)
+  (with-output-to-string (out)
+    (loop for cell on list


(loop for (item . rest) on list do (write-string item out) when rest do (write-char sep out))

Changed - thanks.

Bike · 2026-05-28T21:36:05Z

+}
+
+// Read the interrupted instruction pointer out of the context structure.
+// x86_64 only for now — portable to arm64 when we need it.


so... not portable?

I haven't tested it on arm64 yet.
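Porting this is mostly a matter of knowing where each platform's `ucontext_t` keeps the program counter. A sketch covering Linux/glibc only (`gregs[REG_RIP]` on x86_64, `uc_mcontext.pc` on aarch64); it is not the PR's code, other platforms (including macOS) simply return 0 here, and `interrupted_pc`, `install_pc_handler` and the globals are hypothetical names. The handler is installed with `SA_SIGINFO` so it receives the context pointer.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstdint>
#include <signal.h>
#include <ucontext.h>

// Extract the interrupted PC from the ucontext passed to an SA_SIGINFO handler.
// Linux/glibc only in this sketch; other platforms return 0.
static uintptr_t interrupted_pc(void* uc_void) {
  auto* uc = static_cast<ucontext_t*>(uc_void);
#if defined(__linux__) && defined(__x86_64__)
  return static_cast<uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
  return static_cast<uintptr_t>(uc->uc_mcontext.pc);
#else
  (void)uc;
  return 0;
#endif
}

static volatile sig_atomic_t g_have_pc = 0;
static std::atomic<uintptr_t> g_last_pc{0};  // lock-free on x86_64/aarch64

static void on_signal(int, siginfo_t*, void* uc) {
  g_last_pc.store(interrupted_pc(uc), std::memory_order_relaxed);
  g_have_pc = 1;
}

// Install on_signal for `sig` with SA_SIGINFO so the handler gets the ucontext.
static bool install_pc_handler(int sig) {
  struct sigaction sa {};
  sa.sa_sigaction = on_signal;
  sa.sa_flags = SA_SIGINFO | SA_RESTART;
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, nullptr) == 0;
}
```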

Bike · 2026-05-28T21:36:50Z

+
+thread_local ThreadStackBounds t_stack_bounds{0, 0, false};
+
+static void populate_stack_bounds_for_this_thread() {


I already do almost exactly this for scanning in the mmtk branch, so I guess that'll have to get merged eventually.


It's a wrapper that I use in the main thread and child threads created in mpPackage.cc


Christian Schafmeister and others added 11 commits May 26, 2026 17:48

Register main thread for profiling

28c4215

Also demangle C++ function names

Export with-frame-profile from ext

6de51ba

Add fsync

3396a30

Merge branch 'main' into trampolines

39ef401

Add ${PID} to CLASP_FLAME_PROFILE=path=something${PID}.svg

2a2999c

Remove the strict test checking hashes

308f64b

The hash stored in the .dif file used to have to match the hash calculated from the clasp_gc.sif file. That's too restrictive.

Merge branch 'main' into trampolines

1ba1df2

Bike commented May 28, 2026


Christian Schafmeister and others added 9 commits June 1, 2026 08:56

flame graph now uses ${HOME}

7e2bf43

Picked up some changes to clasp_gc.sif - I'm not sure where

84a927d

Rerun analyze once we move back to main

Added amber-x86.def

2a66fac

Silence warnings that happen on __aarch64__

d95d632

Added description of Phase 4 and 5

f8ad298

Use sampling_profiler_register_current_thread

6ec8048

It's a wrapper that I use in the main thread and child threads created in mpPackage.cc

Use Alex' code

354e362

Provide a more detailed comment

607ded8

Remove dead code

b6ddf96

that used src/core/trampoline/trampoline.cc



Bike commented May 27, 2026
