status active
audience contributors
last-verified 2026-04-09

Viper VM — Architecture & Implementation Guide

Comprehensive guide to the Viper Virtual Machine (VM), which executes Viper IL programs. This document covers the VM's design philosophy, architecture, execution model, and source code organization.


Table of Contents

  1. Overview
  2. Architecture & Design Philosophy
  3. Key Components
  4. Execution Model
  5. Dispatch Strategies
  6. Memory Model
  7. Exception & Trap Handling
  8. Runtime Integration
  9. Debug & Tracing
  10. Source Code Guide
  11. Performance Features
  12. Best Practices
  13. Further Reading

Appendices


Overview

What is the Viper VM?

The Viper VM is the primary execution engine for programs written in Viper's Intermediate Language (IL). Within the Viper toolchain it provides:

  • Deterministic execution of IL programs
  • Debugging and tracing capabilities
  • Exception handling with structured error recovery
  • Runtime function calls via the RuntimeBridge
  • Multiple dispatch strategies optimized for different use cases

Key Characteristics

Feature         Description
Architecture    Stack-based interpreter with SSA register file
Dispatch        Pluggable (function table, switch, computed goto)
Memory          Frame-local operand stack with explicit allocation via alloca (64KB default)
Error Handling  Structured exception handling with trap metadata
Debugging       Built-in breakpoints, stepping, and tracing
Performance     Tail-call optimization, opcode counting, inline caching

Architecture & Design Philosophy

Core Principles

The VM design prioritizes several key principles:

  1. Modularity: Pluggable dispatch strategies allow optimization without changing the core interpreter
  2. Inspectability: Comprehensive tracing and debugging support for all execution paths
  3. Correctness: Deterministic execution with explicit error handling
  4. Performance: Multiple optimization layers (TCO, inline caching, threaded dispatch)
  5. Simplicity: Clean separation between interpretation, runtime, and tooling

High-Level Architecture

┌─────────────────────────────────────────────────────────┐
│                      VM (Interpreter)                    │
├─────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │   Dispatch   │  │   Opcode     │  │    Debug     │ │
│  │   Strategy   │──│   Handlers   │──│   Control    │ │
│  └──────────────┘  └──────────────┘  └──────────────┘ │
│                           │                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │    Frame     │  │    Trap      │  │    Trace     │ │
│  │   Manager    │  │   Handler    │  │     Sink     │ │
│  └──────────────┘  └──────────────┘  └──────────────┘ │
└─────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  IL Module   │    │   Runtime    │    │   External   │
│  (readonly)  │    │    Bridge    │    │   Callbacks  │
└──────────────┘    └──────────────┘    └──────────────┘

Component Relationships

  • VM owns: Dispatch driver, trace sink, debug controller, function/string maps
  • VM borrows: IL Module (must outlive VM), optional debug script
  • Frame owns: Register file, operand stack, exception handlers
  • RuntimeBridge provides: C runtime function invocation, trap reporting

Key Components

1. VM Class (src/vm/VM.hpp)

The main interpreter class that orchestrates execution:

class VM {
    // Module and configuration
    const il::core::Module& mod;                    // IL module (non-owning)
    std::shared_ptr<ProgramState> programState_;    // Shared globals/context
    TraceSink tracer;                               // Trace output
    DebugCtrl debug;                                // Breakpoint controller
    std::unique_ptr<DispatchDriver> dispatchDriver; // Pluggable dispatch
    DispatchKind dispatchKind;                      // Active strategy type

    // Execution state
    uint64_t instrCount;                // Executed instructions
    uint64_t maxSteps;                  // Step limit (0 = unlimited)
    std::size_t stackBytes_;            // Per-frame stack size (default 64KB)
    std::vector<ExecState*> execStack;  // Active execution stack for unwinding

    // Caching and lookup (string_view keys for zero-copy)
    FnMap fnMap;                                    // Function lookup table
    StrMap inlineLiteralCache;                      // String literal handles (RAII)
    std::unordered_map<const BasicBlock*, const Function*> blockToFunction;
    std::unordered_map<const Function*, size_t> regCountCache_;

    // Buffer pools for allocation reuse
    std::vector<std::vector<uint8_t>> stackBufferPool_;
    std::vector<std::vector<Slot>> regFilePool_;

    // Exception handling
    TrapContext currentContext;         // Active instruction context
    TrapState lastTrap;                 // Last trap for diagnostics
    TrapToken trapToken;                // Error payload for trap.err

    // Polling and profiling
    uint32_t pollEveryN_;               // Host callback frequency
    std::function<bool(VM&)> pollCallback_;
#if VIPER_VM_OPCOUNTS
    std::array<uint64_t, kNumOpcodes> opCounts_;  // Per-opcode counters
#endif

    // Per-VM extern registry (optional)
    ExternRegistry* externRegistry_;    // Custom function resolution
};

Key responsibilities:

  • Module initialization and function lookup
  • Dispatch strategy selection and lifecycle
  • String literal caching and lifetime management
  • Buffer pooling for recursive call efficiency
  • Trap context tracking and formatting
  • Debug breakpoint coordination
  • Host polling for embedded applications

2. Frame (src/vm/VM.hpp)

Represents a single function activation record:

struct Frame {
    // Nested types for exception handling
    struct HandlerRecord {
        const BasicBlock* handler;  // Handler block
        size_t ipSnapshot;          // IP to restore
    };
    struct ResumeState {
        const BasicBlock* block;    // Faulting block
        size_t faultIp, nextIp;     // Instruction pointers
        bool valid;
    };

    const Function* func;                          // Active function (non-owning)
    std::vector<Slot> regs;                        // SSA register file
    static constexpr size_t kDefaultStackSize = 65536;  // 64KB
    std::vector<uint8_t> stack;                    // Operand stack (alloca)
    size_t sp = 0;                                 // Stack pointer in bytes
    std::vector<std::optional<Slot>> params;       // Pending block parameters
    std::vector<HandlerRecord> ehStack;            // Exception handlers
    VmError activeError{};                         // Current error payload
    ResumeState resumeState{};                     // Resumption metadata
};

Key responsibilities:

  • SSA value storage in register file
  • Stack allocation via alloca instruction (bump sp within stack)
  • Block parameter passing
  • Exception handler stack management
  • Resume state for error recovery

3. Slot (src/vm/VM.hpp)

Tagged union for runtime values:

union Slot {
    int64_t i64;      // Integer value
    double f64;       // Floating-point value
    void* ptr;        // Generic pointer
    rt_string str;    // Runtime string handle
};

All IL values are represented as Slot during execution. Type safety is enforced by the IL verifier and opcode handlers.

4. Dispatch Drivers

Pluggable strategies for instruction fetch-decode-execute:

  • FnTableDispatchDriver: Uses function pointer table lookup
  • SwitchDispatchDriver: Uses switch statement (inline handlers)
  • ThreadedDispatchDriver: Uses computed goto (GCC/Clang only)

Selected at VM construction via DispatchKind enum.

5. Opcode Handlers (src/vm/OpHandlers*.hpp)

Category-organized functions that implement IL instructions:

  • Control (OpHandlers_Control.hpp): br, cbr, call, ret, switch
  • Integer (OpHandlers_Int.hpp): add, mul, icmp_*, scmp_*
  • Float (OpHandlers_Float.hpp): fadd, fmul, fcmp_*
  • Memory (OpHandlers_Memory.hpp): alloca, load, store, gep

Each handler has signature:

ExecResult handler(VM& vm, Frame& fr, const Instr& in,
                  const BlockMap& blocks,
                  const BasicBlock*& bb, size_t& ip);

Execution Model

Execution Flow

1. VM::run()
   ├─ Lookup "main" function
   └─ Call execFunction()
      ├─ setupFrame() → Initialize registers, stack, block map
      └─ runFunctionLoop()
         └─ dispatchDriver->run()
            └─ Loop:
               ├─ selectInstruction() → Fetch next instruction
               ├─ executeOpcode() → Dispatch to handler
               ├─ handleDebugBreak() → Check breakpoints
               └─ finalizeDispatch() → Update IP, check for return

Instruction Execution Cycle

For each instruction:

  1. Select: selectInstruction() identifies the next instruction
  2. Trace: traceInstruction() emits trace output if enabled
  3. Execute: Handler updates frame state and returns ExecResult
  4. Finalize: finalizeDispatch() processes jumps/returns

Control Flow

Basic blocks:

  • Execution starts at the entry block
  • Terminators (ret, br, cbr, switch) transfer control
  • Block parameters are transferred before entering a new block

Function calls:

  • call opcode pushes a new frame onto the execution stack
  • Arguments are evaluated and passed as block parameters
  • Return value is propagated back via Slot

Tail calls:

  • Detected via call.tail attribute
  • Reuses current frame instead of allocating new one
  • Eliminates stack growth for recursive functions

Dispatch Strategies

1. Function Table Dispatch (Default)

Uses a compile-time generated array of function pointers:

// Generated in HandlerTable.hpp
static const OpcodeHandlerTable& getOpcodeHandlers() {
    static OpcodeHandlerTable table = {
        &handleAdd,    // Opcode::Add
        &handleSub,    // Opcode::Sub
        // ... one entry per opcode
    };
    return table;
}

Pros: Simple, portable, easy to debug
Cons: Indirect call overhead per instruction

2. Switch Dispatch

Expands all handlers inline within a giant switch statement:

while (true) {
    Opcode op = fetchOpcode(state);
    switch (op) {
        case Opcode::Add: inline_handle_Add(state); break;
        case Opcode::Sub: inline_handle_Sub(state); break;
        // ... case per opcode
    }
}

Pros: Better branch prediction, potential for inlining
Cons: Large code size, longer compile time

3. Threaded Dispatch (GCC/Clang)

Uses computed goto with label addresses:

static void* kOpLabels[] = { &&LBL_Add, &&LBL_Sub, /* ... */ };

#define DISPATCH_TO(opcode) goto *kOpLabels[opcode]

for (;;) {
    DISPATCH_TO(fetchOpcode(state));

    LBL_Add: inline_handle_Add(state); DISPATCH_TO(fetchNext());
    LBL_Sub: inline_handle_Sub(state); DISPATCH_TO(fetchNext());
    // ... label per opcode
}

Pros: Fastest dispatch, direct jump to handlers
Cons: Compiler-specific, large code size

Selecting a Dispatch Strategy

The dispatch strategy is selected at VM construction via environment variable:

# Use function table dispatch (portable, moderate performance)
VIPER_DISPATCH=table ./viper -run program.il

# Use switch statement dispatch (good cache locality)
VIPER_DISPATCH=switch ./viper -run program.il

# Use threaded dispatch (fastest, requires GCC/Clang)
VIPER_DISPATCH=threaded ./viper -run program.il

Default: Threaded if supported (VIPER_THREADING_SUPPORTED=1), otherwise Switch.

Shared Dispatch Loop

All strategies share a common dispatch loop (runSharedDispatchLoop) that handles:

  • State reset per iteration (beginDispatch)
  • Instruction selection (selectInstruction)
  • Debug hooks (VIPER_VM_DISPATCH_BEFORE/AFTER)
  • Trap handling for threaded dispatch
  • Finalization and exit conditions (finalizeDispatch)

The strategy only implements executeInstruction() to map opcodes to handlers.

Dispatch Loop Performance Optimizations

The shared dispatch loop includes several optimizations:

  1. Cached strategy properties: requiresTrapCatch() and handlesFinalizationInternally() are cached once at loop entry to avoid virtual call overhead per instruction.

  2. Branch hints: [[likely]] and [[unlikely]] attributes guide code layout for hot paths.

  3. Zero-cost hooks: VIPER_VM_DISPATCH_BEFORE and VIPER_VM_DISPATCH_AFTER macros compile to nothing when disabled. When opcode counting is enabled (VIPER_VM_OPCOUNTS=1), the counter increment is gated by a runtime flag (config.enableOpcodeCounts).

  4. Efficient polling: VIPER_VM_DISPATCH_AFTER only increments the poll counter when polling is active (interruptEveryN > 0), avoiding wasted cycles in the common case.

Instrumentation Hooks

The VM provides compile-time configurable hooks for profiling and embedding:

// In VMConfig.hpp - define before including VM headers
#define VIPER_VM_DISPATCH_BEFORE(ST, OPCODE) \
    do { myProfiler.onInstruction(ST, OPCODE); } while(0)

#define VIPER_VM_DISPATCH_AFTER(ST, OPCODE) \
    do { myProfiler.afterInstruction(ST, OPCODE); } while(0)

Predefined behavior:

  • VIPER_VM_DISPATCH_BEFORE: Increments per-opcode counters when VIPER_VM_OPCOUNTS=1
  • VIPER_VM_DISPATCH_AFTER: Calls poll callback every N instructions if configured

Per-Opcode Counters

Enable compile-time opcode counting:

#define VIPER_VM_OPCOUNTS 1  // Default: enabled

Access counters at runtime:

vm.resetOpcodeCounts();
vm.run();
auto counts = vm.getOpcodeCounts();  // Returns array<uint64_t, kNumOpcodes>
for (auto [opcode, count] : vm.getNonZeroOpcodeCounts()) {
    std::cout << opcodeMnemonic(opcode) << ": " << count << "\n";
}

Disable via environment: VIPER_ENABLE_OPCOUNTS=0

Benchmark Harness

The viper bench command provides a built-in benchmark harness for comparing dispatch strategies:

# Run all three strategies with 3 iterations each
viper bench program.il

# Run a specific strategy with 5 iterations
viper bench program.il -n 5 --table

# Run multiple files with JSON output
viper bench file1.il file2.il --json

# Limit execution with max-steps
viper bench program.il --max-steps 1000000

Output format (text):

BENCH <file> <strategy> instr=<N> time_ms=<T> insns_per_sec=<R>

Output format (JSON):

[
  {
    "file": "program.il",
    "strategy": "table",
    "success": true,
    "instructions": 7000004,
    "time_ms": 3618.33,
    "insns_per_sec": 1934596,
    "return_value": 0
  }
]

Strategy selection flags:

  • --table: Run only FnTable dispatch
  • --switch: Run only Switch dispatch
  • --threaded: Run only Threaded dispatch
  • (default): Run all three strategies

Example benchmark IL programs are available in examples/il/benchmarks/:

  • arith_stress.il: Heavy arithmetic workload
  • branch_stress.il: Branch-heavy control flow
  • call_stress.il: Function call overhead testing
  • mixed_stress.il: Combined workload
  • string_stress.il: String operations

Memory Model

Register File

Each frame has an SSA register file sized to the function's register count:

frame.regs.resize(func->registerCount);

Registers are indexed by SSA value ID. Each register is written once and read many times (SSA property).

Operand Stack

Each frame has an operand stack for alloca allocations. The default capacity is 64KB (Frame::kDefaultStackSize):

std::vector<uint8_t> stack; // capacity ~= 64KB by default
size_t sp = 0;  // Stack pointer in bytes

Usage:

  • alloca N allocates N bytes on the stack
  • Returns a ptr pointing into stack at offset sp
  • Stack grows upward (sp increases)
  • No explicit deallocation (frame-scoped)

Limits:

  • Default 64KB size per frame (Frame::kDefaultStackSize)
  • Overflow causes trap
  • Suitable for temporaries, strings, and moderate-sized arrays (e.g., 80×25 screen buffers)

String Handles

Strings are managed by the runtime as opaque handles (rt_string):

  • Global strings: Cached in strMap, lifetime = VM lifetime
  • Inline literals: Cached in inlineLiteralCache, supports embedded NULs
  • Runtime strings: Created by runtime functions, reference-counted

The VM releases all cached handles in its destructor.


Exception & Trap Handling

Trap Types

Defined in Trap.hpp:

enum class TrapKind {
    DivideByZero,     // Integer division by zero
    Overflow,         // Arithmetic overflow
    InvalidCast,      // Type conversion failure
    DomainError,      // Semantic violation
    Bounds,           // Array bounds check
    FileNotFound,     // File I/O error
    EOF,              // End of file
    IOError,          // Generic I/O failure
    InvalidOperation, // Invalid state transition
    RuntimeError      // Catch-all
};

Exception Handler Stack

Each frame maintains an exception handler stack (Frame::ehStack) using the HandlerRecord type defined in Frame (see Key Components).

IL instructions:

  • eh.push label handler — Push handler onto stack
  • eh.pop — Pop handler from stack
  • eh.entry — Mark entry point of handler block

Trap Dispatch

When a trap occurs:

  1. Capture context: Function, block, instruction, source location
  2. Search for handler: Walk ehStack for active handler
  3. Dispatch or unwind:
    • Handler found: Jump to handler block, set activeError
    • No handler: Throw TrapDispatchSignal to unwind stack
  4. Resume: Handler uses resume.same, resume.next, or resume.label

Structured Error Payload

struct VmError {
    TrapKind kind;     // Error classification
    int32_t code;      // Secondary code
    uint64_t ip;       // Instruction pointer
    int32_t line;      // Source line (-1 if unknown)
};

Accessible via:

  • trap.kind — Read current trap kind
  • err.get_kind %e — Extract kind from error value
  • err.get_code %e — Extract code from error value

Runtime Integration

RuntimeBridge (src/vm/RuntimeBridge.hpp)

Adapter between VM and C runtime library:

class RuntimeBridge {
    static Slot call(RuntimeCallContext& ctx,
                    const std::string& name,
                    const std::vector<Slot>& args,
                    ...);

    static void trap(TrapKind kind, const std::string& msg, ...);
    static const RuntimeCallContext* activeContext();
};

Call flow:

  1. IL call @Viper.Terminal.PrintI64(args) instruction (or legacy @rt_* alias)
  2. Handler evaluates arguments into bytecode/VM slots
  3. The bytecode layer caches known runtime descriptors in its native-function table and, when possible, calls the RuntimeBridge entry point for the resolved descriptor directly
  4. C function is invoked with marshalled arguments
  5. Return value is marshalled back to Slot

Note: The runtime supports both canonical @Viper.* names and legacy @rt_* aliases when built with -DVIPER_RUNTIME_NS_DUAL=ON.

Runtime Call Context

Tracks active runtime call for trap diagnostics:

struct RuntimeCallContext {
    SourceLoc loc;                      // Call site location
    std::string function;               // Calling IL function
    std::string block;                  // Calling block
    const RuntimeDescriptor* descriptor; // Runtime function
    Slot* argBegin;                     // Argument array
    size_t argCount;                    // Argument count
};

Populated before each runtime call, cleared after.

External Function Registry

Custom functions can be registered:

struct ExternDesc {
    std::string name;
    void* ptr;
    // ... signature metadata
};

RuntimeBridge::registerExtern(desc);

Enables embedding applications to extend the runtime.


Debug & Tracing

Trace Sink

Configurable output for instruction tracing:

struct TraceConfig {
    bool enabled;           // Enable tracing
    bool ilTrace;           // Trace IL instructions
    bool boolTrace;         // Trace boolean values
    bool srcTrace;          // Trace source locations
};

Output format:

[func:block:ip] opcode operands → result

Debug Controller

Manages breakpoints and stepping:

class DebugCtrl {
    // Breakpoints
    void addBreakLabel(std::string label);
    void addBreakSrcLine(std::string file, int line);
    void clearBreaks();

    // Stepping
    void requestStep(uint64_t count);
    bool shouldBreak(/* context */);
};

Breakpoint types:

  • Block label breakpoints
  • Source line breakpoints
  • Step count breakpoints

Debug Scripting

Optional command script for automated debugging:

class DebugScript {
    virtual Action onBreakpoint(VM& vm, Frame& fr) = 0;
};

Allows programmatic control of execution (continue, step, inspect, etc.).

Memory Watches

Monitor memory access for debugging:

debug.addMemWatch(addr, size, "tag");
auto hits = debug.drainMemWatchEvents();

Tracks reads/writes to specific memory ranges.


Source Code Guide

Directory Structure

src/vm/
├── VM.hpp/cpp                  # Main VM class and core interpreter logic
├── VMContext.hpp/cpp           # Execution context helpers
├── VMConfig.hpp                # Build configuration
├── VMConstants.hpp             # VM constants
├── VMInit.cpp                  # VM initialization
├── FunctionExecCache.cpp       # Pre-resolved operand cache per (function, block)
├── Runner.cpp                  # Public API facade
│
├── OpHandlers.hpp/cpp          # Handler aggregation and table generation
├── OpHandlerUtils.hpp/cpp      # Handler utility functions
├── OpHandlerAccess.hpp         # Handler access utilities
├── OpcodeHandlerHelpers.hpp    # Common handler helper functions
├── OpHandlers_Control.hpp      # Control flow handlers
├── OpHandlers_Int.hpp          # Integer arithmetic handlers
├── OpHandlers_Float.hpp        # Float arithmetic handlers
├── OpHandlers_Memory.hpp       # Memory operation handlers
├── IntOpSupport.hpp            # Integer operation support
│
├── DispatchStrategy.hpp/cpp    # Pluggable dispatch strategies
├── DispatchMacros.hpp          # Dispatch loop macros and hooks
│
├── ops/
│   ├── Op_CallRet.cpp          # Call/return implementation
│   ├── Op_BranchSwitch.cpp     # Branch/switch implementation
│   ├── Op_TrapEh.cpp           # Trap/exception handling
│   ├── common/Branching.*      # Branch target resolution helpers
│   ├── schema/ops.yaml         # Opcode schema definitions
│   └── generated/              # Generated dispatch tables and handlers
│       ├── HandlerTable.hpp    # Static handler function table
│       ├── InlineHandlers*.inc # Inline handler implementations
│       ├── SwitchDispatch*.inc # Switch dispatch implementations
│       └── Threaded*.inc       # Threaded dispatch labels/cases
│
├── RuntimeBridge.hpp/cpp       # Runtime integration
├── Marshal.hpp/cpp             # Value marshalling
│
├── Trap.hpp/cpp                # Trap definitions and formatting
├── TrapInvariants.hpp          # Trap assertion helpers
├── DiagFormat.hpp/cpp          # Diagnostic message formatting
├── err_bridge.hpp/cpp          # Error bridge helpers
│
├── control_flow.hpp/cpp        # Control flow utilities
├── tco.hpp/cpp                 # Tail-call optimization
├── ViperStringHandle.hpp       # RAII string handle wrapper
│
├── int_ops_arith.cpp           # Integer arithmetic implementations
├── int_ops_cmp.cpp             # Integer comparison implementations
├── int_ops_convert.cpp         # Integer conversion implementations
├── fp_ops.cpp                  # Floating-point implementations
├── mem_ops.cpp                 # Memory operation implementations
│
├── ThreadsRuntime.cpp          # Viper.Threads runtime support
│
└── debug/                      # Debug and tracing subsystem
    └── *.cpp                   # Debug controller, trace, scripting

Key Files by Functionality

Core Interpreter:

  • VM.hpp, VM.cpp — Main interpreter class
  • VMContext.hpp — Shared execution helpers
  • Runner.cpp — Public API facade

Dispatch:

  • VM.cpp — Dispatch driver implementations
  • ops/generated/ — Generated dispatch tables

Opcode Handlers:

  • OpHandlers*.hpp — Handler declarations by category
  • ops/Op_*.cpp — Complex handler implementations
  • int_ops_*.cpp, fp_ops.cpp, mem_ops.cpp — Arithmetic implementations

Exception Handling:

  • Trap.hpp, Trap.cpp — Trap types and formatting
  • err_bridge.hpp — Error bridge integration
  • ops/Op_TrapEh.cpp — Exception handler opcodes

Runtime Integration:

  • RuntimeBridge.hpp, RuntimeBridge.cpp — C runtime adapter
  • Marshal.hpp, Marshal.cpp — Value marshalling

Debugging:

  • debug/Debug.cpp — Breakpoint management
  • debug/DebugScript.cpp — Debug scripting support
  • debug/Trace.cpp — Trace output formatting
  • debug/VM_DebugUtils.cpp — Debug utility helpers
  • debug/VMDebug.cpp — Debug integration

Performance Features

Tail-Call Optimization

Enabled by default (VIPER_VM_TAILCALL):

// Detect tail call
if (instr.isTailCall()) {
    // Reuse current frame
    return executeTailCall(fr, callee, args);
}

Eliminates stack growth for recursive functions.

Opcode Counting

Compile-time flag (VIPER_VM_OPCOUNTS):

#if VIPER_VM_OPCOUNTS
std::array<uint64_t, kNumOpcodes> opCounts_;
#endif

Tracks execution count per opcode for profiling.

API:

const auto& counts = vm.opcodeCounts();
auto top = vm.topOpcodes(10);  // Top 10 opcodes
vm.resetOpcodeCounts();

Execution Context Optimization

The VM execution context has been optimized to minimize overhead on the hot path:

Trusted bytecode dispatch: Source execution uses BytecodeCompiler::compileChecked() and then enables trusted dispatch in the bytecode VM. Trusted dispatch skips per-instruction PC and operand-stack validation in the interpreter loop while keeping checked compilation, verifier diagnostics, runtime traps, and branch-target checks available for debug/unchecked embedding paths.

ExecState-based dispatch: The dispatch macros (VIPER_VM_DISPATCH_BEFORE, VIPER_VM_DISPATCH_AFTER) use ExecState directly instead of VMContext, avoiding an extra indirection per instruction:

// Hot path uses ExecState directly
VIPER_VM_DISPATCH_BEFORE(state, opcode);  // state is ExecState&

// ExecState.config includes all per-instruction configuration
struct PollConfig {
    uint32_t interruptEveryN;
    std::function<bool(VM&)> pollCallback;
    bool enableOpcodeCounts;  // Direct access for opcode counting
};

VMContext for external APIs: The VMContext wrapper is still used for external APIs (stepOnce, fetchOpcode, handleTrapDispatch) to provide a stable interface, but it's not required on the per-instruction hot path.

Execution Stack Pre-allocation

The execution stack (execStack) tracks active ExecState pointers for trap unwinding and debugging:

// Pre-allocated to kExecStackInitialCapacity (64) in VM constructor
std::vector<ExecState*> execStack;

// Unified RAII guard for stack management
struct ExecStackGuard {
    VM& vm;
    ExecState* state;
    ExecStackGuard(VM& vmRef, ExecState& stRef) noexcept;
    ~ExecStackGuard() noexcept;
};

Optimizations:

  • Pre-allocated capacity eliminates heap allocation for typical call depths
  • Unified ExecStackGuard in VM.hpp removes code duplication
  • noexcept specifiers enable compiler optimizations

Inline String Literal Cache

Caches runtime handles for string literals:

std::unordered_map<std::string_view, ViperStringHandle, ...> inlineLiteralCache;

Optimizations:

  • Pre-populated during VM construction by scanning all ConstStr operands in the module
  • Fast path uses find() for pre-populated strings (common case)
  • Fallback try_emplace only for edge cases (dynamically generated strings)
  • Eliminates repeated allocation and map insertion for frequently used literals

Switch Cache

Memoizes switch dispatch data:

struct SwitchCache {
    std::unordered_map<int32_t, const BasicBlock*> caseMap;
    const BasicBlock* defaultTarget;
};

Amortizes switch table construction across iterations.

Frame Buffer Pooling

The VM maintains pools for frequently allocated frame resources:

Stack Buffer Pool:

  • Reuses operand stack buffers across function calls
  • Pre-sized to Frame::kDefaultStackSize (64KB)
  • Eliminates allocation overhead for recursive functions

Register File Pool:

  • Reuses SSA register file vectors
  • Sized by clear() and resize() rather than reallocation
  • Reduces heap churn during deep call stacks

Benefit: Recursive functions like factorial(n) allocate only once per unique call depth, then reuse pooled buffers for subsequent calls. This significantly reduces GC pressure and improves cache locality.

Host Polling

Configurable interrupt callback:

vm.setPollConfig(everyN, [](VM& vm) {
    // Host logic (UI events, etc.)
    return true;  // Continue execution
});

Allows embedding applications to maintain responsiveness.


Best Practices

For VM Developers

  1. Opcode Handlers: Keep handlers simple and delegate to helper functions
  2. Error Handling: Use RuntimeBridge::trap() for runtime errors
  3. Caching: Consider caching hot lookups (strings, functions, blocks)
  4. Debugging: Add trace points for complex operations
  5. Testing: Write unit tests for each handler

For IL Generators

  1. SSA Form: Ensure proper SSA (single assignment per register)
  2. Terminators: Every block must end with a terminator
  3. Type Safety: Match operand types to instruction signatures
  4. Exception Handlers: Properly nest eh.push/eh.pop pairs
  5. Stack Usage: Keep alloca sizes within frame limits (~64KB by default)

For Embedders

  1. Configuration: Choose appropriate dispatch strategy
  2. Polling: Set reasonable interrupt frequency
  3. Externs: Register custom functions before execution
  4. Tracing: Enable tracing for debugging, disable for production
  5. Error Handling: Catch and handle TrapDispatchSignal if needed

Further Reading

Viper Documentation:

Developer Documentation:

Source Code:

  • src/vm/ — VM implementation
  • src/runtime/ — C runtime library
  • src/tests/vm/ — VM unit tests

Appendix: Performance Tuning

This section summarizes runtime tuning knobs and benchmarking for the VM.

Dispatch Modes

  • Env VIPER_DISPATCH:

    • table: function-table dispatch via executeOpcode
    • switch: inline switch dispatch with generated handlers
    • threaded: computed goto (if built with VIPER_VM_THREADED)
  • Env VIPER_ENABLE_OPCOUNTS (default on): enable per-opcode execution counters. You can query counts via Runner::opcodeCounts() or the --count flag in viper -run.

  • Env VIPER_INTERRUPT_EVERY_N: periodically invoke a host callback every N instructions (see RunConfig::interruptEveryN).

Switch Backend Heuristics

Switch dispatch selects a backend per instruction. Heuristics can be tuned via env:

  • VIPER_SWITCH_DENSE_MAX_RANGE (default 4096): maximum value range to consider a dense jump table.
  • VIPER_SWITCH_DENSE_MIN_DENSITY (default 0.60): minimum case density for dense backend.
  • VIPER_SWITCH_HASH_MIN_CASES (default 64): minimum number of cases before hashing is considered.
  • VIPER_SWITCH_HASH_MAX_DENSITY (default 0.15): maximum density to prefer hashed backend.

If VIPER_SWITCH_MODE is set to dense|sorted|hashed|linear|auto, it overrides the heuristic for all instructions.

Benchmarking

Use the helper script to compare dispatch performance across modes:

  • Script: scripts/vm_benchmark.sh

Environment variables:

  • IL_DIR (default examples/il/benchmarks): directory of IL programs to benchmark (relative to repo root).
  • ILC_BIN: optional path to viper; otherwise auto-detected under build/.
  • RUNS_PER_CASE (default 5): number of runs per (mode, program) pair.

Each invocation writes a timestamped section header and a per-row timestamp, along with averages and min/max timings, the actual dispatch kind, and instruction counts extracted from --count and --time summaries.

Example:

RUNS_PER_CASE=5 IL_DIR='src/tests/il/e2e' scripts/vm_benchmark.sh

The script sets VIPER_DEBUG_VM=1 so the VM prints the resolved dispatch kind, and VIPER_ENABLE_OPCOUNTS=1 to capture counts.


Appendix: Concurrency Model

Each VM instance is single‑threaded: only one host thread may execute within a given VM instance at a time. To parallelize at the embedder level, create one VM per host thread (each VM has its own program state).

For language-level shared-memory threads (Viper.Threads), the VM spawns a new host thread and runs a new VM instance that shares a single VM::ProgramState (shared globals + shared RtContext) with its parent. This preserves the “one host thread per VM instance” invariant while allowing a Viper program to share memory across its threads.

The active VM is tracked via a thread‑local guard (see ActiveVMGuard in src/vm/VMContext.*), which binds the VM and its runtime context for the duration of execution. In debug builds, attempting to activate a different VM while one is already active on the same thread triggers an assertion.



Appendix: Runtime ABI Reference

Extern symbols in IL map to C functions declared in src/runtime/rt.hpp. This section documents the core ABI surface available to both the VM and native backends. For the complete list see the Runtime Library Reference.

Runtime symbol naming

  • Canonical entry points use dotted Viper.* names emitted by frontends (catalogued in src/il/runtime/RuntimeSignatures.hpp).
  • Native backends rewrite these to C symbols via il::runtime::mapCanonicalRuntimeName and the alias table in src/il/runtime/RuntimeNameMap.hpp.
  • When built with -DVIPER_RUNTIME_NS_DUAL=ON, legacy @rt_* externs are accepted as aliases of @Viper.*.

Math

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_sqrt` | `f64 -> f64` | square root |
| `@rt_floor` | `f64 -> f64` | floor |
| `@rt_ceil` | `f64 -> f64` | ceiling |
| `@rt_sin` | `f64 -> f64` | sine |
| `@rt_cos` | `f64 -> f64` | cosine |
| `@rt_pow_f64_chkdom` | `f64, f64 -> f64` | power |
| `@rt_abs_i64` | `i64 -> i64` | absolute value (integer, traps on overflow) |
| `@rt_abs_f64` | `f64 -> f64` | absolute value (float) |

String operations

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_str_len` | `str -> i64` | Return length of string in bytes |
| `@rt_str_concat` | `str, str -> str` | Concatenate two strings; consumes both operands |
| `@rt_str_substr` | `str, i64, i64 -> str` | Extract substring (0-based start, length) |
| `@rt_str_left` | `str, i64 -> str` | Leftmost n characters |
| `@rt_str_right` | `str, i64 -> str` | Rightmost n characters |
| `@rt_str_mid` | `str, i64 -> str` | Substring from start (0-based) to end |
| `@rt_str_mid_len` | `str, i64, i64 -> str` | Substring from start (0-based) with length |
| `@rt_str_index_of` | `str, str -> i64` | Find needle; returns 1-based index or 0 |
| `@rt_str_trim` | `str -> str` | Remove leading and trailing whitespace |
| `@rt_str_ucase` | `str -> str` | Convert ASCII to uppercase |
| `@rt_str_lcase` | `str -> str` | Convert ASCII to lowercase |
| `@rt_str_chr` | `i64 -> str` | Single-character string from ASCII code |
| `@rt_str_asc` | `str -> i64` | ASCII code of first character |
| `@rt_str_eq` | `str, str -> i1` | Compare two strings for equality |

Console I/O

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_print_str` | `str -> void` | Print string to stdout |
| `@rt_print_i64` | `i64 -> void` | Print 64-bit integer to stdout |
| `@rt_print_f64` | `f64 -> void` | Print float to stdout |
| `@rt_input_line` | `void -> str` | Read a line from stdin |

Conversion

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_to_int` | `str -> i64` | Parse decimal integer from string |
| `@rt_to_double` | `str -> f64` | Parse floating-point from string |
| `@rt_int_to_str` | `i64 -> str` | Convert integer to decimal string |
| `@rt_f64_to_str` | `f64 -> str` | Convert float to decimal string |
| `@rt_val` | `str -> f64` | Parse leading numeric prefix |

Terminal control

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_term_cls` | `void -> void` | Clear screen and home cursor |
| `@rt_term_color_i32` | `i32, i32 -> void` | Set foreground/background |
| `@rt_term_locate_i32` | `i32, i32 -> void` | Move cursor (1-based row, col) |
| `@rt_term_cursor_visible_i32` | `i32 -> void` | Show/hide cursor |

Time & random

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_timer_ms` | `void -> i64` | Monotonic millisecond timestamp |
| `@rt_rnd` | `void -> f64` | Random f64 in [0,1) |
| `@rt_randomize_i64` | `i64 -> void` | Seed RNG |

Environment

| Symbol | Signature | Semantics |
| --- | --- | --- |
| `@rt_args_count` | `void -> i64` | Number of program arguments |
| `@rt_args_get` | `i64 -> str` | Program argument at zero-based index |
| `@rt_env_is_native` | `void -> i1` | 1 for native binary, 0 for VM |