How the compiler is structured. Mostly notes for myself but might be useful if you're poking around.
Source → Lexer → Tokens → Parser → AST → Type Checker → Interpreter or C Codegen → GCC/Clang → Binary
The type checker produces a CheckResult with type info, ownership tracking, and warnings. Both the interpreter and codegen consume this result.
Breaks source code into tokens. Pretty standard lexer - handles keywords, operators, literals, etc.
Key files:
- lexer.go - the actual lexer
- token.go - token types and keyword lookup
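To make the lexer's job concrete, here is a minimal sketch in C (not the real Go code in lexer.go). The token kinds, the `next_token` function, and the four-entry keyword table are all assumptions for illustration; the keywords are just the statement names mentioned elsewhere on this page.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real set lives in token.go. */
typedef enum { TOK_EOF, TOK_INT, TOK_IDENT, TOK_KEYWORD, TOK_OP } TokenKind;

typedef struct {
    TokenKind   kind;
    const char *start;  /* points into the source buffer */
    size_t      len;
} Token;

/* Assumed keyword subset, for illustration only. */
static const char *const KEYWORDS[] = { "let", "return", "if", "for" };

/* Keyword lookup: linear scan over the table. */
static int is_keyword(const char *s, size_t len) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strlen(KEYWORDS[i]) == len && strncmp(KEYWORDS[i], s, len) == 0)
            return 1;
    return 0;
}

/* Scans one token starting at *cursor and advances the cursor past it. */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;

    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *cursor = p; return t; }

    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        t.kind = TOK_INT;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        t.kind = is_keyword(t.start, (size_t)(p - t.start)) ? TOK_KEYWORD
                                                            : TOK_IDENT;
    } else {
        p++;  /* treat anything else as a single-character operator */
        t.kind = TOK_OP;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling `next_token` repeatedly on `let x = 42;` yields a keyword, an identifier, an operator, an integer, an operator, and then EOF. A real lexer would also handle multi-character operators, string literals, and comments.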
Abstract Syntax Tree definitions. Every syntactic construct has a corresponding AST node.
Key files:
- ast.go - expressions (literals, operators, calls, etc.)
- statements.go - statements (let, return, if, for, etc.)
- types.go - type expressions
Pratt parser (operator precedence parsing) for expressions, recursive descent for statements.
Key files:
- parser.go - parser setup + statement parsing
- pratt.go - expression parsing and precedence logic
- parser_decls.go - class/interface/impl parsing
- parser_types.go - type expression parsing
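For anyone who hasn't seen Pratt parsing before, here is a small C sketch of the core loop. It is not the code in pratt.go: the grammar (integers, + - * /, unary minus, parentheses), the binding-power numbers, and all names are assumptions, and it evaluates the expression directly instead of building AST nodes to keep it short.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct { const char *src; size_t pos; } Parser;

/* Skips spaces and returns the next character without consuming it. */
static char peek(Parser *p) {
    while (p->src[p->pos] == ' ') p->pos++;
    return p->src[p->pos];
}

/* Binding power of an infix operator; 0 means "not an infix operator". */
static int infix_power(char op) {
    switch (op) {
    case '+': case '-': return 10;
    case '*': case '/': return 20;
    default:            return 0;
    }
}

static long parse_expr(Parser *p, int min_power);

/* Prefix position: parenthesized expression, unary minus, or integer. */
static long parse_prefix(Parser *p) {
    char c = peek(p);
    if (c == '(') {
        p->pos++;
        long v = parse_expr(p, 0);
        if (peek(p) == ')') p->pos++;
        return v;
    }
    if (c == '-') {
        p->pos++;
        return -parse_expr(p, 30);  /* binds tighter than any infix op */
    }
    long v = 0;
    while (isdigit((unsigned char)p->src[p->pos]))
        v = v * 10 + (p->src[p->pos++] - '0');
    return v;
}

/* Core Pratt loop: keep consuming infix operators that bind tighter
   than min_power; recursing with the operator's own power makes the
   operators left-associative. */
static long parse_expr(Parser *p, int min_power) {
    long left = parse_prefix(p);
    for (;;) {
        char op = peek(p);
        int power = infix_power(op);
        if (power == 0 || power <= min_power) break;
        p->pos++;
        long right = parse_expr(p, power);
        switch (op) {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left = (right != 0) ? left / right : 0; break;
        }
    }
    return left;
}

long eval_source(const char *src) {
    Parser p = { src, 0 };
    return parse_expr(&p, 0);
}
```

With these binding powers, `eval_source("1 + 2 * 3")` gives 7 and `eval_source("10 - 4 - 3")` gives 3. Swapping the arithmetic for AST node construction gives the usual parser shape.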
Type checker. Walks the AST and validates types, builds symbol tables, tracks ownership.
Produces a CheckResult with:
- NodeTypes: type of every expression
- FuncSigs: function signatures
- ClassInfo: class field/method info
- Errors: type errors (fatal in codegen)
- Warnings: ownership/borrow violations (warnings in interpreter, fatal in codegen)
Implements ownership tracking (move/drop), borrow checking (&T / &mut T), and a warnings system for non-fatal violations.
Key files:
- checker.go - checker core + diagnostics
- ownership.go - move/ownership rules
- borrow.go - borrow state and borrow checks
- interface.go - interface + impl validation
- async.go - async/await validation
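The move-tracking idea can be sketched in a few lines of C. This is only an illustration of the concept, not what ownership.go does: the `OwnershipScope` type, the two-state model (owned / moved), and the fixed-size table are all assumptions, and real borrow checking needs more state than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { VAR_OWNED, VAR_MOVED } VarState;

typedef struct { const char *name; VarState state; } VarInfo;

#define MAX_VARS 16

/* Hypothetical per-scope table of variables and their ownership state. */
typedef struct {
    VarInfo vars[MAX_VARS];
    size_t  count;
    int     warnings;  /* number of ownership violations seen so far */
} OwnershipScope;

static VarInfo *lookup(OwnershipScope *s, const char *name) {
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->vars[i].name, name) == 0) return &s->vars[i];
    return NULL;
}

/* `let name = ...;` introduces an owned value. */
bool scope_declare(OwnershipScope *s, const char *name) {
    if (s->count == MAX_VARS) return false;
    s->vars[s->count++] = (VarInfo){ name, VAR_OWNED };
    return true;
}

/* Reading a variable: records a warning if it is unknown or was moved. */
bool scope_use(OwnershipScope *s, const char *name) {
    VarInfo *v = lookup(s, name);
    if (v == NULL || v->state == VAR_MOVED) {
        s->warnings++;
        return false;
    }
    return true;
}

/* `let dst = src;` moves the value: src becomes unusable, dst owns it. */
bool scope_move(OwnershipScope *s, const char *src, const char *dst) {
    if (!scope_use(s, src)) return false;
    lookup(s, src)->state = VAR_MOVED;
    return scope_declare(s, dst);
}
```

After declaring `a` and moving it into `b`, a later `scope_use` on `a` fails and bumps the warning count, while `b` stays usable. That matches the split described above, where the same violation is a warning for the interpreter and fatal for codegen.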
Tree-walking interpreter. Useful for quick iteration and testing without going through the C compilation step.
Key files:
- eval.go - dispatcher and core statement eval
- eval_*.go - expression/operation evaluators split by concern
- object.go - runtime value types
- builtins.go - built-in functions
- environment.go - variable scoping
Generates C code from the AST. The generated C is not pretty but it works.
Currently targets C99. Key features:
- Scope stack: tracks variable lifetimes for drop insertion
- Preamble buffer: emits runtime helpers (carv_string, carv_array, etc.)
- carv_string struct: {char* data; size_t len; bool owned;}
- Single-exit functions: all returns become goto __carv_exit with drops at exit label
- Ownership-aware code generation: emits carv_string_move(), carv_string_drop(), carv_string_clone()
- Borrow support: &T → const T*, &mut T → T*
- Interface dispatch: vtable-based dynamic dispatch via fat pointers
- Arena allocator: used for all owned heap values
- Async/await lowering: async fn to frame structs + poll state machines
- Async runtime bootstrap: generated main() drives async fn carv_main() via event loop
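Here is roughly what the single-exit and ownership features could look like together in emitted C. Only the struct layout, the three helper names, and the __carv_exit label come from the list above; the helper bodies, the __carv_ret variable, the name_len function, and the example function are guesses for illustration. It uses malloc/free instead of the arena allocator to stay self-contained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Struct layout as described in the feature list. */
typedef struct { char *data; size_t len; bool owned; } carv_string;

/* Assumed helper bodies; the real preamble may differ. */
static carv_string carv_string_clone(const carv_string *s) {
    carv_string out = { malloc(s->len + 1), s->len, true };
    if (out.data == NULL) { out.len = 0; out.owned = false; return out; }
    memcpy(out.data, s->data, s->len);
    out.data[s->len] = '\0';
    return out;
}

static carv_string carv_string_move(carv_string *s) {
    carv_string out = *s;
    s->data = NULL; s->len = 0; s->owned = false;  /* source gives up ownership */
    return out;
}

static void carv_string_drop(carv_string *s) {
    if (s->owned) free(s->data);  /* non-owned strings are never freed */
    s->data = NULL; s->len = 0; s->owned = false;
}

/* An immutable borrow (&T) lowers to const T*. */
static size_t name_len(const carv_string *s) { return s->len; }

/* Hypothetical lowering of a function with two return statements:
   every return stores its value and jumps to the exit label, where
   the locals in scope are dropped exactly once. */
long example(bool flag) {
    long __carv_ret = 0;
    carv_string lit = { "hello", 5, false };
    carv_string a = carv_string_clone(&lit);
    carv_string b = carv_string_move(&a);   /* a is now empty */

    if (flag) { __carv_ret = 1; goto __carv_exit; }
    __carv_ret = (long)name_len(&b);
    goto __carv_exit;

__carv_exit:
    carv_string_drop(&b);
    carv_string_drop(&a);  /* harmless: a no longer owns anything */
    return __carv_ret;
}
```

The point of funnelling every return through one label is that drop insertion only has to happen in one place per function, so an early return can't skip a drop.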
Interfaces compile to a vtable + fat pointer pattern:
- Vtable struct: one function pointer per interface method, all taking const void* self as first param
- Fat pointer: { const void* data; const Vtable* vt; }, with _ref (immutable) and _mut_ref (mutable) variants
- Impl wrappers: static functions that cast const void* back to the concrete type and call the real method
- Vtable instances: one static const vtable per impl, initialized with wrapper function pointers
- Cast expressions: &obj as &Interface produces a fat pointer literal { .data = obj, .vt = &VT }
- Dynamic dispatch: obj.method(args) on an interface ref becomes obj.vt->method(obj.data, args)
Generation order: interface typedefs → impl forward decls → impl bodies → wrappers + vtable instances (all before main())
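A hand-written C sketch of that pattern, following the generation order above. The Shape interface, the Rect type, and every identifier here are made up for illustration; the actual emitted names will differ.

```c
/* Hypothetical Carv source being lowered:
     interface Shape { fn area(&self) -> int; }
     impl Shape for Rect { ... }                     */

/* 1. Interface typedefs: vtable struct + immutable fat pointer. */
typedef struct {
    int (*area)(const void *self);
} Shape_vtable;

typedef struct {
    const void         *data;
    const Shape_vtable *vt;
} Shape_ref;

typedef struct { int w, h; } Rect;

/* 2. Impl forward declaration. */
static int Rect_area(const Rect *self);

/* 3. Impl body. */
static int Rect_area(const Rect *self) { return self->w * self->h; }

/* 4. Wrapper: casts const void* back to the concrete type. */
static int Rect_Shape_area_wrapper(const void *self) {
    return Rect_area((const Rect *)self);
}

/* 4. Vtable instance: one static const table per impl. */
static const Shape_vtable Rect_Shape_vt = {
    .area = Rect_Shape_area_wrapper,
};

/* Cast expression `&r as &Shape` becomes a fat pointer literal. */
Shape_ref rect_as_shape(const Rect *r) {
    return (Shape_ref){ .data = r, .vt = &Rect_Shape_vt };
}

/* Dynamic dispatch: s.area() becomes s.vt->area(s.data). */
int total_area(const Shape_ref *shapes, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += shapes[i].vt->area(shapes[i].data);
    return sum;
}
```

Because the vtable takes `const void*`, callers like `total_area` never need to know the concrete type. The wrapper is the only place that casts back.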
Module system for loading and resolving dependencies.
Key files:
- loader.go - module resolution and loading
- config.go - carv.toml parsing
Supports:
- Relative imports (./utils, ../lib/math)
- Project-local imports (from src/ directory)
- Built-in standard modules (net, web)
- Future: external packages (from carv_modules/)
CLI entry point. Handles run, build, emit-c, repl, and init commands.
Why compile to C?
Portability mostly. C compilers exist everywhere, and I get optimization for free. Plus it's interesting to see how high-level constructs map to C.
Why a tree-walking interpreter too?
Much faster feedback loop during development. Compiling to C means invoking GCC, which is slow for quick tests.
Why semicolons?
Easier to parse. Maybe I'll add automatic semicolon insertion later, but for now explicit semis keep the parser simple.
The goal is self-hosting - writing the Carv compiler in Carv. That means I need:
- Module/import system ✓ Done!
- String interpolation ✓ Done!
- Ownership system (move + drop) ✓ Done!
- Borrowing (&T / &mut T) ✓ Done!
- Interfaces (interface/impl) ✓ Done!
- Async/await ✓ Done!
- Package manager (for external dependencies)
- Better standard library
- Then rewrite lexer, parser, codegen in Carv
It's a long road but that's half the fun. Getting closer though!