Compiler for a custom language implemented in Rust (ongoing)

This project aims to implement a compiler for my own language from the ground up for x86_64 machines running a Linux-based distro.

That includes implementing:

  • lexical analysis (lexing), syntax analysis (parsing), and semantic analysis
  • the assembler
  • the linker

Objectives

My main goals regarding this project are:

  • Deepening my understanding of the x86_64 architecture, including the instruction set (ISA), CPU registers, memory management, and low-level execution flow.
  • Exploring Linux OS internals, specifically how the operating system handles system calls, manages memory, and interacts with compiled machine code.
  • Understanding the Executable and Linkable Format (ELF) by generating valid executable headers, data/text sections, and segments entirely from scratch.
  • Understanding the structure of object files, including how symbol tables are built, how relocation entries work, and the exact mechanics of linking multiple files together.
  • Learning Rust in a low-level systems programming context, using its memory-safety guarantees (ownership and borrowing) to build compiler infrastructure.

Current state and workflow

I chose to build the project in reverse order, from the lowest level to the highest. Thus, the first step was building an assembler for x86_64 assembly written in Intel syntax.
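To give a sense of what the assembler has to do at this level, here is a minimal sketch of encoding two instructions by hand. It is an illustration only, not the project's actual code: the `Reg64` enum and both function names are hypothetical. The byte layout follows the standard x86_64 encoding, where `mov r64, imm64` is a REX.W prefix, opcode `B8` plus the low three bits of the register number, and an 8-byte little-endian immediate.

```rust
/// Hypothetical register enum; the discriminant is the hardware register number.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Reg64 {
    Rax = 0, Rcx = 1, Rdx = 2, Rbx = 3, Rsp = 4, Rbp = 5, Rsi = 6, Rdi = 7,
    R8 = 8, R9 = 9, R10 = 10, R11 = 11, R12 = 12, R13 = 13, R14 = 14, R15 = 15,
}

/// Encodes `mov dst, imm64` (REX.W + B8+rd, followed by the immediate).
pub fn encode_mov_r64_imm64(dst: Reg64, imm: u64) -> Vec<u8> {
    let reg = dst as u8;
    // REX prefix is 0100WRXB: W = 1 selects 64-bit operand size,
    // B carries the fourth bit of the register number (set for r8..r15).
    let rex = 0x48 | (reg >> 3);
    let mut out = Vec::with_capacity(10);
    out.push(rex);
    out.push(0xB8 + (reg & 7)); // opcode with the low 3 register bits folded in
    out.extend_from_slice(&imm.to_le_bytes()); // immediates are little-endian
    out
}

/// Encodes `syscall` (two fixed bytes, no operands).
pub fn encode_syscall() -> Vec<u8> {
    vec![0x0F, 0x05]
}
```

Concatenating `encode_mov_r64_imm64(Reg64::Rax, 60)`, `encode_mov_r64_imm64(Reg64::Rdi, 0)`, and `encode_syscall()` would give the machine code for an `exit(0)` call on x86_64 Linux.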

The assembler emits machine code into a custom object file format. I deliberately chose a simplified object format first, rather than jumping straight into the far more complex ELF standard. This lets me isolate and understand the core mechanics of machine code generation and object file structure before dealing with ELF itself.
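As a rough idea of what such a simplified object format could contain, the sketch below models a single code section, a symbol table, and a relocation list, and serializes them to bytes. Everything here is an assumption made for illustration: the magic value `OBJ0`, the field names, and the on-disk layout are hypothetical and do not describe the format this project actually uses.

```rust
/// Hypothetical magic number identifying the file type.
pub const MAGIC: [u8; 4] = *b"OBJ0";

/// A named location. `offset` is the position inside `text`,
/// or `None` when the symbol is defined in another object file.
pub struct Symbol {
    pub name: String,
    pub offset: Option<u32>,
}

/// A spot in `text` that the linker must patch once `symbol` is resolved.
pub struct Reloc {
    pub offset: u32, // where the 4-byte field to patch starts
    pub symbol: u32, // index into the symbol table
}

pub struct ObjectFile {
    pub text: Vec<u8>,
    pub symbols: Vec<Symbol>,
    pub relocs: Vec<Reloc>,
}

impl ObjectFile {
    /// Layout (all integers little-endian u32):
    /// magic, text length, symbol count, reloc count, text bytes,
    /// then each symbol (offset, name length, name), then each reloc.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&MAGIC);
        for len in [self.text.len(), self.symbols.len(), self.relocs.len()] {
            out.extend_from_slice(&(len as u32).to_le_bytes());
        }
        out.extend_from_slice(&self.text);
        for s in &self.symbols {
            // u32::MAX marks an undefined (external) symbol in this sketch.
            out.extend_from_slice(&s.offset.unwrap_or(u32::MAX).to_le_bytes());
            out.extend_from_slice(&(s.name.len() as u32).to_le_bytes());
            out.extend_from_slice(s.name.as_bytes());
        }
        for r in &self.relocs {
            out.extend_from_slice(&r.offset.to_le_bytes());
            out.extend_from_slice(&r.symbol.to_le_bytes());
        }
        out
    }
}
```

A fixed header with counts up front keeps the reader side simple: the linker can validate the magic, read three integers, and know exactly how many bytes and records follow.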

Once the assembler is fully functional, the next immediate milestone is building the linker. The linker will be responsible for:

  • Parsing these custom object files.
  • Resolving external and internal symbols.
  • Performing all necessary memory relocations.
  • Generating and attaching the correct executable headers.
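The symbol resolution and relocation steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the planned implementation: the `Reloc` struct and `apply_rel32` function are hypothetical, only one relocation kind is handled (a 32-bit PC-relative displacement, as used by `call rel32`), and symbol addresses are assumed to be already known and passed in as a map.

```rust
use std::collections::HashMap;

/// Hypothetical relocation record: patch the 4 bytes at `offset`
/// so that they refer to `symbol`.
pub struct Reloc {
    pub offset: usize,
    pub symbol: String,
}

/// Patches every relocation in `text`, which will be loaded at address `base`.
/// `symbols` maps each symbol name to its final absolute address.
pub fn apply_rel32(
    text: &mut [u8],
    base: u64,
    relocs: &[Reloc],
    symbols: &HashMap<String, u64>,
) -> Result<(), String> {
    for r in relocs {
        let target = *symbols
            .get(&r.symbol)
            .ok_or_else(|| format!("undefined symbol: {}", r.symbol))?;
        // The CPU adds the displacement to the address of the NEXT
        // instruction, which starts right after the 4-byte field.
        let next_ip = base + r.offset as u64 + 4;
        let disp = target.wrapping_sub(next_ip) as i64;
        if disp < i32::MIN as i64 || disp > i32::MAX as i64 {
            return Err(format!("relocation out of range: {}", r.symbol));
        }
        let field = text
            .get_mut(r.offset..r.offset + 4)
            .ok_or_else(|| format!("relocation outside section: {}", r.symbol))?;
        field.copy_from_slice(&(disp as i32).to_le_bytes());
    }
    Ok(())
}
```

For example, with `base = 0x401000`, a `call` whose displacement field sits at offset 1, and a target at `0x401010`, the next instruction is at `0x401005`, so the patched displacement is `0x0B`.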

The design of the compiled language itself is yet to be decided; for now, I am focused on getting the assembler and linker working.