
Commit aa093e4

docs: add missing translations, atomic API reference, and gitignore fixes
- docs/en/elf.md: English translation of docs/pt-BR/elf.md (ELF loading internals: ELF header, program headers, section headers, symbol table, memory layout, and a readelf/objdump inspection guide)
- docs/pt-BR/pipeline.md: Portuguese translation of docs/en/pipeline.md (5-stage pipeline, hazards, forwarding model, branch prediction, FU execution model, cache interaction)
- docs/pt-BR/cache-config.md: Portuguese translation of docs/en/cache-config.md (.fcache format, all fields, validation rules, L1/L2/L3 examples)
- rust-to-raven/API.md: added an Atomic Types section documenting AtomicU32, AtomicBool, AtomicI32, AtomicUsize, Arc<T>, and Ordering, all backed by RV32A instructions (lr.w/sc.w/AMOs)
- .gitignore: add .codex; fix the duplicated CHANGELOG.mdCHANGELOG.md entry
1 parent fd73bca commit aa093e4

5 files changed

Lines changed: 1042 additions & 1 deletion


.gitignore

Lines changed: 2 additions & 1 deletion
@@ -18,4 +18,5 @@
 /docs/threads-plan.md
 /hello-raven/target
 rust-to-raven/rust-to-raven.elf
-CHANGELOG.mdCHANGELOG.md
+CHANGELOG.md
+.codex

docs/en/elf.md

Lines changed: 315 additions & 0 deletions
@@ -0,0 +1,315 @@
# How Falcon loads an ELF file

> Educational document: assumes you know nothing about ELF.

---

## 1. What is an ELF file?

When you compile a C or Rust program for Linux (or for a bare-metal target like `riscv32im-unknown-none-elf`), the result is not just "the instruction bytes". The toolchain produces an **ELF** (*Executable and Linkable Format*) file: a structured container that tells the operating system (or the simulator) *where* to place each piece of the program in memory, *which address* is the entry point, and much more.

Think of an ELF file as a chest with drawers:

```
┌─────────────────────────────────┐
│ ELF Header                      │ ← "map" of the chest: where each drawer is
├─────────────────────────────────┤
│ Program Headers (Segments)      │ ← what to load into memory and where
├─────────────────────────────────┤
│ .text   (code)                  │ ← instruction bytes
│ .rodata (constants)             │
│ .data   (initialized variables) │
│ .bss    (zeroed variables)      │ ← described in the headers but takes no bytes in the file
├─────────────────────────────────┤
│ Section Headers                 │ ← metadata for the linker/debugger
│ .symtab (symbols)               │
│ .strtab (names)                 │
└─────────────────────────────────┘
```

**Falcon** (the Raven simulation core) only needs two things to execute a program:

1. Copy the right bytes to the right addresses in simulated RAM.
2. Know which address to start executing at (the *entry point*).

Sections (described by the Section Headers) are extra information: useful for a debugger, but not required to run the program.

---

## 2. The ELF Header: the first 52 bytes

Every ELF file starts with a fixed-size header. In ELF32 (32-bit) it is **52 bytes**. Falcon reads it like this:

```
Offset  Size  Field           Expected value / use
──────  ────  ──────────────  ──────────────────────────────────────────────
0       4     e_ident[magic]  7f 45 4c 46   ← "\x7fELF" in ASCII
4       1     EI_CLASS        1             ← ELF32 (2 would be ELF64)
5       1     EI_DATA         1             ← little-endian
18      2     e_machine       0xF3          ← RISC-V (243 decimal)
24      4     e_entry         e.g. 0x110d4  ← address of _start
28      4     e_phoff         offset of the Program Headers in the file
42      2     e_phentsize     size of each Program Header (≥ 32)
44      2     e_phnum         number of Program Headers
32      4     e_shoff         offset of the Section Headers in the file
46      2     e_shentsize     size of each Section Header (≥ 40)
48      2     e_shnum         number of Section Headers
50      2     e_shstrndx      index of the section that holds the section names
```

In the code (`src/falcon/program/elf.rs`):

```rust
let e_entry = u32le(24); // where to start executing
let e_phoff = u32le(28); // where in the file the Program Headers are
let e_phentsize = u16le(42); // size of each entry
let e_phnum = u16le(44); // how many entries

let e_shoff = u32le(32); // where the Section Headers are
let e_shentsize = u16le(46);
let e_shnum = u16le(48);
let e_shstrndx = u16le(50); // which section contains the names
```

If the magic is wrong, the class is not 1 (ELF32), the data encoding is not 1 (little-endian), or the machine is not `0xF3` (RISC-V), Falcon rejects the file with an error.
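
For experimenting outside Raven, here is a self-contained sketch of the same checks and field reads. It is not Falcon's code: `Elf32Header`, `parse_elf32_header`, and the `u16le`/`u32le` helpers are hypothetical names, and the sketch assumes the whole file is already in a byte slice.

```rust
/// The ELF32 header fields a loader needs (hypothetical type, not Falcon's).
#[derive(Debug, PartialEq)]
pub struct Elf32Header {
    pub entry: u32,
    pub phoff: u32,
    pub shoff: u32,
    pub phentsize: u16,
    pub phnum: u16,
    pub shentsize: u16,
    pub shnum: u16,
    pub shstrndx: u16,
}

const EM_RISCV: u16 = 0xF3;

fn u16le(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn u32le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Validates the identification bytes and decodes the 52-byte ELF32 header.
pub fn parse_elf32_header(b: &[u8]) -> Result<Elf32Header, String> {
    if b.len() < 52 {
        return Err("file too small for an ELF32 header".to_string());
    }
    if b[0..4] != [0x7f, b'E', b'L', b'F'] {
        return Err("bad magic".to_string());
    }
    if b[4] != 1 {
        return Err("not ELF32 (EI_CLASS != 1)".to_string());
    }
    if b[5] != 1 {
        return Err("not little-endian (EI_DATA != 1)".to_string());
    }
    if u16le(b, 18) != EM_RISCV {
        return Err("not RISC-V (e_machine != 0xF3)".to_string());
    }
    // Offsets below are the ones from the table above.
    Ok(Elf32Header {
        entry: u32le(b, 24),
        phoff: u32le(b, 28),
        shoff: u32le(b, 32),
        phentsize: u16le(b, 42),
        phnum: u16le(b, 44),
        shentsize: u16le(b, 46),
        shnum: u16le(b, 48),
        shstrndx: u16le(b, 50),
    })
}
```

Returning a `Result` instead of panicking mirrors the "reject the file with an error" behavior described above.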

---

## 3. Program Headers: what goes into memory

**Program Headers** (also called *segments*) describe what the loader needs to do. Each entry is `e_phentsize` bytes (32 in ELF32); the fields Falcon uses are:

```
Offset  Size  Field      Meaning
──────  ────  ─────────  ──────────────────────────────────────────────────
0       4     p_type     segment type (1 = PT_LOAD = "load this")
4       4     p_offset   where in the file this segment starts
8       4     p_vaddr    virtual address in memory where it should go
16      4     p_filesz   how many bytes to copy from the file to memory
20      4     p_memsz    total size in memory (can be > p_filesz → BSS)
24      4     p_flags    permissions: bit 0 = X (execute), bit 1 = W, bit 2 = R
```

Falcon only cares about segments of type **PT_LOAD** (type 1). For each one:

```rust
if p_type != PT_LOAD { continue; } // skip PT_DYNAMIC, PT_NOTE, etc.

// 1. Copy p_filesz bytes from the file to p_vaddr in RAM
load_bytes(mem, p_vaddr, &bytes[p_offset .. p_offset + p_filesz]);

// 2. If p_memsz > p_filesz, zero the rest (this is BSS!)
if p_memsz > p_filesz {
    zero_bytes(mem, p_vaddr + p_filesz, p_memsz - p_filesz);
}
```

### BSS in practice

BSS holds global variables that start out as zero. Since zeros do not need to be stored in the file, the linker emits a segment with `p_filesz < p_memsz`; the difference is the number of bytes to zero. Raven does this zeroing automatically, so a program's `_start` no longer needs to clear BSS manually.
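
The load loop and the BSS rule fit in one small function. This is a sketch under simplifying assumptions, not Falcon's implementation: RAM is modeled as a plain `&mut [u8]` that starts at address 0, and `Phdr`, `parse_phdr`, and `load_segment` are hypothetical names.

```rust
/// One ELF32 program header (only the fields a loader uses; hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Phdr {
    pub p_type: u32,
    pub p_offset: u32,
    pub p_vaddr: u32,
    pub p_filesz: u32,
    pub p_memsz: u32,
    pub p_flags: u32,
}

pub const PT_LOAD: u32 = 1;

fn u32le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Decodes the program header that starts at byte `off` of the file.
pub fn parse_phdr(file: &[u8], off: usize) -> Phdr {
    Phdr {
        p_type: u32le(file, off),
        p_offset: u32le(file, off + 4),
        p_vaddr: u32le(file, off + 8),
        // off + 12 is p_paddr, which this sketch ignores
        p_filesz: u32le(file, off + 16),
        p_memsz: u32le(file, off + 20),
        p_flags: u32le(file, off + 24),
    }
}

/// Copies a segment into `ram` and zeroes its BSS tail.
/// Returns the segment's end address (p_vaddr + p_memsz).
pub fn load_segment(ram: &mut [u8], file: &[u8], ph: &Phdr) -> Result<u32, String> {
    let vaddr = ph.p_vaddr as usize;
    let off = ph.p_offset as usize;
    let filesz = ph.p_filesz as usize;
    let memsz = ph.p_memsz as usize;
    if filesz > memsz || off + filesz > file.len() || vaddr + memsz > ram.len() {
        return Err("segment does not fit".to_string());
    }
    // 1. Copy p_filesz bytes from the file.
    ram[vaddr..vaddr + filesz].copy_from_slice(&file[off..off + filesz]);
    // 2. Zero the remaining p_memsz - p_filesz bytes (BSS).
    ram[vaddr + filesz..vaddr + memsz].fill(0);
    Ok(ph.p_vaddr + ph.p_memsz)
}
```

Zeroing explicitly (rather than relying on RAM starting out as zero) keeps BSS correct even if the simulated memory is reused between runs.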

### Identifying .text vs .data

```rust
if p_flags & PF_X != 0 {
    // executable bit set → this is the code segment (.text)
    text_bytes = ...;
    text_base = p_vaddr;
} else {
    // not executable → this is the data segment (.data/.bss)
    data_base = p_vaddr;
}
```

A typical RISC-V ELF has two PT_LOAD segments (their order in the file, and which one holds `.rodata`, depend on the linker layout):

```
PT_LOAD #1  flags=R+X  → .text + .rodata   (code and constants)
PT_LOAD #2  flags=R+W  → .data + .bss      (variables)
```
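
Because the `p_flags` bits run in the opposite order from the usual "RWX" spelling (X is bit 0), a small decoder helps. This sketch uses hypothetical helper names; it prints the same letters `readelf -l` shows in its `Flg` column and applies the executable-bit rule from the excerpt above.

```rust
// p_flags bits as defined by the ELF specification.
pub const PF_X: u32 = 1;
pub const PF_W: u32 = 2;
pub const PF_R: u32 = 4;

/// Renders p_flags like readelf's Flg column, e.g. "R E" or "RW ".
pub fn flags_str(p_flags: u32) -> String {
    let bit = |mask: u32, c: char| if p_flags & mask != 0 { c } else { ' ' };
    [bit(PF_R, 'R'), bit(PF_W, 'W'), bit(PF_X, 'E')].iter().collect()
}

/// Same rule as the excerpt: executable → code (.text), otherwise data.
pub fn is_code_segment(p_flags: u32) -> bool {
    p_flags & PF_X != 0
}
```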

### Heap start

After loading all segments, Falcon calculates where the heap begins:

```rust
// inside the PT_LOAD loop:
let end = p_vaddr + p_memsz; // end of this segment in memory
if end > seg_end_max { seg_end_max = end; }

// after the loop:
let heap_start = (seg_end_max + 15) & !15; // align up to 16 bytes
```

This value is stored in `cpu.heap_break` and is what the `brk` syscall returns on the first call.
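
As a worked example, the sketch below (a hypothetical `heap_start` helper that takes `(p_vaddr, p_memsz)` pairs) applies the same rounding to the two segments from the `readelf -l` example in section 8. They end at `0x10060` and `0x114d4`, so the maximum is `0x114d4`, and rounding up to a multiple of 16 gives `0x114e0`.

```rust
/// Highest end address (p_vaddr + p_memsz) of any loaded segment,
/// rounded up to a multiple of 16. Returns 0 for an empty list.
pub fn heap_start(segments: &[(u32, u32)]) -> u32 {
    let seg_end_max = segments
        .iter()
        .map(|&(vaddr, memsz)| vaddr + memsz)
        .max()
        .unwrap_or(0);
    (seg_end_max + 15) & !15 // add 15, then clear the low 4 bits
}
```

The `+ 15` then `& !15` trick works because 16 is a power of two: clearing the low 4 bits rounds down, and adding 15 first turns that into rounding up (an already aligned value stays unchanged).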

---

## 4. Section Headers: metadata

**Section Headers** are not required for execution, but they contain valuable information. Falcon reads them to extract the symbol table.

Each Section Header is **40 bytes** (ELF32):

```
Offset  Size  Field       Meaning
──────  ────  ──────────  ──────────────────────────────────────────────────
0       4     sh_name     offset into shstrtab where the section name starts
4       4     sh_type     type: 2 = SHT_SYMTAB, 3 = SHT_STRTAB, 8 = SHT_NOBITS
12      4     sh_addr     virtual address (0 if not loaded into memory)
16      4     sh_offset   where in the file the section bytes are
20      4     sh_size     size of the section in bytes
24      4     sh_link     for .symtab: index of the associated .strtab
36      4     sh_entsize  size of each entry (16 for .symtab)
```
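
Reading the section header table follows the same pattern as the program headers: `e_shnum` entries of `e_shentsize` bytes, starting at `e_shoff`. The sketch below (hypothetical `Shdr`, `parse_shdrs`, and `find_symtab`; it decodes only the fields in the table and relies on Rust's own bounds-check panics for truncated files) also shows how `sh_link` leads from `.symtab` to its string table.

```rust
pub const SHT_SYMTAB: u32 = 2;

/// The ELF32 section header fields from the table (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Shdr {
    pub sh_name: u32,
    pub sh_type: u32,
    pub sh_addr: u32,
    pub sh_offset: u32,
    pub sh_size: u32,
    pub sh_link: u32,
    pub sh_entsize: u32,
}

fn u32le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Decodes `shnum` headers of `shentsize` bytes each, starting at `shoff`.
pub fn parse_shdrs(file: &[u8], shoff: usize, shentsize: usize, shnum: usize) -> Vec<Shdr> {
    (0..shnum)
        .map(|i| {
            let o = shoff + i * shentsize;
            Shdr {
                sh_name: u32le(file, o),
                sh_type: u32le(file, o + 4),
                sh_addr: u32le(file, o + 12),
                sh_offset: u32le(file, o + 16),
                sh_size: u32le(file, o + 20),
                sh_link: u32le(file, o + 24),
                sh_entsize: u32le(file, o + 36),
            }
        })
        .collect()
}

/// Finds the first SHT_SYMTAB section and, via its sh_link, the string
/// table that holds the symbol names.
pub fn find_symtab(shdrs: &[Shdr]) -> Option<(&Shdr, &Shdr)> {
    let symtab = shdrs.iter().find(|s| s.sh_type == SHT_SYMTAB)?;
    let strtab = shdrs.get(symtab.sh_link as usize)?;
    Some((symtab, strtab))
}
```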

### shstrtab: the name directory

To find the name of a section, the `sh_name` field is used as a byte **offset** into a special section called **shstrtab** (*section header string table*). The index of that section is stored in `e_shstrndx` in the ELF header. Each name runs until the next `\0` byte.

```
shstrtab (bytes): \0.text\0.data\0.bss\0.symtab\0...
                  ^ ^      ^      ^
                  0 1      7      13   ← sh_name = 13 → name = ".bss"
```
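
Looking up a name is just "read bytes from the offset up to the next `\0`". A minimal sketch, using a hypothetical `str_at` helper that works for shstrtab and `.strtab` alike:

```rust
/// Returns the NUL-terminated string that starts at byte `offset` of a
/// string table. Invalid offsets or non-UTF-8 names yield "" in this sketch.
pub fn str_at(table: &[u8], offset: u32) -> &str {
    let start = offset as usize;
    if start >= table.len() {
        return "";
    }
    let rest = &table[start..];
    // The name runs up to (not including) the next NUL byte.
    let end = rest.iter().position(|&b| b == 0).unwrap_or(rest.len());
    std::str::from_utf8(&rest[..end]).unwrap_or("")
}
```

With the example table, offset 13 yields `".bss"`, while offset 0 (the leading `\0`) yields the empty name used by the null section.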

---

## 5. The symbol table: how names appear in the disassembly

The `.symtab` section (type `SHT_SYMTAB`) contains a list of symbols: functions, global variables, labels. Each entry is **16 bytes**:

```
Offset  Size  Field     Meaning
──────  ────  ────────  ──────────────────────────────────────────────────
0       4     st_name   offset of the name in the associated .strtab
4       4     st_value  virtual address of the symbol
8       4     st_size   size in bytes (0 if unknown)
12      1     st_info   type in the low 4 bits: 1 = OBJECT, 2 = FUNC
13      1     st_other  visibility (ignored)
14      2     st_shndx  index of the section the symbol lives in
```

Falcon filters the symbols that matter:

```rust
let sym_type = st_info & 0x0F;
if sym_type != STT_FUNC && sym_type != STT_OBJECT { continue; } // only functions and variables
if st_value == 0 { continue; } // symbol without an address (external/undefined)
if name.is_empty() || name.starts_with('$') { continue; } // linker-internal names
```

The result goes into `run.labels: HashMap<u32, Vec<String>>`, the same map the assembler populates when you write a label in assembly. That is why the disassembly shows `<main>:`, `<factorial>:`, etc. when a compiled ELF is loaded.
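
Putting the entry layout and the filter together, the sketch below walks a raw `.symtab` in 16-byte steps and builds a map of the same shape as `run.labels`. The name `collect_labels` is hypothetical, and the sketch assumes the `.symtab` and `.strtab` bytes have already been sliced out of the file (for example via their `sh_offset`/`sh_size`).

```rust
use std::collections::HashMap;

const STT_OBJECT: u8 = 1;
const STT_FUNC: u8 = 2;

fn u32le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// NUL-terminated string at `offset` in a string table ("" if invalid).
fn str_at(table: &[u8], offset: u32) -> &str {
    let rest = table.get(offset as usize..).unwrap_or(&[]);
    let end = rest.iter().position(|&b| b == 0).unwrap_or(rest.len());
    std::str::from_utf8(&rest[..end]).unwrap_or("")
}

/// Keeps named FUNC/OBJECT symbols with a non-zero address, grouped by
/// address (several names may share one address).
pub fn collect_labels(symtab: &[u8], strtab: &[u8]) -> HashMap<u32, Vec<String>> {
    let mut labels: HashMap<u32, Vec<String>> = HashMap::new();
    for entry in symtab.chunks_exact(16) {
        let st_name = u32le(entry, 0);
        let st_value = u32le(entry, 4);
        let sym_type = entry[12] & 0x0F; // st_info: type in the low 4 bits
        if sym_type != STT_FUNC && sym_type != STT_OBJECT {
            continue; // only functions and variables
        }
        if st_value == 0 {
            continue; // no address (external/undefined)
        }
        let name = str_at(strtab, st_name);
        if name.is_empty() || name.starts_with('$') {
            continue; // linker-internal names
        }
        labels.entry(st_value).or_default().push(name.to_string());
    }
    labels
}
```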

---

## 6. Memory layout in Raven

After loading a typical ELF from `hello-raven`, the 128 KB memory looks like this:

```
Address        Contents
─────────────  ────────────────────────────────────────────────────────
0x00000000     (empty; trap if executed)
...
0x00010000     .rodata + .data   (PT_LOAD #1, flags=RW)
               .bss              (zeroed because p_memsz > p_filesz)
0x000110d4     .text             (PT_LOAD #2, flags=R E)
── heap_break ──────────────────────────────────────── ← cpu.heap_break
               heap              (grows upward via brk)
...            (free space)
...
0x0001FFFC     stack             (grows downward; sp = 0x20000 at init)
0x00020000     (end of RAM: 128 KB)
```

The `.text` segment goes to a separate address because the default RISC-V linker layout places rodata/data at `0x10000` and `.text` at `0x110xx` (right after, on the next page). Since `heap_start` is computed from the highest segment end, the heap begins after `.text`, not directly after `.bss`.

---

## 7. Complete flow summary

```
ELF file
   │
   ▼
load_elf(bytes, mem)
   │
   ├─ validate magic, class, data, machine
   │
   ├─ read ELF Header → entry, phoff, shoff, ...
   │
   ├─ loop over PT_LOAD segments
   │    ├─ copy p_filesz bytes → mem[p_vaddr]
   │    ├─ zero (p_memsz - p_filesz) bytes → BSS
   │    ├─ identify .text (PF_X) and .data
   │    └─ update seg_end_max
   │
   ├─ heap_start = align16(seg_end_max)
   │
   ├─ parse_sections()
   │    ├─ read all Section Headers
   │    ├─ find shstrtab (section names)
   │    ├─ find .symtab → iterate symbols
   │    │    └─ filter FUNC/OBJECT → symbols HashMap
   │    └─ collect .data/.rodata/.bss → sections Vec
   │
   └─ return ElfInfo
        ├─ entry, text_base, text_bytes, data_base
        ├─ total_bytes, heap_start
        ├─ symbols  → run.labels        (appear in disassembly)
        └─ sections → run.elf_sections  (side panel)
```

---

## 8. Inspecting an ELF yourself

The `hello-raven` binary is at `hello-raven/target/riscv32im-unknown-none-elf/release/hello-raven`. To see what Falcon will read, run these commands from that directory:

```bash
cd hello-raven/target/riscv32im-unknown-none-elf/release

# Full ELF header
readelf -h hello-raven

# Program headers (segments the loader uses)
readelf -l hello-raven

# Section headers (metadata)
readelf -S hello-raven

# Symbol table (what becomes a label in the disassembly)
readelf -s hello-raven

# Disassembly with labels (needs a RISC-V-capable objdump,
# e.g. llvm-objdump or riscv64-unknown-elf-objdump)
objdump -d hello-raven
```

Example output from `readelf -l` (abbreviated and illustrative; exact offsets and sizes depend on the build):

```
Program Headers:
  Type  Offset    VirtAddr    PhysAddr    FileSiz  MemSiz   Flg  Align
  LOAD  0x001000  0x00010000  0x00010000  0x00050  0x00060  RW   0x1000
  LOAD  0x002000  0x000110d4  0x000110d4  0x00400  0x00400  R E  0x1000
        ↑         ↑                       ↑        ↑        ↑
        in file   in memory               bytes    bytes    R = read
                                          copied   total    W = write
                                                            E = exec
```

The second segment has `FileSiz == MemSiz` (no BSS) and flags `R E` (read + execute): this is the `.text` that Falcon identifies as code. The first segment has `MemSiz > FileSiz`; its extra 0x10 bytes are BSS.

---

## Quick reference: magic numbers

| Constant      | Value | Meaning                             |
|---------------|-------|-------------------------------------|
| `PT_LOAD`     | 1     | segment to load into memory         |
| `PF_X`        | 1     | executable flag in p_flags          |
| `SHT_SYMTAB`  | 2     | section is a symbol table           |
| `SHT_STRTAB`  | 3     | section is a string table           |
| `SHT_NOBITS`  | 8     | section with no bytes in file (BSS) |
| `STT_OBJECT`  | 1     | symbol is a variable/data object    |
| `STT_FUNC`    | 2     | symbol is a function                |
| `EM_RISCV`    | 0xF3  | e_machine for RISC-V                |
| `ELFCLASS32`  | 1     | EI_CLASS for 32-bit                 |
| `ELFDATA2LSB` | 1     | EI_DATA for little-endian           |
