From 9cf8e0bff4080a3f5e5fe2bd670fe785b85763c2 Mon Sep 17 00:00:00 2001
From: Tristan Ross
Date: Mon, 6 Apr 2026 21:43:23 -0700
Subject: [PATCH] chore: init docs

---
 docs/arch/README.md        |  96 ++++++++++++++++++++++++++
 docs/arch/bram.md          |  57 +++++++++++++++
 docs/arch/clb.md           |  89 ++++++++++++++++++++++++
 docs/arch/clock.md         |  72 +++++++++++++++++++
 docs/arch/configuration.md | 108 +++++++++++++++++++++++++++++
 docs/arch/dsp.md           |  57 +++++++++++++++
 docs/arch/io.md            |  57 +++++++++++++++
 docs/arch/pdk.md           | 137 +++++++++++++++++++++++++++++++++++++
 docs/arch/routing.md       | 110 +++++++++++++++++++++++++++++
 docs/arch/serdes.md        |  73 ++++++++++++++++++++
 10 files changed, 856 insertions(+)
 create mode 100644 docs/arch/README.md
 create mode 100644 docs/arch/bram.md
 create mode 100644 docs/arch/clb.md
 create mode 100644 docs/arch/clock.md
 create mode 100644 docs/arch/configuration.md
 create mode 100644 docs/arch/dsp.md
 create mode 100644 docs/arch/io.md
 create mode 100644 docs/arch/pdk.md
 create mode 100644 docs/arch/routing.md
 create mode 100644 docs/arch/serdes.md

diff --git a/docs/arch/README.md b/docs/arch/README.md
new file mode 100644
index 0000000..e3b52ba
--- /dev/null
+++ b/docs/arch/README.md
+# Aegis Architecture Overview
+
+Aegis is a parameterized FPGA fabric generator written in Dart using the
+ROHD hardware description framework. It outputs synthesizable SystemVerilog
+for the entire FPGA, from logic tiles to I/O pads to the configuration
+chain. This document describes the silicon architecture: how the fabric is
+structured, how tiles connect, and how a bitstream programs the device.
+
+## Device Hierarchy
+
+An Aegis device is organized as a layered hierarchy:
+
+```mermaid
+graph TD
+    FPGA[AegisFPGA]
+    FPGA --> Loader[FabricConfigLoader]
+    FPGA --> Clock["ClockTile[0..N]"]
+    FPGA --> IO[IOFabric]
+    IO --> IOTile["IOTile[0..P]"]
+    IO --> SerDes["SerDesTile[0..S]"]
+    FPGA --> Fabric[LutFabric]
+    Fabric --> Tiles["Tile[x][y] (LUT, BRAM, or DSP)"]
+```
+
+The `LutFabric` is a rectangular grid of tiles. Each tile contains a
+configurable logic block (CLB) and a routing switchbox. Specialized columns
+replace standard LUT tiles with BRAM or DSP tiles at regular intervals.
+
+The `IOFabric` wraps the grid perimeter with I/O pads and SerDes
+transceivers. Clock tiles sit outside the fabric and distribute divided
+clocks to all tiles.
+
+## Fabric Grid Layout
+
+The grid is `width x height` tiles. Columns are specialized based on
+their index:
+
+- **BRAM columns**: placed at every `bramColumnInterval` columns
+- **DSP columns**: placed at every `dspColumnInterval` columns, skipping
+  BRAM positions
+- **LUT columns**: all remaining columns
+
+For the Terra 1 device (48x64, bramColumnInterval=16, dspColumnInterval=24):
+
+| Column Type | Count | Tiles per Column | Total Tiles |
+|-------------|-------|------------------|-------------|
+| LUT | 45 | 64 | 2,880 |
+| BRAM | 2 | 64 | 128 |
+| DSP | 1 | 64 | 64 |
+
+## Carry Chains
+
+Each column has a vertical carry chain running south to north. The bottom
+tile in each column receives `carryIn = 0`, and each tile's `carryOut`
+feeds the `carryIn` of the tile above it. This enables fast arithmetic
+(adders, counters) without routing through the switchbox.
+
+BRAM tiles pass the carry signal through unchanged.
+
+## Edge I/O
+
+The fabric's four edges aggregate tile outputs using wired-OR. Any tile
+on an edge can drive the corresponding external output.
I/O pads on the
+perimeter connect to these edge signals:
+
+- North edge: `width` pads (left to right)
+- East edge: `height` pads (top to bottom)
+- South edge: `width` pads (left to right)
+- West edge: `height` pads (top to bottom)
+
+Total pads = `2 * width + 2 * height` (224 for Terra 1).
+
+## Configuration
+
+The entire device is programmed through a single serial shift register
+chain. Bits are shifted in through the clock tiles, then through I/O
+tiles, then SerDes tiles, and finally through the fabric tiles in
+row-major order. A `cfgLoad` pulse transfers the shift register contents
+to the active configuration registers in parallel.
+
+See [Configuration Chain](configuration.md) for the full protocol.
+
+## Tile Documentation
+
+- [CLB (Configurable Logic Block)](clb.md)
+- [Routing](routing.md)
+- [BRAM (Block RAM)](bram.md)
+- [DSP (Digital Signal Processing)](dsp.md)
+- [I/O Pad](io.md)
+- [SerDes](serdes.md)
+- [Clock Tile](clock.md)
+- [Configuration Chain](configuration.md)
+
+## Other Documentation
+
+- [PDK Integration and Tapeout](pdk.md)
diff --git a/docs/arch/bram.md b/docs/arch/bram.md
new file mode 100644
index 0000000..38cddbf
--- /dev/null
+++ b/docs/arch/bram.md
+# Block RAM (BRAM) Tile
+
+BRAM tiles provide on-chip memory distributed across the fabric in
+dedicated columns. Each BRAM tile implements a dual-port RAM (synchronous
+writes, asynchronous reads) accessible independently from two directions.
+
+## Parameters
+
+| Parameter | Default | Description |
+|--------------|---------|-------------------------------|
+| Data width | 8 bits | Width of each memory word |
+| Address width| 7 bits | Address bus width |
+| Depth | 128 | Number of words (2^addrWidth) |
+
+## Ports
+
+The two ports are mapped to the tile's directional routing:
+
+- **Port A**: input from the north, output to the south
+- **Port B**: input from the west, output to the east
+
+Data, address, and write-enable signals are packed into the routing
+tracks.
The packing adapts to the available track width:
+
+```
+[effAddrWidth-1 : 0]                          Address bits
+[effAddrWidth+effDataWidth-1 : effAddrWidth]  Data bits
+[effAddrWidth+effDataWidth]                   Write-enable (if tracks allow)
+```
+
+If the track width is narrower than the full address + data width, the
+signals are truncated and zero-extended.
+
+## Read/Write Behavior
+
+**Writes** are synchronous. On the rising clock edge, if the port is
+enabled and write-enable is asserted, the data word is stored at the
+given address. Both ports can write simultaneously (true dual-port).
+
+**Reads** are asynchronous (combinational). When a port is enabled, the
+data at the addressed location is continuously driven onto the output
+tracks. When disabled, the output is zero.
+
+## Carry Chain
+
+BRAM tiles pass the carry signal through unchanged (`carryOut = carryIn`).
+They do not consume or generate carry values.
+
+## Configuration
+
+| Bit | Field |
+|--------|---------------|
+| `[0]` | Port A enable |
+| `[1]` | Port B enable |
+| `[7:2]`| Reserved |
+
+**Total: 8 bits**
diff --git a/docs/arch/clb.md b/docs/arch/clb.md
new file mode 100644
index 0000000..81548d3
--- /dev/null
+++ b/docs/arch/clb.md
+# Configurable Logic Block (CLB)
+
+The CLB is the fundamental logic element of the Aegis fabric. Each CLB
+contains a 4-input lookup table (LUT4), a D flip-flop, and a carry chain
+multiplexer (MUXCY). Together these implement arbitrary 4-input
+combinational logic with optional registering and fast arithmetic.
+
+## Block Diagram
+
+```mermaid
+graph TD
+    in0 --> LUT4["LUT4 (16-bit truth table)"]
+    in1 --> LUT4
+    in2 --> LUT4
+    in3 --> LUT4
+    LUT4 -- "lutOut" --> OutMux["Output Mux (cfg 16)"]
+    LUT4 -- "lutOut" --> FF["D Flip-Flop"]
+    clk --> FF
+    FF -- "ffOut" --> OutMux
+    OutMux --> out
+    LUT4 -- "propagate (P)" --> MUXCY["MUXCY (cfg 17)"]
+    carryIn --> MUXCY
+    in0 -- "generate" --> MUXCY
+    MUXCY --> carryOut
+```
+
+## LUT4
+
+The LUT4 implements any Boolean function of four inputs. It stores a
+16-bit truth table in configuration bits `[15:0]`. The output is selected
+by using the four inputs as an index into the truth table.
+
+Internally, the LUT is a 4-stage multiplexer tree:
+
+| Stage | Select | Muxes | Operation |
+|-------|--------|-------|----------------------------------------|
+| 0 | in0 | 8 | `s0[i] = mux(in0, cfg[2i+1], cfg[2i])` |
+| 1 | in1 | 4 | `s1[i] = mux(in1, s0[2i+1], s0[2i])` |
+| 2 | in2 | 2 | `s2[i] = mux(in2, s1[2i+1], s1[2i])` |
+| 3 | in3 | 1 | `out = mux(in3, s2[1], s2[0])` |
+
+The effective computation is `out = cfg[{in3, in2, in1, in0}]`.
+
+### Common Truth Tables
+
+| Function | Truth Table | Notes |
+|--------------|-------------|------------------------------|
+| 2-input AND | `0x8888` | on in0, in1 |
+| 2-input OR | `0xEEEE` | on in0, in1 |
+| 2-input XOR | `0x6666` | also used as carry propagate |
+| NOT | `0x5555` | inverts in0 |
+| Constant 0 | `0x0000` | |
+| Constant 1 | `0xFFFF` | |
+
+## D Flip-Flop
+
+The flip-flop captures the LUT output on the rising clock edge. Config
+bit `[16]` selects whether the CLB output comes from the flip-flop
+(registered) or directly from the LUT (combinational).
+
+- `cfg[16] = 0`: `out = LUT output` (combinational)
+- `cfg[16] = 1`: `out = FF output` (registered)
+
+## Carry Chain (MUXCY)
+
+The carry chain enables fast arithmetic by bypassing the general routing
+fabric.
When carry mode is enabled (config bit `[17] = 1`):
+
+- The LUT output acts as the **propagate** signal (P)
+- `carryOut = P ? carryIn : in0`
+- `sum = P ^ carryIn` (fast XOR for adder sum bit)
+
+When carry mode is disabled (`cfg[17] = 0`):
+- `carryOut = 0`
+- The CLB output is the normal LUT/FF output
+
+Carry chains propagate vertically through a column (south to north),
+enabling multi-bit adders and counters without consuming routing
+resources.
+
+## Configuration Bit Layout
+
+| Bits | Width | Field |
+|----------|-------|-------------|
+| `[15:0]` | 16 | LUT truth table |
+| `[16]` | 1 | FF enable (1 = registered) |
+| `[17]` | 1 | Carry mode enable |
+
+**Total: 18 bits**
diff --git a/docs/arch/clock.md b/docs/arch/clock.md
new file mode 100644
index 0000000..c3691de
--- /dev/null
+++ b/docs/arch/clock.md
+# Clock Tile
+
+Clock tiles generate divided clock signals from a reference clock and
+distribute them to the fabric. Each clock tile provides four independent
+outputs, each with its own divider, phase offset, and duty cycle control.
+
+## Outputs
+
+Each of the four outputs can be independently configured:
+
+| Feature | Range / Options |
+|-------------|----------------------------------|
+| Divider | 1 to 256 (8-bit, divides by N+1) |
+| Phase | 0, 90, 180, or 270 degrees |
+| Duty cycle | 50% toggle or single-cycle pulse |
+| Enable | Per-output enable bit |
+
+## Divider
+
+Each output has an 8-bit counter that counts from 0 to the configured
+divider value. This divides the reference clock frequency by
+`(divider + 1)`, giving a range of divide-by-1 to divide-by-256.
+
+## Phase Control
+
+Phase offset shifts the output clock relative to the reference.
The
+offset is computed as a fraction of the divider period:
+
+| Phase Select | Offset Cycles |
+|--------------|---------------------------|
+| `00` (0) | 0 |
+| `01` (90) | divider / 4 |
+| `10` (180) | divider / 2 |
+| `11` (270) | divider / 2 + divider / 4 |
+
+## Duty Cycle
+
+In **50% duty mode** (`duty = 1`), the output toggles at the midpoint of
+each period, producing a symmetric square wave.
+
+In **pulse mode** (`duty = 0`), the output pulses high for one reference
+clock cycle at the phase offset point and remains low otherwise.
+
+## Lock Indicator
+
+The `locked` output is asserted when all enabled clock outputs have
+completed at least one full division cycle. This can be used for
+synchronization or to gate downstream logic until clocks are stable.
+
+## Configuration
+
+| Bits | Field |
+|-------------|---------------|
+| `[0]` | Global enable |
+| `[8:1]` | Divider 0 (divide ratio minus 1) |
+| `[16:9]` | Divider 1 (divide ratio minus 1) |
+| `[24:17]` | Divider 2 (divide ratio minus 1) |
+| `[32:25]` | Divider 3 (divide ratio minus 1) |
+| `[34:33]` | Phase 0 |
+| `[36:35]` | Phase 1 |
+| `[38:37]` | Phase 2 |
+| `[40:39]` | Phase 3 |
+| `[41]` | Enable 0 |
+| `[42]` | Enable 1 |
+| `[43]` | Enable 2 |
+| `[44]` | Enable 3 |
+| `[45]` | Duty 0 |
+| `[46]` | Duty 1 |
+| `[47]` | Duty 2 |
+| `[48]` | Duty 3 |
+
+**Total: 49 bits**
diff --git a/docs/arch/configuration.md b/docs/arch/configuration.md
new file mode 100644
index 0000000..5b89582
--- /dev/null
+++ b/docs/arch/configuration.md
+# Configuration Chain
+
+The Aegis FPGA is programmed by shifting a bitstream through a single
+serial chain that passes through every tile in the device. This document
+describes the chain topology, the shift register protocol, and the
+bitstream loading process.
+
+## Chain Topology
+
+The configuration chain is a single serial path. Bits enter at the first
+clock tile and shift through every configurable tile in the device in
+this order:
+
+1. **Clock tiles** (first)
+2. **I/O tiles** (perimeter pads)
+3.
**SerDes tiles**
+4. **Fabric tiles** (row-major: row 0 left to right, then row 1, etc.)
+
+Each tile's `cfgOut` connects to the next tile's `cfgIn`, forming a
+continuous shift register across the entire device.
+
+```mermaid
+graph LR
+    cfgIn --> C0["ClockTile[0]"] --> CN["ClockTile[N]"]
+    CN --> IO0["IOTile[0]"] --> IOP["IOTile[P]"]
+    IOP --> S0["SerDes[0]"] --> SS["SerDes[S]"]
+    SS --> T00["Tile[0,0]"] --> T10["Tile[1,0]"] --> TW0["Tile[W-1,0]"]
+    TW0 --> T01["Tile[0,1]"] --> TWH["Tile[W-1,H-1]"] --> cfgOut
+```
+
+## Per-Tile Shift Register
+
+Every tile (regardless of type) uses the same shift register mechanism:
+
+1. **Shift register** (`shiftReg`): on each clock edge, bits shift in
+   from `cfgIn` at the MSB and shift out from bit 0 to `cfgOut`.
+
+   ```
+   shiftReg <= {cfgIn, shiftReg[configWidth-1 : 1]}
+   cfgOut   <= shiftReg[0]
+   ```
+
+2. **Config register** (`configReg`): a parallel-load register that holds
+   the active configuration. It only updates when `cfgLoad` is asserted:
+
+   ```
+   if (cfgLoad) configReg <= shiftReg
+   ```
+
+All tile logic reads from `configReg`, not from the shift register. This
+means the tile's behavior does not change during shifting; it only updates
+on the `cfgLoad` pulse.
+
+## Bitstream Loading Protocol
+
+The `FabricConfigLoader` module manages the loading process:
+
+### Phase 1: Memory Read
+
+The loader reads configuration words sequentially from an external memory
+interface (`DataPortInterface`). It increments the word address each cycle
+until all required words have been fetched.
+
+### Phase 2: Deserialization
+
+Read words are collected into a wide register. Once enough data has been
+fetched, the loader flattens the words into a single bit array.
+
+### Phase 3: Serial Shift
+
+The loader shifts out one bit per clock cycle into the fabric's `cfgIn`
+line. A bit counter tracks progress through the full bitstream.
+
+### Phase 4: Load Pulse
+
+After all bits have been shifted in, the loader pulses `cfgLoad`.
All
+tiles simultaneously transfer their shift register contents to their
+config registers. The loader then asserts `done`.
+
+## Total Configuration Bits
+
+The total bitstream size depends on the device parameters:
+
+```
+totalBits = (clockTileCount * 49) +
+            (totalPads * 8) +
+            (serdesCount * 32) +
+            fabricConfigBits
+```
+
+For Terra 1 (48x64, tracks=4):
+
+| Section | Count | Bits Each | Total |
+|--------------|-------|-----------|-------------|
+| Clock tiles | 2 | 49 | 98 |
+| I/O tiles | 224 | 8 | 1,792 |
+| SerDes tiles | 4 | 32 | 128 |
+| LUT tiles | 2,880 | 102 | 293,760 |
+| BRAM tiles | 128 | 8 | 1,024 |
+| DSP tiles | 64 | 16 | 1,024 |
+| **Total** | | | **297,826** |
+
+## Reset Behavior
+
+On power-on reset, both the shift register and config register in every
+tile are cleared to zero. In this default state, all logic is disabled:
+LUTs output zero, routing is disconnected, and I/O pads are
+high-impedance.
diff --git a/docs/arch/dsp.md b/docs/arch/dsp.md
new file mode 100644
index 0000000..aa946f3
--- /dev/null
+++ b/docs/arch/dsp.md
+# DSP Tile
+
+DSP tiles provide hardware multiply-accumulate (MAC) units distributed
+across the fabric in dedicated columns. Each DSP tile performs an 18x18
+unsigned multiplication with optional accumulation or addition.
+
+## Operands
+
+| Operand | Width | Source |
+|---------|--------|---------------------------------------------------------|
+| A | 18 bits | North routing tracks `[17:0]` |
+| B | 18 bits | West routing tracks `[17:0]` |
+| C | varies | North tracks `[tracks-1:18]`, zero-extended to 36 bits |
+
+The result is a 36-bit value driven onto the south routing tracks. All
+other directions output zero.
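
As a rough illustration of the operand mapping above, the following Python sketch slices A, B, and C out of the north and west track vectors. The function names, the integer encoding of the track vectors, and the explicit `tracks` argument are assumptions made for this example; they are not taken from the Aegis sources.

```python
from typing import Tuple

A_WIDTH = 18       # operand A: north tracks [17:0]
B_WIDTH = 18       # operand B: west tracks [17:0]
RESULT_WIDTH = 36  # result width on the south tracks


def dsp_operands(north: int, west: int, tracks: int) -> Tuple[int, int, int]:
    """Split integer-encoded track vectors into the A, B, and C operands.

    Hypothetical helper: `north` and `west` hold one bit per routing track,
    with track 0 in bit 0.
    """
    track_mask = (1 << tracks) - 1
    north &= track_mask  # ignore bits beyond the tracks that exist
    west &= track_mask
    a = north & ((1 << A_WIDTH) - 1)
    b = west & ((1 << B_WIDTH) - 1)
    # C is whatever remains on the north tracks above bit 17,
    # zero-extended (here: masked) to the 36-bit result width.
    c = (north >> A_WIDTH) & ((1 << RESULT_WIDTH) - 1)
    return a, b, c


def dsp_product(a: int, b: int) -> int:
    """Unsigned 18x18 multiply; the product always fits in 36 bits."""
    return (a * b) & ((1 << RESULT_WIDTH) - 1)


if __name__ == "__main__":
    a, b, c = dsp_operands(north=(5 << 18) | 3, west=7, tracks=24)
    print(a, b, c, dsp_product(a, b))
```

With `tracks = 24`, for instance, north tracks `[23:18]` would supply a 6-bit C value, while a fabric with 18 or fewer tracks leaves C at zero.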
+
+## Operation Modes
+
+The DSP supports four modes, selected by config bits `[3:2]`:
+
+| Mode | Config `[3:2]` | Operation |
+|------|----------------|--------------------------------|
+| 0 | `00` | `result = A * B` |
+| 1 | `01` | `result = A * B + C` |
+| 2 | `10` | `result = A * B + accumulator` |
+| 3 | `11` | Reserved (defaults to `A * B`) |
+
+Mode 1 adds the external operand C (supplied on north tracks above bit 17).
+Mode 2 feeds the previous result back through the accumulator for iterative
+MAC operations.
+
+## Pipeline Registers
+
+The DSP has two optional pipeline stages controlled by configuration:
+
+**Enable** (config bit `[0]`): gates both the accumulator and the output
+register. When enabled, both registers latch the raw result on each
+clock edge. When disabled, neither register updates.
+
+**Output register select** (config bit `[1]`): selects whether the south
+output comes from the output register (registered) or directly from the
+raw result (combinational). The register itself only updates when the
+enable bit is set.
+
+The accumulator value is available as an operand in mode 2.
+
+## Configuration
+
+| Bit | Field |
+|----------|------------------------------------------------|
+| `[0]` | Enable (gates accumulator and output register) |
+| `[1]` | Output register select |
+| `[3:2]` | Operation mode |
+| `[15:4]` | Reserved |
+
+**Total: 16 bits**
diff --git a/docs/arch/io.md b/docs/arch/io.md
new file mode 100644
index 0000000..851aa71
--- /dev/null
+++ b/docs/arch/io.md
+# I/O Tile
+
+I/O tiles sit on the fabric perimeter and connect internal routing tracks
+to external pads. Each tile interfaces one pad with the fabric and
+supports input, output, bidirectional, and high-impedance modes.
+
+## Block Diagram
+
+```mermaid
+graph TD
+    subgraph Output Path
+        Fabric["Fabric Tracks"] --> TrackMux["Track Mux (cfg 6:4)"]
+        TrackMux --> OutReg["Output Register (optional)"]
+        OutReg --> padOut
+    end
+    subgraph Input Path
+        padIn --> InReg["Input Register (optional)"]
+        InReg --> FabricOut["Fabric Tracks (broadcast)"]
+    end
+    DirCfg["Direction Mode (cfg 1:0)"] --> padOE["padOutputEnable"]
+```
+
+## Direction Modes
+
+Config bits `[1:0]` select the pad direction:
+
+| Value | Mode | Behavior |
+|-------|---------------|---------------------------------------|
+| `00` | High-Z | Pad isolated, fabric output is zero |
+| `01` | Input | Pad drives fabric, pad output is zero |
+| `10` | Output | Fabric drives pad, output enable high |
+| `11` | Bidirectional | Both paths active |
+
+## Input Path
+
+When in input or bidirectional mode, the pad value is broadcast to all
+fabric tracks. An optional input register (config bit `[2]`) adds a
+pipeline stage clocked by the fabric clock.
+
+## Output Path
+
+When in output or bidirectional mode, one fabric track is selected by
+config bits `[6:4]` (supporting up to 8 tracks). An optional output
+register (config bit `[3]`) adds a pipeline stage. The selected value
+drives `padOut`, and `padOutputEnable` is asserted.
+
+## Configuration
+
+| Bit | Field |
+|----------|------------------------|
+| `[1:0]` | Direction mode |
+| `[2]` | Input register enable |
+| `[3]` | Output register enable |
+| `[6:4]` | Output track select |
+| `[7]` | Reserved (pull-up) |
+
+**Total: 8 bits**
diff --git a/docs/arch/pdk.md b/docs/arch/pdk.md
new file mode 100644
index 0000000..eceadaf
--- /dev/null
+++ b/docs/arch/pdk.md
+# PDK Integration and Tapeout
+
+Aegis uses a pluggable PDK (Process Design Kit) abstraction to support
+multiple foundry targets. The digital FPGA fabric is described in
+ROHD/Dart and synthesized to standard cells from the chosen PDK.
Analog
+peripherals (PLL, SerDes, I/O cells) are replaced with PDK-provided
+symbols during tapeout.
+
+## Supported PDKs
+
+| PDK | Foundry | Node | Standard Cell Library | Site Name |
+|----------|-----------------|-------|---------------------------|---------------------|
+| GF180MCU | GlobalFoundries | 180nm | `gf180mcu_fd_sc_mcu7t5v0` | `GF018hv5v_mcu_sc7` |
+| Sky130 | SkyWater | 130nm | `sky130_fd_sc_hd` | `unithd` |
+
+Both PDK packages are built as Nix derivations and expose a consistent
+interface: standard cell liberty files, LEF files, GDS cell libraries,
+and SPICE models.
+
+## PDK Provider Interface
+
+The `PdkProvider` abstract class defines how Aegis maps its canonical
+block interfaces to PDK-specific symbols and pin names. Each provider
+implements three methods:
+
+- `pll()`: returns symbol path and pin mapping for the clock PLL
+- `serdes()`: returns symbol path and pin mapping for the serial transceiver
+- `ioCell()`: returns symbol path and pin mapping for the I/O pad cell
+
+A provider registry allows selection by name (`generic`, `gf180mcu`).
+The `generic` provider uses bundled Aegis symbols with identity pin
+mapping for simulation and development.
+
+### Pin Mapping Example (GF180MCU)
+
+The GF180MCU provider translates Aegis pin names to foundry-specific
+names:
+
+| Block | Aegis Pin | GF180MCU Pin |
+|---------|-------------------|--------------|
+| PLL | `refClk` | `CLK` |
+| PLL | `reset` | `RST` |
+| PLL | `clkOut[0]` | `CLKOUT0` |
+| PLL | `locked` | `LOCK` |
+| SerDes | `serialIn` | `RXD` |
+| SerDes | `serialOut` | `TXD` |
+| SerDes | `txReady` | `TX_RDY` |
+| SerDes | `rxValid` | `RX_VLD` |
+| I/O | `padIn` | `PAD` |
+| I/O | `padOut` | `A` |
+| I/O | `padOutputEnable` | `EN` |
+
+## Analog Block Wrappers
+
+The digital tile implementations (ClockTile, SerDesTile, IOTile) are
+used for simulation and bitstream tooling.
For tapeout, they are replaced
+by analog wrappers that instantiate the PDK-provided symbols:
+
+- **AnalogPll**: replaces ClockTile with a PDK PLL macro
+- **AnalogSerdes**: replaces SerDesTile with a PDK transceiver macro
+- **AnalogIoCell**: replaces IOTile with a PDK I/O pad cell
+
+Each wrapper queries the active `PdkProvider` for the symbol path and
+pin mapping, then generates xschem schematic output (`.sch` format) with
+the correct PDK instances and wiring.
+
+## Xschem Generation
+
+The IP generator produces two forms of xschem output for the
+mixed-signal top level:
+
+- **TCL script** (`-xschem.tcl`): for programmatic schematic
+  construction via `xschem --tcl`
+- **Schematic file** (`-xschem.sch`): static xschem schematic
+  that can be opened directly
+
+Both place the digital FPGA module at the center with analog blocks
+arranged around it: PLLs to the left, SerDes to the right, and I/O cells
+around the perimeter matching the fabric edge mapping.
+
+## Tapeout Pipeline
+
+The tapeout flow is a five-stage RTL-to-GDS pipeline, driven entirely
+through Nix:
+
+```
+nix build .#terra-1-tapeout
+```
+
+### Stage 1: Synthesis (Yosys)
+
+Reads the generated SystemVerilog and maps it to PDK standard cells
+using the liberty timing library. Outputs a gate-level netlist.
+
+### Stage 2: Constraints (SDC)
+
+Generates timing constraints with a configurable clock period (e.g.,
+`clockPeriodNs = 20` for 50 MHz).
+
+### Stage 3: Place and Route (OpenROAD)
+
+Performs floorplanning, power grid generation, cell placement, clock
+tree synthesis, and detailed routing using PDK tech LEF and cell LEFs.
+Outputs a placed-and-routed DEF and timing/power reports.
+
+### Stage 4: GDS Merge (KLayout)
+
+Reads the routed DEF and merges it with the PDK cell GDS library to
+produce the final GDS2 file for fab submission.
+
+### Stage 5: Layout Visualization (KLayout)
+
+Renders the GDS to a PNG image for visual inspection.
+
+### Output Artifacts
+
+```
+result/
+  terra_1_synth.v       # Gate-level netlist
+  terra_1_final.def     # Placed and routed layout
+  terra_1.gds           # GDS2 for fab submission
+  terra_1_layout.png    # Layout render
+  timing.rpt            # Timing analysis
+  power.rpt             # Power report
+```
+
+## Adding a New PDK
+
+To add support for a new foundry PDK:
+
+1. Create a Nix package under `pkgs/` that builds the PDK's standard
+   cell library, LEFs, GDS, and liberty files.
+2. Implement a `PdkProvider` subclass that maps Aegis pin names to the
+   new PDK's symbol pins.
+3. Register the provider in `PdkProvider.registry`.
+4. The tapeout pipeline will work unchanged, since it reads PDK paths
+   and cell library names from the Nix package's passthru attributes.
diff --git a/docs/arch/routing.md b/docs/arch/routing.md
new file mode 100644
index 0000000..9672d3c
--- /dev/null
+++ b/docs/arch/routing.md
+# Routing Architecture
+
+Each tile in the Aegis fabric contains a routing switchbox alongside its
+logic element (CLB, BRAM, or DSP). The switchbox connects the tile to its
+four neighbors and allows signals to pass through, turn corners, or enter
+and exit the logic element.
+
+## Tile Ports
+
+Every tile has four directional ports (north, east, south, west), each
+carrying `T` tracks (where `T` is the fabric's track parameter). Signals
+flow between adjacent tiles: the north output of tile `(x, y)` connects
+to the south input of tile `(x, y-1)`, and so on for each direction.
+
+```mermaid
+graph LR
+    NI["North In"] --> Tile["Tile (CLB + Switchbox)"]
+    WI["West In"] --> Tile
+    SI["South In"] --> Tile
+    EI["East In"] --> Tile
+    Tile --> NO["North Out"]
+    Tile --> EO["East Out"]
+    Tile --> SO["South Out"]
+    Tile --> WO["West Out"]
+```
+
+## Input Multiplexers
+
+Each CLB has four inputs (in0 through in3).
Each input is driven by its
+own multiplexer that selects from:
+
+- Track 0 through T-1 of each direction (4 * T sources)
+- The CLB's own output (feedback)
+- Constant 0
+- Constant 1
+
+The select width per input is `ceil(log2(4*T + 3))` bits.
+
+For example, with `T = 1` (4 directional tracks + CLB output + 0 + 1 = 7
+sources), each input mux needs 3 select bits.
+
+| Select Value | Source |
+|--------------|--------------|
+| 0 to T-1 | North tracks |
+| T to 2T-1 | East tracks |
+| 2T to 3T-1 | South tracks |
+| 3T to 4T-1 | West tracks |
+| 4T | CLB output |
+| 4T+1 | Constant 0 |
+| 4T+2 | Constant 1 |
+
+## Output Multiplexers
+
+Each direction has `T` output tracks, and each track has its own output
+multiplexer. The output mux selects from:
+
+- The corresponding track from each of the four directions (pass-through
+  or turn)
+- The CLB output
+
+Each output track uses 4 config bits:
+
+| Bit | Function |
+|---------|-----------------------------------------------|
+| `[0]` | Enable (0 = drive zero, 1 = drive selected) |
+| `[3:1]` | Source select (0=N, 1=E, 2=S, 3=W, 4=CLB out) |
+
+This per-track output mux design (as opposed to a shared mux per
+direction) follows the approach used in architectures like the iCE40 and
+ECP5, allowing independent routing of each track.
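
The input and output multiplexer encodings described above can be sketched in a few lines of Python. This is an illustrative model only: the function names and the returned source labels (`north[0]`, `clb_out`, and so on) are invented for the example, and source-select values 5 to 7, which the table leaves undefined, are reported as unspecified rather than given a behavior.

```python
from typing import Tuple

DIRECTIONS = ["north", "east", "south", "west"]


def input_mux_select_width(tracks: int) -> int:
    """Select bits per CLB input: ceil(log2(4*T + 3)), computed with integers."""
    sources = 4 * tracks + 3  # 4*T tracks + CLB output + constant 0 + constant 1
    return (sources - 1).bit_length()


def input_mux_source(select: int, tracks: int) -> str:
    """Name the source chosen by an input-mux select value."""
    if 0 <= select < 4 * tracks:
        return f"{DIRECTIONS[select // tracks]}[{select % tracks}]"
    extras = {4 * tracks: "clb_out", 4 * tracks + 1: "const0", 4 * tracks + 2: "const1"}
    if select in extras:
        return extras[select]
    raise ValueError(f"select value {select} is out of range for T={tracks}")


def decode_output_track(cfg: int) -> Tuple[bool, str]:
    """Decode one output track's 4 config bits: [0] enable, [3:1] source select."""
    enable = bool(cfg & 0b1)
    source = (cfg >> 1) & 0b111
    names = DIRECTIONS + ["clb_out"]
    label = names[source] if source < len(names) else f"unspecified({source})"
    return enable, label


if __name__ == "__main__":
    print(input_mux_select_width(1), input_mux_select_width(4))
    print(input_mux_source(5, tracks=4))
    print(decode_output_track(0b1001))
```

Using `(sources - 1).bit_length()` instead of a floating-point `log2` keeps the width calculation exact for any track count.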
+
+## Tile Configuration Layout
+
+The full tile config concatenates the CLB config, input mux selects, and
+output routing:
+
+| Section | Bits | Width (T=1) |
+|-----------------|------------------------|-------------|
+| CLB config | `[17:0]` | 18 |
+| Input mux 0 | `[17 + selW : 18]` | 3 |
+| Input mux 1 | next selW bits | 3 |
+| Input mux 2 | next selW bits | 3 |
+| Input mux 3 | next selW bits | 3 |
+| Output routing | 4 dirs * T tracks * 4b | 16 |
+| **Total** | | **46** |
+
+For multi-track configurations, the formula is:
+
+```
+tileConfigWidth(T) = 18 + 4 * ceil(log2(4*T + 3)) + 4 * T * 4
+```
+
+## Inter-Tile Connectivity
+
+### Horizontal
+
+Tiles at `(x, y)` and `(x+1, y)` connect east-to-west. The east output
+of `(x, y)` feeds the west input of `(x+1, y)`, and vice versa.
+
+### Vertical
+
+Tiles at `(x, y)` and `(x, y-1)` connect north-to-south. The north
+output of `(x, y)` feeds the south input of `(x, y-1)`, and vice versa.
+
+### Edge Aggregation
+
+At the fabric boundary, tile outputs along each edge are combined with
+wired-OR. Any tile on the north row can drive the north edge output, and
+so on for each edge. This allows edge tiles to communicate with the I/O
+perimeter.
diff --git a/docs/arch/serdes.md b/docs/arch/serdes.md
new file mode 100644
index 0000000..4cdb6f5
--- /dev/null
+++ b/docs/arch/serdes.md
+# SerDes Tile
+
+SerDes tiles provide protocol-agnostic serial transceivers on the fabric
+perimeter. Each tile has a transmit (TX) and receive (RX) path with
+configurable word length, bit order, clock rate, and sampling mode. The
+design is intentionally generic: protocols like UART, SPI, or custom
+serial links are defined entirely by configuration.
+
+## External Pins
+
+- `serialIn`: 1-bit input (RX data from off-chip)
+- `serialOut`: 1-bit output (TX data to off-chip)
+
+## TX Path
+
+The transmitter loads a data word from the fabric and shifts it out one
+bit at a time at the configured baud rate.
+
+1.
Data is loaded from a configurable fabric track into a 256-bit shift
+   register when a strobe signal is asserted.
+2. On each baud tick, the register shifts and the next bit appears on
+   `serialOut`.
+3. Bit order is selectable: MSB-first shifts from the top of the
+   register, LSB-first shifts from the bottom.
+4. When idle, the output level is configurable (high or low).
+5. `txReady` signals that the transmitter can accept a new word.
+
+## RX Path
+
+The receiver samples `serialIn` at the baud rate and assembles incoming
+bits into a word.
+
+1. Each baud tick, the sampled bit is shifted into a 256-bit receive
+   register.
+2. A counter tracks how many bits have been received. When it reaches the
+   configured word length, `rxValid` pulses to indicate a complete frame.
+3. The received data is driven onto a configurable fabric track, with the
+   valid bit on the adjacent track.
+
+## Baud Rate Generator
+
+An 8-bit clock divider generates the baud tick by dividing the fabric
+clock by `(clockDivider + 1)`, giving a range of 1x to 256x division.
+
+In **DDR mode**, the baud tick fires both at the counter rollover and at
+the halfway point, doubling the effective sample rate.
+
+**Clock polarity** inverts the sample timing when set.
+
+## Loopback
+
+A loopback mode connects the TX output back to the RX input for
+self-testing without external connections.
+
+## Configuration
+
+| Bits | Field |
+|------------|------------------------------------------|
+| `[0]` | TX enable |
+| `[1]` | RX enable |
+| `[4:2]` | TX data track select |
+| `[7:5]` | RX data track select |
+| `[15:8]` | Word length - 1 (range 1 to 256) |
+| `[16]` | Bit order (0 = LSB-first, 1 = MSB-first) |
+| `[17]` | TX idle level (0 = low, 1 = high) |
+| `[18]` | Clock mode (0 = SDR, 1 = DDR) |
+| `[19]` | Clock polarity (0 = rising, 1 = falling) |
+| `[27:20]` | Clock divider - 1 |
+| `[28]` | Loopback enable |
+| `[30:29]` | TX strobe track select |
+| `[31]` | Reserved |
+
+**Total: 32 bits**
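
The bit layout in the table above can be exercised with a small packing helper. This is a minimal Python sketch under stated assumptions: the function `pack_serdes_config`, its argument names, and the choice to pass the word length and divide ratio as plain values (stored minus one) are made up for illustration and are not an Aegis API.

```python
def _field(value: int, width: int, shift: int, name: str) -> int:
    """Range-check one field and move it to its bit position."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value << shift


def pack_serdes_config(*, tx_enable: bool, rx_enable: bool, tx_track: int,
                       rx_track: int, word_length: int, msb_first: bool,
                       idle_high: bool, ddr: bool, falling_edge: bool,
                       clock_divide: int, loopback: bool,
                       strobe_track: int) -> int:
    """Build the 32-bit SerDes config word; reserved bit [31] is left at 0."""
    return (
        _field(int(tx_enable), 1, 0, "tx_enable")
        | _field(int(rx_enable), 1, 1, "rx_enable")
        | _field(tx_track, 3, 2, "tx_track")
        | _field(rx_track, 3, 5, "rx_track")
        | _field(word_length - 1, 8, 8, "word_length - 1")    # 1..256 bits
        | _field(int(msb_first), 1, 16, "msb_first")
        | _field(int(idle_high), 1, 17, "idle_high")
        | _field(int(ddr), 1, 18, "ddr")
        | _field(int(falling_edge), 1, 19, "falling_edge")
        | _field(clock_divide - 1, 8, 20, "clock_divide - 1")  # divide by 1..256
        | _field(int(loopback), 1, 28, "loopback")
        | _field(strobe_track, 2, 29, "strobe_track")
    )


if __name__ == "__main__":
    # Hypothetical UART-like setup: 8-bit words, LSB-first, idle high, divide by 16.
    word = pack_serdes_config(
        tx_enable=True, rx_enable=True, tx_track=0, rx_track=1,
        word_length=8, msb_first=False, idle_high=True, ddr=False,
        falling_edge=False, clock_divide=16, loopback=False, strobe_track=2,
    )
    print(f"0x{word:08X}")
```

Range-checking each field before shifting means an out-of-range word length or track index raises an error instead of silently corrupting a neighboring field.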