96 changes: 96 additions & 0 deletions docs/arch/README.md
@@ -0,0 +1,96 @@
# Aegis Architecture Overview

Aegis is a parameterized FPGA fabric generator written in Dart using the
ROHD hardware description framework. It outputs synthesizable SystemVerilog
for the entire FPGA, from logic tiles to I/O pads to the configuration
chain. This document describes the silicon architecture: how the fabric is
structured, how tiles connect, and how a bitstream programs the device.

## Device Hierarchy

An Aegis device is organized as a layered hierarchy:

```mermaid
graph TD
FPGA[AegisFPGA]
FPGA --> Loader[FabricConfigLoader]
FPGA --> Clock["ClockTile[0..N]"]
FPGA --> IO[IOFabric]
IO --> IOTile["IOTile[0..P]"]
IO --> SerDes["SerDesTile[0..S]"]
FPGA --> Fabric[LutFabric]
Fabric --> Tiles["Tile[x][y] (LUT, BRAM, or DSP)"]
```

The `LutFabric` is a rectangular grid of tiles. Each tile contains a
configurable logic block (CLB) and a routing switchbox. Specialized columns
replace standard LUT tiles with BRAM or DSP tiles at regular intervals.

The `IOFabric` wraps the grid perimeter with I/O pads and SerDes
transceivers. Clock tiles sit outside the fabric and distribute divided
clocks to all tiles.

## Fabric Grid Layout

The grid is `width x height` tiles. Columns are specialized based on
their index:

- **BRAM columns**: one every `bramColumnInterval` columns
- **DSP columns**: one every `dspColumnInterval` columns, skipping any
  position already taken by a BRAM column
- **LUT columns**: all remaining columns

For the Terra 1 device (`width = 48`, `height = 64`,
`bramColumnInterval = 16`, `dspColumnInterval = 24`):

| Column Type | Count | Tiles per Column | Total Tiles |
|-------------|-------|------------------|-------------|
| LUT | 45 | 64 | 2,880 |
| BRAM | 2 | 64 | 128 |
| DSP | 1 | 64 | 64 |
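
As a rough illustration, the column assignment can be sketched in Python. The placement rule used here (a column is specialized when its index is a nonzero multiple of the interval, with BRAM taking priority over DSP) is an assumption chosen because it reproduces the Terra 1 counts in the table; the generator's actual rule may differ. `column_type` and `column_counts` are hypothetical names, not part of Aegis.

```python
from collections import Counter


def column_type(col: int, bram_interval: int, dsp_interval: int) -> str:
    """Classify a column index as 'BRAM', 'DSP', or 'LUT'.

    Assumed rule (not confirmed by the source): nonzero multiples of an
    interval are specialized, and BRAM wins when both intervals match.
    """
    if col > 0 and bram_interval > 0 and col % bram_interval == 0:
        return "BRAM"
    if col > 0 and dsp_interval > 0 and col % dsp_interval == 0:
        return "DSP"
    return "LUT"


def column_counts(width: int, bram_interval: int, dsp_interval: int) -> Counter:
    """Count how many columns of each type a grid of `width` columns has."""
    return Counter(
        column_type(col, bram_interval, dsp_interval) for col in range(width)
    )


# Terra 1 parameters as given in the text.
terra1 = column_counts(width=48, bram_interval=16, dsp_interval=24)
```

Under this assumption, column 48 would have been both a BRAM and a DSP position, but it lies outside the 48-column grid, so only column 24 becomes DSP.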

## Carry Chains

Each column has a vertical carry chain running south to north. The bottom
tile in each column receives `carryIn = 0`, and each tile's `carryOut`
feeds the `carryIn` of the tile above it. This enables fast arithmetic
(adders, counters) without routing through the switchbox.

BRAM tiles pass the carry signal through unchanged.

## Edge I/O

The fabric's four edges aggregate tile outputs using wired-OR. Any tile
on an edge can drive the corresponding external output. I/O pads on the
perimeter connect to these edge signals:

- North edge: `width` pads (left to right)
- East edge: `height` pads (top to bottom)
- South edge: `width` pads (left to right)
- West edge: `height` pads (top to bottom)

Total pads = `2 * width + 2 * height` (224 for Terra 1).
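
The pad count, and one possible flat numbering of the pads, can be sketched as follows. The source gives only the per-edge direction; the order in which the edges are concatenated (north, east, south, west) and the function names are assumptions made for illustration.

```python
def total_pads(width: int, height: int) -> int:
    """Number of perimeter pads: one per tile on each of the four edges."""
    return 2 * width + 2 * height


def pad_location(index: int, width: int, height: int) -> tuple[str, int]:
    """Map a flat pad index to (edge, position along that edge).

    Assumes pads are numbered north, east, south, west in that order.
    Only the within-edge direction comes from the text.
    """
    if index < 0:
        raise IndexError("pad index out of range")
    edges = [("north", width), ("east", height), ("south", width), ("west", height)]
    for name, count in edges:
        if index < count:
            return name, index
        index -= count
    raise IndexError("pad index out of range")
```

For Terra 1, `total_pads(48, 64)` gives the 224 pads quoted above.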

## Configuration

The entire device is programmed through a single serial shift register
chain. Bits are shifted in through the clock tiles, then through I/O
tiles, then SerDes tiles, and finally through the fabric tiles in
row-major order. A `cfgLoad` pulse transfers the shift register contents
to the active configuration registers in parallel.

See [Configuration Chain](configuration.md) for the full protocol.
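
The shift-then-load behavior can be modeled with a small Python sketch. This is a behavioral illustration only: the class and method names are hypothetical, and the bit ordering inside the chain (each new bit enters at the head and pushes earlier bits toward the tail) is an assumption. The full protocol is in the linked document.

```python
class ConfigChain:
    """Behavioral model of a serial configuration chain with a load strobe."""

    def __init__(self, length: int) -> None:
        self.shift = [0] * length   # serial shift register
        self.active = [0] * length  # configuration seen by the logic

    def shift_in(self, bit: int) -> None:
        # Assumed ordering: the new bit enters at index 0 (head of the chain)
        # and every earlier bit moves one position toward the tail.
        self.shift = [bit & 1] + self.shift[:-1]

    def cfg_load(self) -> None:
        # Parallel transfer: all active bits update at once.
        self.active = list(self.shift)


def program(chain: ConfigChain, bits: list[int]) -> None:
    """Shift a full bitstream in, then pulse cfgLoad once."""
    for bit in bits:
        chain.shift_in(bit)
    chain.cfg_load()
```

In this model the active configuration does not change while bits are shifting, and the first bit shifted in ends up at the far end of the chain, which would correspond to the last fabric tile given the chain order described above.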

## Tile Documentation

- [CLB (Configurable Logic Block)](clb.md)
- [Routing](routing.md)
- [BRAM (Block RAM)](bram.md)
- [DSP (Digital Signal Processing)](dsp.md)
- [I/O Pad](io.md)
- [SerDes](serdes.md)
- [Clock Tile](clock.md)
- [Configuration Chain](configuration.md)

## Other Documentation

- [PDK Integration and Tapeout](pdk.md)
57 changes: 57 additions & 0 deletions docs/arch/bram.md
@@ -0,0 +1,57 @@
# Block RAM (BRAM) Tile

BRAM tiles provide on-chip memory distributed across the fabric in
dedicated columns. Each BRAM tile implements a dual-port RAM with
synchronous writes and asynchronous reads. Its two ports can be accessed
independently from two directions.

## Parameters

| Parameter | Default | Description |
|--------------|---------|-------------------------------|
| Data width | 8 bits | Width of each memory word |
| Address width| 7 bits | Address bus width |
| Depth | 128 | Number of words (2^addrWidth) |

## Ports

The two ports are mapped to the tile's directional routing:

- **Port A**: input from the north, output to the south
- **Port B**: input from the west, output to the east

Data, address, and write-enable signals are packed into the routing
tracks. The packing adapts to the available track width:

```
[effAddrWidth-1 : 0] Address bits
[effAddrWidth+effDataWidth-1 : effAddrWidth] Data bits
[effAddrWidth+effDataWidth] Write-enable (if tracks allow)
```

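If the track width is narrower than the combined address and data width,
the fields are truncated to the effective widths that fit (`effAddrWidth`,
`effDataWidth`), and the missing upper bits are filled with zeros.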
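
A minimal sketch of the unpacking, in Python. The bit positions follow the layout above, but how the effective widths are derived from the track width is not stated in this document; the rule used here (address first, then as much data as fits, then write-enable only if a spare bit remains) is an assumption, as are the function names and the choice to report write-enable as 0 when it does not fit.

```python
def effective_widths(track_width: int, addr_width: int = 7, data_width: int = 8):
    """Return (eff_addr, eff_data, has_write_enable) for a given track width.

    Assumed allocation order: address, then data, then write-enable.
    """
    eff_addr = min(addr_width, track_width)
    eff_data = min(data_width, track_width - eff_addr)
    has_we = track_width > eff_addr + eff_data
    return eff_addr, eff_data, has_we


def unpack_tracks(word: int, track_width: int, addr_width: int = 7, data_width: int = 8):
    """Split a routing-track word into (address, data, write_enable)."""
    eff_addr, eff_data, has_we = effective_widths(track_width, addr_width, data_width)
    addr = word & ((1 << eff_addr) - 1)
    data = (word >> eff_addr) & ((1 << eff_data) - 1)
    # Assumption: write-enable reads as 0 when the tracks cannot carry it.
    we = ((word >> (eff_addr + eff_data)) & 1) if has_we else 0
    return addr, data, we
```

With 16 tracks and the default parameters, all 7 address bits, 8 data bits, and the write-enable fit. With 10 tracks, only 3 data bits remain after the address.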

## Read/Write Behavior

**Writes** are synchronous. On the rising clock edge, if the port is
enabled and write-enable is asserted, the data word is stored at the
given address. Both ports can write simultaneously (true dual-port).

**Reads** are asynchronous (combinational). When a port is enabled, the
data at the addressed location is continuously driven onto the output
tracks. When disabled, the output is zero.
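
The read and write rules above can be captured in a small behavioral model. This Python sketch is illustrative, not the ROHD implementation: the class name and the tuple-based port interface are hypothetical, and the result of both ports writing the same address in the same cycle is not specified in this document (here port B is simply applied last).

```python
class DualPortRam:
    """Behavioral model: synchronous writes, asynchronous reads, two ports."""

    def __init__(self, addr_width: int = 7, data_width: int = 8) -> None:
        self.depth = 1 << addr_width
        self.mask = (1 << data_width) - 1
        self.mem = [0] * self.depth

    def read(self, enable: bool, addr: int) -> int:
        # Combinational read: a disabled port drives zero.
        return self.mem[addr % self.depth] if enable else 0

    def clock(self, port_a=None, port_b=None) -> None:
        """Model one rising clock edge.

        Each port is None or a tuple (enable, write_enable, addr, data).
        Assumption: on an address collision, port B's write is applied last.
        """
        for port in (port_a, port_b):
            if port is None:
                continue
            enable, write_enable, addr, data = port
            if enable and write_enable:
                self.mem[addr % self.depth] = data & self.mask
```

A write takes effect only when `clock` is called with both the port enable and write-enable set, while `read` reflects the current memory contents at any time.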

## Carry Chain

BRAM tiles pass the carry signal through unchanged (`carryOut = carryIn`).
They do not consume or generate carry values.

## Configuration

| Bit | Field |
|--------|---------------|
| `[0]` | Port A enable |
| `[1]` | Port B enable |
| `[7:2]`| Reserved |

**Total: 8 bits**
89 changes: 89 additions & 0 deletions docs/arch/clb.md
@@ -0,0 +1,89 @@
# Configurable Logic Block (CLB)

The CLB is the fundamental logic element of the Aegis fabric. Each CLB
contains a 4-input lookup table (LUT4), a D flip-flop, and a carry chain
multiplexer (MUXCY). Together these implement arbitrary 4-input
combinational logic with optional registering and fast arithmetic.

## Block Diagram

```mermaid
graph TD
in0 --> LUT4["LUT4 (16-bit truth table)"]
in1 --> LUT4
in2 --> LUT4
in3 --> LUT4
LUT4 -- "lutOut" --> OutMux["Output Mux (cfg 16)"]
LUT4 -- "lutOut" --> FF["D Flip-Flop"]
clk --> FF
FF -- "ffOut" --> OutMux
OutMux --> out
LUT4 -- "propagate (P)" --> MUXCY["MUXCY (cfg 17)"]
carryIn --> MUXCY
in0 -- "generate" --> MUXCY
MUXCY --> carryOut
```

## LUT4

The LUT4 implements any Boolean function of four inputs. It stores a
16-bit truth table in configuration bits `[15:0]`. The output is selected
by using the four inputs as an index into the truth table.

Internally, the LUT is a 4-stage multiplexer tree:

| Stage | Select | Muxes | Operation |
|-------|--------|-------|----------------------------------------|
| 0 | in0 | 8 | `s0[i] = mux(in0, cfg[2i+1], cfg[2i])` |
| 1 | in1 | 4 | `s1[i] = mux(in1, s0[2i+1], s0[2i])` |
| 2 | in2 | 2 | `s2[i] = mux(in2, s1[2i+1], s1[2i])` |
| 3 | in3 | 1 | `out = mux(in3, s2[1], s2[0])` |

The effective computation is `out = cfg[{in3, in2, in1, in0}]`.
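
The equivalence between the mux tree and a direct table lookup can be checked with a short Python sketch. The function names are illustrative; the stage structure follows the table above.

```python
def mux(sel: int, hi: int, lo: int) -> int:
    """2:1 multiplexer: returns `hi` when sel is 1, otherwise `lo`."""
    return hi if sel else lo


def lut4_tree(cfg: int, in0: int, in1: int, in2: int, in3: int) -> int:
    """Evaluate the LUT as a 4-stage mux tree (8, 4, 2, then 1 muxes)."""
    bits = [(cfg >> i) & 1 for i in range(16)]
    for sel in (in0, in1, in2, in3):
        # Each stage halves the number of candidates using one input.
        bits = [mux(sel, bits[2 * i + 1], bits[2 * i]) for i in range(len(bits) // 2)]
    return bits[0]


def lut4_index(cfg: int, in0: int, in1: int, in2: int, in3: int) -> int:
    """Evaluate the LUT as cfg[{in3, in2, in1, in0}]."""
    index = (in3 << 3) | (in2 << 2) | (in1 << 1) | in0
    return (cfg >> index) & 1
```

Both functions return the same bit for every truth table and input combination, since stage 0 selects `cfg[2i + in0]`, stage 1 selects `cfg[4i + 2*in1 + in0]`, and so on.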

### Common Truth Tables

| Function | Truth Table | Notes |
|--------------|-------------|------------------------------|
| 2-input AND | `0x8888` | on in0, in1 |
| 2-input OR | `0xEEEE` | on in0, in1 |
| 2-input XOR | `0x6666` | also used as carry propagate |
| NOT | `0x5555` | inverts in0 |
| Constant 0 | `0x0000` | |
| Constant 1 | `0xFFFF` | |
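
These values follow directly from the indexing rule `cfg[{in3, in2, in1, in0}]`. The Python sketch below derives a truth table from any 4-input function; `truth_table` is an illustrative helper, not an Aegis API.

```python
def truth_table(fn) -> int:
    """Build the 16-bit LUT4 configuration for fn(in0, in1, in2, in3).

    Bit `index` of the result holds the output for the input combination
    where index = {in3, in2, in1, in0}.
    """
    cfg = 0
    for index in range(16):
        in0 = index & 1
        in1 = (index >> 1) & 1
        in2 = (index >> 2) & 1
        in3 = (index >> 3) & 1
        if fn(in0, in1, in2, in3):
            cfg |= 1 << index
    return cfg


and2 = truth_table(lambda a, b, c, d: a & b)
or2 = truth_table(lambda a, b, c, d: a | b)
xor2 = truth_table(lambda a, b, c, d: a ^ b)
not0 = truth_table(lambda a, b, c, d: not a)
```

Because the 2-input functions ignore `in2` and `in3`, the same 4-bit pattern repeats four times across the 16-bit table.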

## D Flip-Flop

The flip-flop captures the LUT output on the rising clock edge. Config
bit `[16]` selects whether the CLB output comes from the flip-flop
(registered) or directly from the LUT (combinational).

- `cfg[16] = 0`: `out = LUT output` (combinational)
- `cfg[16] = 1`: `out = FF output` (registered)

## Carry Chain (MUXCY)

The carry chain enables fast arithmetic by bypassing the general routing
fabric. When carry mode is enabled (config bit `[17] = 1`):

- The LUT output acts as the **propagate** signal (P)
- `carryOut = P ? carryIn : in0`
- `sum = P ^ carryIn` (fast XOR for adder sum bit)

When carry mode is disabled (`cfg[17] = 0`):

- `carryOut = 0`
- The CLB output is the normal LUT/FF output

Carry chains propagate vertically through a column (south to north),
enabling multi-bit adders and counters without consuming routing
resources.
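
To show how these equations form an adder, here is a Python sketch of a ripple-carry add along one column. It assumes each CLB is configured with the XOR truth table (`0x6666`) with operand bits on `in0` and `in1`, and that carry mode is enabled in every tile. The function names are hypothetical, and the sketch does not model how the sum bit is routed out of the tile.

```python
XOR_TABLE = 0x6666  # propagate = in0 ^ in1


def lut4(cfg: int, in0: int, in1: int, in2: int = 0, in3: int = 0) -> int:
    """LUT4 lookup: cfg[{in3, in2, in1, in0}]."""
    return (cfg >> ((in3 << 3) | (in2 << 2) | (in1 << 1) | in0)) & 1


def clb_carry(cfg: int, in0: int, in1: int, carry_in: int) -> tuple[int, int]:
    """One CLB in carry mode: returns (sum, carry_out)."""
    p = lut4(cfg, in0, in1)               # propagate
    carry_out = carry_in if p else in0    # MUXCY; in0 is the generate input
    return p ^ carry_in, carry_out


def add_column(a: int, b: int, width: int) -> tuple[int, int]:
    """Add two `width`-bit values, bit 0 in the southernmost tile."""
    carry = 0  # the bottom tile receives carryIn = 0
    result = 0
    for bit in range(width):
        s, carry = clb_carry(XOR_TABLE, (a >> bit) & 1, (b >> bit) & 1, carry)
        result |= s << bit
    return result, carry
```

Using `in0` as the generate signal works because when P is 0 the two operand bits are equal, so `in0` is exactly the carry that bit position produces.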

## Configuration Bit Layout

| Bits     | Width | Field                      |
|----------|-------|----------------------------|
| `[15:0]` | 16    | LUT truth table            |
| `[16]`   | 1     | FF enable (1 = registered) |
| `[17]`   | 1     | Carry mode enable          |

**Total: 18 bits**
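
A small Python sketch of packing and unpacking this layout. `encode_clb` and `decode_clb` are illustrative helpers, not part of the Aegis tooling.

```python
def encode_clb(truth_table: int, registered: bool = False, carry_mode: bool = False) -> int:
    """Pack the 18-bit CLB configuration word."""
    if not 0 <= truth_table <= 0xFFFF:
        raise ValueError("truth table must fit in 16 bits")
    return truth_table | (int(registered) << 16) | (int(carry_mode) << 17)


def decode_clb(word: int) -> tuple[int, bool, bool]:
    """Unpack an 18-bit word into (truth_table, registered, carry_mode)."""
    return word & 0xFFFF, bool((word >> 16) & 1), bool((word >> 17) & 1)
```

For example, an adder bit (XOR truth table with carry mode on) encodes as `0x26666`, and a registered 2-input AND encodes as `0x18888`.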
72 changes: 72 additions & 0 deletions docs/arch/clock.md
@@ -0,0 +1,72 @@
# Clock Tile

Clock tiles generate divided clock signals from a reference clock and
distribute them to the fabric. Each clock tile provides four independent
outputs, each with its own divider, phase offset, and duty cycle control.

## Outputs

Each of the four outputs can be independently configured:

| Feature | Range / Options |
|-------------|----------------------------------|
| Divider     | Divide by 1 to 256 (8-bit field N divides by N + 1) |
| Phase | 0, 90, 180, or 270 degrees |
| Duty cycle | 50% toggle or single-cycle pulse |
| Enable | Per-output enable bit |

## Divider

Each output has an 8-bit counter that counts from 0 to the configured
divider value. This divides the reference clock frequency by
`(divider + 1)`, giving a range of divide-by-1 to divide-by-256.
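
The counter behavior can be sketched in Python to confirm the `divider + 1` period. This models only the counter wrap described above, not the output waveform; the function names are illustrative.

```python
def divider_wraps(divider: int, ref_cycles: int) -> list[int]:
    """Return the reference-cycle indices at which the counter wraps to 0."""
    count = 0
    wraps = []
    for cycle in range(ref_cycles):
        if count == divider:
            count = 0          # counter reached the configured value
            wraps.append(cycle)
        else:
            count += 1
    return wraps


def output_frequency(ref_hz: float, divider: int) -> float:
    """Output frequency for an 8-bit divider field (0..255)."""
    if not 0 <= divider <= 255:
        raise ValueError("divider field is 8 bits")
    return ref_hz / (divider + 1)
```

With `divider = 3` the counter wraps every four reference cycles, so a 100 MHz reference yields 25 MHz.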

## Phase Control

Phase offset shifts the output clock relative to the reference. The
offset is computed as a fraction of the divider period:

| Phase Select | Offset Cycles |
|--------------|---------------------------|
| `00` (0) | 0 |
| `01` (90) | divider / 4 |
| `10` (180) | divider / 2 |
| `11` (270) | divider / 2 + divider / 4 |
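
The table translates to the following Python sketch. It assumes hardware-style integer (floor) division, which is not stated in this document, and it takes `divider` as whatever value the table refers to (the text does not say whether that is the raw field or the divide ratio). The function name is illustrative.

```python
def phase_offset_cycles(phase_select: int, divider: int) -> int:
    """Offset in reference cycles for a 2-bit phase select.

    Assumes floor division, so the 270-degree entry is computed as
    divider // 2 + divider // 4 rather than (3 * divider) // 4.
    """
    quarter = divider // 4
    half = divider // 2
    offsets = {0b00: 0, 0b01: quarter, 0b10: half, 0b11: half + quarter}
    return offsets[phase_select]
```

Under the floor-division assumption, small or non-multiple-of-4 values give only approximate phases. For instance, `divider = 6` yields offsets of 0, 1, 3, and 4 cycles.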

## Duty Cycle

In **50% duty mode** (`duty = 1`), the output toggles at the midpoint of
each period, producing a symmetric square wave.

In **pulse mode** (`duty = 0`), the output pulses high for one reference
clock cycle at the phase offset point and remains low otherwise.

## Lock Indicator

The `locked` output is asserted when all enabled clock outputs have
completed at least one full division cycle. This can be used for
synchronization or to gate downstream logic until clocks are stable.

## Configuration

| Bits        | Field                             |
|-------------|-----------------------------------|
| `[0]`       | Global enable                     |
| `[8:1]`     | Divider 0 (divide ratio minus 1)  |
| `[16:9]`    | Divider 1 (divide ratio minus 1)  |
| `[24:17]`   | Divider 2 (divide ratio minus 1)  |
| `[32:25]`   | Divider 3 (divide ratio minus 1)  |
| `[34:33]`   | Phase 0                           |
| `[36:35]`   | Phase 1                           |
| `[38:37]`   | Phase 2                           |
| `[40:39]`   | Phase 3                           |
| `[41]`      | Enable 0                          |
| `[42]`      | Enable 1                          |
| `[43]`      | Enable 2                          |
| `[44]`      | Enable 3                          |
| `[45]`      | Duty 0                            |
| `[46]`      | Duty 1                            |
| `[47]`      | Duty 2                            |
| `[48]`      | Duty 3                            |

**Total: 49 bits**
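
A Python sketch of packing this layout into a 49-bit word. It assumes each divider field stores the divide ratio minus one, consistent with the divide-by-(N + 1) behavior described earlier. `encode_clock_tile` and its tuple format are hypothetical.

```python
def encode_clock_tile(global_enable: bool, outputs) -> int:
    """Pack a clock tile configuration word.

    `outputs` is a sequence of four (ratio, phase, enable, duty) tuples:
      ratio  - divide ratio, 1..256 (stored as ratio - 1, an assumption)
      phase  - 2-bit phase select (0..3)
      enable - per-output enable
      duty   - 1 for 50% duty mode, 0 for pulse mode
    """
    if len(outputs) != 4:
        raise ValueError("a clock tile has exactly four outputs")
    word = int(bool(global_enable))
    for i, (ratio, phase, enable, duty) in enumerate(outputs):
        if not 1 <= ratio <= 256:
            raise ValueError("divide ratio must be 1..256")
        word |= (ratio - 1) << (1 + 8 * i)      # bits [8:1], [16:9], ...
        word |= (phase & 0b11) << (33 + 2 * i)  # bits [34:33], [36:35], ...
        word |= (int(bool(enable)) & 1) << (41 + i)
        word |= (int(bool(duty)) & 1) << (45 + i)
    return word
```

Setting every field to its maximum produces a word with all 49 bits set, which is a quick check that the fields tile the range `[48:0]` without gaps or overlap.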