Arka236/Single_Cycle_RISCV_Processor

🚀 High-Speed AES-128 Cryptographic IP Core (AXI4-Lite)

📌 Table of Contents

  1. The Problem & Project Aim
  2. Project Overview
  3. System Architecture & Modules
  4. Custom AXI4-Lite Register Map
  5. Synthesis Metrics & Results
  6. Security Testing & Verification
  7. Tools Used
  8. How to Run Simulation

1. The Problem & Project Aim

The Problem Statement: In modern integrated sensing and communication systems, data security is mandatory. However, running a cryptographic algorithm such as AES-128 purely in software on a general-purpose processor without dedicated AES instructions is inefficient: encrypting a single 16-byte block can take hundreds to thousands of clock cycles, depending on the processor and implementation. This creates a data bottleneck, keeps the CPU occupied, and increases power consumption.

The Project Aim: The objective of this project was to offload the encryption workload into a dedicated hardware IP block. Specifically, the aim was to design, verify, and implement a FIPS-197 compliant AES-128 hardware accelerator from scratch using only Verilog HDL. The IP core had to integrate easily into modern SoC architectures through a strict AXI4-Lite Slave Memory interface, allowing an AXI Master to write data, trigger the hardware, and read back the ciphertext with minimal overhead.

2. Project Overview

This project implements a fully unrolled, 10-stage pipelined AES-128 Cryptographic IP Core designed entirely in Verilog HDL. To ensure seamless integration into modern System-on-Chip (SoC) architectures, the cryptographic engine is wrapped in a custom, dual-ported AXI4-Lite Slave Memory interface.

Because the cipher is implemented entirely in hardware, the core performs standard ECB encryption at a rate of one 128-bit block per clock cycle once the pipeline is full. The accompanying automated Verilog AXI-Master testbench acts as the system controller and executes two further block cipher modes of operation (CBC and CTR) by driving the AXI interface, demonstrating the IP's versatility without altering the underlying FIPS-compliant pipeline.


3. System Architecture & Modules

The design is modular and hierarchical, built from the lowest mathematical functions to the highest system wrapper.

A. The Cryptographic Primitives (The Math)

These modules are the foundational building blocks of the AES algorithm.

  • sbox.v (SubBytes): A non-linear substitution step implemented as a 256-entry Look-Up Table (LUT). It maps each 8-bit input to an 8-bit output defined by the multiplicative inverse in GF(2^8) followed by an affine transformation, providing the "confusion" in the cipher.
  • shift_rows.v: Cyclically shifts the rows of the 4x4 state matrix left by 0, 1, 2, and 3 bytes. Because this is pure rewiring (no logic gates), it is combinational and adds no clock cycles.
  • mix_columns_32bit.v: The most arithmetic-heavy module. It takes a 32-bit column of the state and multiplies it by a fixed matrix over the Galois Field GF(2^8). This provides the "diffusion" (Avalanche Effect), so that a 1-bit change in the input spreads across the entire block over successive rounds.
  • key_expand_stage.v: Takes the previous round's key and applies a byte rotation, S-Box substitutions, a round constant, and XORs to generate the key for the next round on the fly.
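
The four primitives above can be modelled in a few lines of software, which is handy as a golden reference when checking RTL output. The following is a minimal Python sketch, not the repository's Verilog: the function names (gf_mul, sbox_byte, shift_rows, mix_column, next_round_key) are chosen here for illustration, and the state is assumed to be a flat list of 16 bytes in FIPS-197 column-major order.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high_bit = a & 0x80
        a = (a << 1) & 0xFF
        if high_bit:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def sbox_byte(x: int) -> int:
    """SubBytes for one byte: GF(2^8) inverse, then the affine transformation."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)  # 0 maps to 0
    out = 0x63
    for k in range(5):  # inv ^ rotl(inv,1) ^ ... ^ rotl(inv,4) ^ 0x63
        out ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return out


SBOX = [sbox_byte(x) for x in range(256)]  # the 256-entry LUT


def shift_rows(state: list) -> list:
    """Rotate row r left by r bytes; byte i sits at row i % 4, column i // 4."""
    return [state[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_column(col: list) -> list:
    """Multiply one 4-byte column by the fixed circulant matrix (2, 3, 1, 1)."""
    return [
        gf_mul(col[i], 2) ^ gf_mul(col[(i + 1) % 4], 3)
        ^ col[(i + 2) % 4] ^ col[(i + 3) % 4]
        for i in range(4)
    ]


def next_round_key(key: list, rcon: int) -> list:
    """Derive round key n+1 from round key n (16 bytes each)."""
    # RotWord + SubWord on the last word, round constant on its first byte.
    temp = [SBOX[key[13]] ^ rcon, SBOX[key[14]], SBOX[key[15]], SBOX[key[12]]]
    out = []
    for i in range(16):
        out.append(key[i] ^ (temp[i] if i < 4 else out[i - 4]))
    return out
```

Note that shift_rows is only an index permutation, which is why the hardware version costs no gates, while mix_column and sbox_byte are where the real logic sits.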

B. The Pipeline Architecture (The Engine)

  • aes_round.v: Represents one standard AES round. Instantiates the SubBytes, ShiftRows, MixColumns, and AddRoundKey logic in sequence.
  • aes_round_last.v: The final round (Round 10). It omits the MixColumns step, as required by FIPS-197.
  • aes_pipeline.v: The core engine. Instead of using an iterative state machine, this module physically unrolls the loop. It instantiates 9 standard rounds and 1 final round, wiring them together in a massive 10-stage pipeline entirely in RTL.
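
To make the round structure concrete, here is a hedged Python reference model of the same dataflow: an initial AddRoundKey, nine standard rounds, and one final round without MixColumns, with each round key derived from the previous one on the fly. It is a software sketch under the assumption that the RTL follows FIPS-197 byte ordering; the function names mirror the module names only for readability and are not taken from the Verilog source. Each loop iteration corresponds to one physical stage of the unrolled pipeline.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
        b >>= 1
    return result


def build_sbox() -> list:
    """S-box from its definition: GF(2^8) inverse, then affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = 0x63
        for k in range(5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def sub_shift(state: list) -> list:
    """SubBytes followed by ShiftRows on a 16-byte column-major state."""
    s = [SBOX[b] for b in state]
    return [s[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(state: list) -> list:
    out = []
    for c in range(0, 16, 4):
        a = state[c:c + 4]
        out += [gf_mul(a[i], 2) ^ gf_mul(a[(i + 1) % 4], 3)
                ^ a[(i + 2) % 4] ^ a[(i + 3) % 4] for i in range(4)]
    return out


def add_round_key(state: list, rk: list) -> list:
    return [s ^ k for s, k in zip(state, rk)]


def key_expand_stage(key: list, rcon: int) -> list:
    """Next round key from the previous one (one stage of the key schedule)."""
    temp = [SBOX[key[13]] ^ rcon, SBOX[key[14]], SBOX[key[15]], SBOX[key[12]]]
    out = []
    for i in range(16):
        out.append(key[i] ^ (temp[i] if i < 4 else out[i - 4]))
    return out


def aes_round(state: list, rk: list) -> list:
    return add_round_key(mix_columns(sub_shift(state)), rk)


def aes_round_last(state: list, rk: list) -> list:
    return add_round_key(sub_shift(state), rk)  # no MixColumns in round 10


def aes128_encrypt_block(plaintext: bytes, key: bytes) -> bytes:
    rk, rcon = list(key), 0x01
    state = add_round_key(list(plaintext), rk)  # initial key whitening
    for rnd in range(1, 11):
        rk = key_expand_stage(rk, rcon)
        rcon = gf_mul(rcon, 2)
        state = aes_round(state, rk) if rnd < 10 else aes_round_last(state, rk)
    return bytes(state)
```

A model like this can be run against the FIPS-197 example vectors and its output compared with the ciphertext captured from simulation.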

C. The SoC Integration (The Wrapper)

  • aes_axi_wrapper.v: The bridge between the AXI Master and the Verilog pipeline, acting as an AXI4-Lite Slave Memory map.
    • Bridges the external 32-bit SoC bus to the engine's 128-bit internal datapath: each 128-bit value is split across four 32-bit registers.
    • Contains a strict Busy/Idle hardware lockout mechanism (Status Register at 0x04) that physically prevents the input data from being corrupted while the engine is busy.
    • Manages the 11-cycle latency state machine, capturing the ciphertext and raising a Done flag when finished.

4. Custom AXI4-Lite Register Map

The wrapper exposes a 64-byte AXI4-Lite memory map for external control:

| Offset      | Register Name    | Access | Description                               |
|-------------|------------------|--------|-------------------------------------------|
| 0x00        | Control          | W      | Bit 0: Start Engine (Auto-clearing pulse) |
| 0x04        | Status           | R      | Bit 0: Busy, Bit 1: Idle, Bit 3: Done     |
| 0x10 - 0x1C | Key [0:3]        | W      | 128-bit Master AES Key                    |
| 0x20 - 0x2C | Plaintext [0:3]  | W      | 128-bit Data Input (Locked when Busy)     |
| 0x30 - 0x3C | Ciphertext [0:3] | R      | 128-bit Encrypted Output                  |
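
The register map implies a simple driver sequence: write the key, write the plaintext, pulse Start, poll Status until Done, then read the ciphertext. The Python sketch below illustrates that sequence against a hypothetical behavioural stand-in for the wrapper (AesRegModel). Several details are assumptions rather than facts from the RTL: one clock elapses per bus access, word 0 lives at the lowest offset, key writes are accepted while busy, and the cipher itself is passed in as a callable.

```python
CTRL, STATUS = 0x00, 0x04
KEY_BASE, PT_BASE, CT_BASE = 0x10, 0x20, 0x30
BUSY, IDLE, DONE = 1 << 0, 1 << 1, 1 << 3  # Status bits from the table
LATENCY = 11                               # pipeline latency in clock cycles


class AesRegModel:
    """Hypothetical behavioural model of the register file, not the RTL."""

    def __init__(self, encrypt):
        self.encrypt = encrypt             # callable(pt_words, key_words) -> ct_words
        self.key, self.pt, self.ct = [0] * 4, [0] * 4, [0] * 4
        self.pending = [0] * 4
        self.countdown = 0
        self.done = False

    def tick(self):
        """Advance one clock; capture the result when the latency expires."""
        if self.countdown:
            self.countdown -= 1
            if self.countdown == 0:
                self.ct, self.done = self.pending, True

    def write(self, addr, value):
        self.tick()
        busy = self.countdown > 0
        if addr == CTRL:
            if value & 1 and not busy:     # start pulse, ignored while busy
                self.pending = self.encrypt(self.pt, self.key)
                self.countdown, self.done = LATENCY, False
        elif KEY_BASE <= addr < KEY_BASE + 16:
            self.key[(addr - KEY_BASE) // 4] = value & 0xFFFFFFFF
        elif PT_BASE <= addr < PT_BASE + 16 and not busy:
            self.pt[(addr - PT_BASE) // 4] = value & 0xFFFFFFFF  # locked when busy

    def read(self, addr):
        self.tick()
        if addr == STATUS:
            busy = self.countdown > 0
            return (BUSY if busy else IDLE) | (DONE if self.done else 0)
        if CT_BASE <= addr < CT_BASE + 16:
            return self.ct[(addr - CT_BASE) // 4]
        return 0


def encrypt_via_regs(dev, key_words, pt_words):
    """Driver sequence: load key and data, start, poll Done, read result."""
    for i, word in enumerate(key_words):
        dev.write(KEY_BASE + 4 * i, word)
    for i, word in enumerate(pt_words):
        dev.write(PT_BASE + 4 * i, word)
    dev.write(CTRL, 1)
    polls = 0
    while not (dev.read(STATUS) & DONE):
        polls += 1
    return [dev.read(CT_BASE + 4 * i) for i in range(4)], polls
```

In a real system the same sequence would be issued by the AXI Master, with dev.write and dev.read replaced by bus transactions.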

5. Synthesis Metrics & Results

In simulation, the core's output matched the expected ciphertext of the official NIST FIPS-197 test vectors. Each round resolves in a single clock cycle, and static timing analysis reported the following results:

  • Target Clock: 10.0 ns (100 MHz)
  • Achieved WNS (Worst Negative Slack): +5.8 ns
  • Estimated Maximum Operating Frequency (Fmax): 238 MHz (critical path of 10.0 ns - 5.8 ns = 4.2 ns)
  • Pipeline Latency: 11 Clock Cycles

Throughput Analysis

Because the 10-stage pipeline is fully unrolled, it outputs a completed 128-bit block every single clock cycle once saturated.

  • Peak Internal Engine Throughput: about 30.5 Gbps (128-bit datapath processing one block per clock cycle: 128 bits × 238 MHz).
  • Theoretical Interface Throughput: 7.6 Gbps (32-bit AXI4-Lite external bus limit: 32 bits × 238 MHz).
  • Real-World System Bottleneck: Because AXI4-Lite requires a multi-cycle handshake for every 32-bit transaction, system-level throughput is bound by the AXI Master's transfer rate. The cryptographic engine is therefore much faster than its I/O interface, which is the limiting factor.
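
The timing and throughput figures in this section follow from simple arithmetic, sketched below in Python. The first two functions use only numbers stated above. The third, system_throughput_gbps, is a rough estimate under assumptions that are not from the source: the key is preloaded, each block needs 9 bus transfers (4 plaintext writes, 1 control write, 4 ciphertext reads), status polling is ignored, and cycles_per_transfer is a hypothetical parameter for the handshake cost.

```python
BLOCK_BITS = 128   # AES block size
BUS_BITS = 32      # AXI4-Lite data width


def fmax_mhz(target_period_ns: float, wns_ns: float) -> float:
    """Estimated Fmax from the target period and worst negative slack."""
    critical_path_ns = target_period_ns - wns_ns
    return 1e3 / critical_path_ns


def throughput_gbps(bits_per_cycle: int, f_mhz: float) -> float:
    """Bits moved per cycle times clock frequency, in Gbps."""
    return bits_per_cycle * f_mhz / 1e3


def system_throughput_gbps(f_mhz: float, cycles_per_transfer: int,
                           latency_cycles: int = 11) -> float:
    """Rough per-block estimate when each block is fed over the bus (assumed model)."""
    transfers_per_block = 4 + 1 + 4  # plaintext writes, start, ciphertext reads
    cycles = transfers_per_block * cycles_per_transfer + latency_cycles
    return BLOCK_BITS * f_mhz / cycles / 1e3


if __name__ == "__main__":
    f = fmax_mhz(10.0, 5.8)
    print(f"Fmax               : {f:.1f} MHz")
    print(f"Engine throughput  : {throughput_gbps(BLOCK_BITS, 238):.2f} Gbps")
    print(f"Bus-width ceiling  : {throughput_gbps(BUS_BITS, 238):.2f} Gbps")
    print(f"System (2 cyc/xfer): {system_throughput_gbps(238, 2):.2f} Gbps")
```

Even with an optimistic handshake cost, the per-block system estimate lands well below the bus-width ceiling, which is the point the bottleneck bullet makes.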

6. Security Testing & Verification

  • FIPS-197 Standard Vectors: The hardware was rigorously tested and verified against official NIST vectors.
  • The Avalanche Effect: Cryptographic diffusion was checked visually in simulation. Flipping a single bit of the input plaintext changes bits across the entire 128-bit state by Round 3, which is consistent with correctly functioning SubBytes and MixColumns stages.
  • Automated RTL Control & Advanced Modes (tb_aes_axi_lite.v): A custom Verilog testbench simulates a generic SoC AXI Master. It uses read/write tasks to program the key, load data, and trigger the engine through the memory map, exercising three modes of operation:
    • ECB (Electronic Codebook): Natively executed in hardware.
    • CBC (Cipher Block Chaining): Testbench-driven XOR chaining utilizing the RTL core as a coprocessor.
    • CTR (Counter Mode): Stream cipher implementation encrypting a Nonce+Counter, masking data patterns while retaining high parallel throughput.
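
The three modes differ only in what the controller feeds to the block-encrypt engine and how it combines the result, which is why they can be driven from the testbench without touching the pipeline. The Python sketch below shows that control logic. It is illustrative only: toy_encrypt is a SHA-256 based stand-in and is NOT AES, and the 8-byte nonce plus 8-byte big-endian counter layout is an assumption, since the source does not state how the Nonce+Counter block is formed.

```python
import hashlib


def toy_encrypt(block: bytes) -> bytes:
    """NOT AES: a deterministic 16-byte stand-in so the sketch runs on its own."""
    return hashlib.sha256(b"demo-key" + block).digest()[:16]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ecb_encrypt(encrypt, blocks):
    """ECB: every block goes straight through the engine."""
    return [encrypt(b) for b in blocks]


def cbc_encrypt(encrypt, iv: bytes, blocks):
    """CBC: XOR each plaintext block with the previous ciphertext, then encrypt."""
    out, prev = [], iv
    for block in blocks:
        prev = encrypt(xor_bytes(block, prev))
        out.append(prev)
    return out


def ctr_crypt(encrypt, nonce: bytes, blocks, start: int = 0):
    """CTR: encrypt nonce||counter and XOR the keystream with the data.

    The same function encrypts and decrypts. Counter blocks are independent,
    so they could be issued to a pipelined engine back to back.
    """
    out = []
    for i, block in enumerate(blocks):
        counter_block = nonce + (start + i).to_bytes(8, "big")  # assumed layout
        out.append(xor_bytes(block, encrypt(counter_block)))
    return out


if __name__ == "__main__":
    data = [b"A" * 16, b"A" * 16]  # two identical plaintext blocks
    ecb = ecb_encrypt(toy_encrypt, data)
    cbc = cbc_encrypt(toy_encrypt, bytes(16), data)
    ctr = ctr_crypt(toy_encrypt, bytes(8), data)
    print("ECB repeats pattern:", ecb[0] == ecb[1])
    print("CBC repeats pattern:", cbc[0] == cbc[1])
    print("CTR round-trips    :", ctr_crypt(toy_encrypt, bytes(8), ctr) == data)
```

CBC is inherently serial on the encrypt side because each block depends on the previous ciphertext, whereas CTR keeps the engine's block-per-cycle rate available.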

7. Tools Used

  • EDA Tool: Xilinx Vivado 2018.3 (Synthesis & Behavioral Simulation)
  • Language: Verilog-2001

8. How to Run Simulation

  1. Clone this repository.
  2. Open Xilinx Vivado and create a new RTL project.
  3. Add the Verilog files to your project hierarchy.
  4. Set tb_aes_axi_lite.v as the top module for simulation.
  5. Launch Behavioral Simulation.
  6. In the Tcl Console, observe the automated execution and verification of the FIPS-197 ECB baseline, followed by the CBC and CTR mode tests. Expand the wave viewer to observe the AXI handshaking.
