SuperScientificSoftwareLaboratory/Uni-STC


This artifact appendix describes the experimental workflow to reproduce the results presented in the paper "Uni-STC: Unified Sparse Tensor Core" (Paper #313). We provide a containerized environment (Docker) pre-installed with the simulators, scripts, and small-scale datasets. The experiments are categorized into two levels: Fast Verification (approx. 5 hours) for functional validation and Complete Verification (approx. 75 hours) for full reproduction.

Artifact check-list (meta-information)

  • Program: Python 3.9, Bash Scripts, C++ Simulators.
  • Compilation: GCC 9+, OpenMP 4.5+, OpenCV 4.x.
  • Data set: SuiteSparse Matrix Collection (2,800+ matrices) and DLMC.
  • Run-time environment: Ubuntu 22.04 LTS (via Docker).
  • Hardware: x86-64 CPU, ≥64 GB DRAM.
  • Storage: ≥150 GB (Fast Mode) / ≥500 GB (Complete Mode).
  • Experiments: Format overhead analysis, Performance comparison, AMG solver, and Energy Efficiency Density.
  • Prepare workflow time: approximately 3 hours to download the ~40 GB Docker image.
  • Execution time: Fast mode: 5 hours; complete mode: 75 hours.
  • Publicly available: Yes.
  • Workflow automation framework used: Yes.

Description

How to access

We provide a persistent artifact package hosted on Google Drive, which includes:

  1. Docker Image (HPCA-Pap313-AE.tar): Contains the OS, dependencies, small data set, simulators, and plotting scripts.

  2. Full Dataset (matrix.7z): The complete SuiteSparse collection required for complete verification.

Hardware dependencies

To fully reproduce the results reported in the paper, we recommend the following hardware configuration:

  • Processor: x86-64 CPU with at least 16 cores.
  • Memory: Minimum 64 GB DRAM is required to load large matrices in the complete dataset.
  • Disk: 100 GB for the Docker image and fast verification; 600 GB for decompressing the full dataset.
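Before pulling the ~40 GB image, the host can be sanity-checked against these minimums with a short script (a sketch; the thresholds are taken from the list above, and the script assumes a Linux host with /proc/meminfo):

```shell
# Check host resources against the recommended minimums (16 cores, 64 GB DRAM).
cores=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
echo "cores=${cores} mem_gb=${mem_gb}"
[ "$cores" -ge 16 ] || echo "WARN: fewer than 16 cores; simulations will run longer"
[ "$mem_gb" -ge 64 ] || echo "WARN: less than 64 GB DRAM; large matrices may not fit"
```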

Software dependencies

The artifact is encapsulated in a Docker container to ensure environment consistency. The host machine requires:

  • OS: Linux (Ubuntu 20.04/22.04 recommended).
  • Docker Engine: Version ≥ 20.10.

Inside the container, the environment is pre-configured with:

  • Compilers: GCC 11.4, CMake 3.22.
  • Python Env: Python 3.10 with necessary libraries.
  • OpenCV: Version 4.x for image processing.

1. Download and Decompress.

Download HPCA-Pap313-AE.tar from the link.

2. Load and Start Container.

Load the image into your local Docker registry and launch the container in the background. If you encounter permission errors, prepend sudo to the Docker commands.

$ docker load < HPCA-Pap313-AE.tar

# Optional: remove the tar file to save space
$ rm HPCA-Pap313-AE.tar 
$ docker run -itd --name HPCA-Pap313 hpca-pap313-ae:v2

Initialization

Access the container, upgrade the Python packaging tools, and execute the initialization script. This script compiles the simulator binaries and checks library integrity.

$ docker exec -it HPCA-Pap313 /bin/bash
(container)$ cd /root
# upgrade package and compile
(container)$ pip3 install pip setuptools wheel -U
(container)$ pip3 install quickstart-rhy -U
(container)$ ./init.sh

Expected Output:

The initialization is successful if the following logs appear:

[INFO] Compile ResNet50 (sparse) Succeeded!
[INFO] Compile ResNet50 (dense) Succeeded!
[INFO] Compile Simulator (Scheduler = 8) Succeeded!
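If you want to verify these logs programmatically rather than by eye, a small helper can grep a captured log for all three lines (a sketch; it assumes you captured the output with something like ./init.sh | tee init.log):

```shell
check_init_log() {
    # Return success only if all three expected "Succeeded!" lines are
    # present in the captured init.sh output passed as $1.
    grep -q "Compile ResNet50 (sparse) Succeeded!" "$1" &&
        grep -q "Compile ResNet50 (dense) Succeeded!" "$1" &&
        grep -q "Compile Simulator (Scheduler = 8) Succeeded!" "$1"
}
# Usage: check_init_log init.log && echo "init OK"
```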

Experiment workflow

We provide a unified automation tool qrun to manage experiments. All commands should be executed in the /root/Sim directory:

(container)$ cd /root/Sim

Note on Pre-computed Results. To enable rapid inspection, we have pre-packaged execution logs and generated figures. This allows the subsequent verification instructions to complete in under 10 minutes.

If you prefer to execute the full simulation from scratch to verify the functional reproduction, please clean the pre-existing data using the following commands:

# remove figures and execution logs
(container)$ rm /root/Sim/fig/*
(container)$ cd /root/Sim/dist && rm transformer*.csv spmv/*.csv spmm/*.csv spmspv/*.csv spgemm/*.csv ai/*
(container)$ cd /root/Sim && rm resnet50/dense/*.csv resnet50/sparse/*.csv
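The same cleanup can be wrapped in a helper that takes the Sim root as an argument (a sketch mirroring the commands above; double-check the paths against your container layout before deleting anything):

```shell
clean_results() {
    # Remove pre-packaged figures, logs, and CSV results under the Sim
    # root ($1) so that subsequent runs regenerate everything from scratch.
    local sim="$1" kernel
    rm -f "$sim"/fig/*
    rm -f "$sim"/dist/transformer*.csv "$sim"/dist/ai/*
    for kernel in spmv spmm spmspv spgemm; do
        rm -f "$sim"/dist/"$kernel"/*.csv
    done
    rm -f "$sim"/resnet50/dense/*.csv "$sim"/resnet50/sparse/*.csv
}
# Usage: clean_results /root/Sim
```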

Part 1: Fast Verification (L1)

Estimated Time: ~5 hours | Storage: No extra storage required.

This mode uses small-scale datasets included in the image to reproduce key figures (Fig. 15–19, 21).

  • Task 1.1: Format Overhead (Fig. 15)

    (container)$ qrun format
    • Explanation: Evaluates the storage compression ratio of the BBC format across varying sparsity levels.
  • Task 1.2: Hardware Comparison (Fig. 17, 18, 19)

    (container)$ qrun run-sample
    • Explanation: Runs SpMV, SpMSpV, SpMM, and SpGEMM kernels on representative matrices, measuring performance and energy.
  • Task 1.3: Random SpGEMM Evaluation (Fig. 16)

    (container)$ qrun spgemm2
  • Task 1.4: AMG Application (Fig. 21)

    (container)$ qrun run-amg
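The four tasks above can also be chained by a small driver that stops on the first failure (a sketch; it assumes qrun is on PATH inside the container, and the optional runner argument exists only to make the helper testable):

```shell
run_fast_tasks() {
    # Run the L1 (Fast Verification) tasks in order, aborting on failure.
    # $1 (optional) overrides the runner command; defaults to qrun.
    local runner="${1:-qrun}" task
    for task in format run-sample spgemm2 run-amg; do
        echo "[AE] running: $runner $task"
        "$runner" "$task" || { echo "[AE] FAILED: $task"; return 1; }
    done
}
# Inside the container: run_fast_tasks
```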

Part 2: Complete Verification (L2)

Estimated Time: ~75 hours | Storage: ~500GB required.

This mode uses the full SuiteSparse collection (downloaded separately as matrix.7z) to reproduce the remaining distribution figures (Figures 20 and 22).

Step 1: Import Dataset.

Download matrix.7z on your host machine, copy it to the container, and extract it.

# On Host Machine
$ docker cp matrix.7z HPCA-Pap313:/root

# On Container
(container)$ cd /root
(container)$ 7zz x matrix.7z
(container)$ mv matrix/* /matrix

Step 2: Execution.

  • Task 2.1: Full Dataset Distribution (Fig. 20)

    (container)$ qrun run-all # Takes ~24 hours
  • Task 2.2: Energy Efficiency Density (Fig. 22)

    (container)$ qrun eed   # Takes ~48 hours
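Given the ~75-hour total runtime, it is worth capturing both tasks' output to log files and launching them under nohup or tmux so a dropped SSH session does not abort them (a sketch; the runner and log-directory arguments exist only to make the helper testable, and the defaults assume the container layout above):

```shell
run_l2() {
    # Run the two L2 (Complete Verification) tasks sequentially,
    # capturing their output to log files.
    # $1 (optional): runner command, defaults to qrun.
    # $2 (optional): log directory, defaults to /root/Sim.
    local runner="${1:-qrun}" logdir="${2:-/root/Sim}"
    "$runner" run-all > "$logdir/run-all.log" 2>&1 &&  # ~24 hours
        "$runner" eed > "$logdir/eed.log" 2>&1         # ~48 hours
}
# Inside the container: run_l2   (ideally under nohup or tmux)
```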

Evaluation and expected results

Upon completion of the experiments, all generated charts are stored in the container directory /root/Sim/fig/. We provide two methods to inspect these results.

Result Inspection

Option 1: Export to Host (Recommended)

For the best viewing experience, and to facilitate comparison with the paper, we recommend copying all generated figures to the host. Execute the following command in your host terminal:

$ docker cp HPCA-Pap313:/root/Sim/fig ./uni-stc-results

Explanation: This will create a folder named uni-stc-results in your current directory containing all generated .png files.
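To confirm the export is complete, a quick check for the expected figure files can help (a sketch; the NN.png naming is an assumption extrapolated from the 15.png example used elsewhere in this appendix):

```shell
check_figs() {
    # List which of the expected figures (Fig. 15-22) exist in $1.
    # File names NN.png are assumed from the 15.png example in this appendix.
    local dir="$1" fig
    for fig in 15 16 17 18 19 20 21 22; do
        if [ -f "$dir/$fig.png" ]; then
            echo "OK   $fig.png"
        else
            echo "MISS $fig.png"
        fi
    done
}
# Usage: check_figs ./uni-stc-results
```

A MISS for Fig. 20 or 22 is expected if only the Fast Verification (L1) tasks have been run.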

Option 2: In-Terminal Preview

If your terminal emulator supports inline image rendering (e.g., Kitty, iTerm2, or Ghostty), you can preview results directly inside the container without exporting.

# Inside the container
(container)$ qs icat /root/Sim/fig/15.png

Detailed Analysis

We outline the specific observations required to validate the artifacts below. Note: The simulator provided in this artifact is a lightweight version extracted from Accel-Sim to facilitate rapid verification. As it excludes power modeling for register I/O, the observed energy savings for Uni-STC may be higher than the conservative figures reported in the paper.

  • Fig. 15 (Format Overhead): Verify that the BBC format space-reduction (y-axis) increases as the density (x-axis) increases.
  • Fig. 16 (Random SpGEMM Performance): Uni-STC should demonstrate performance equal to or greater than the other baseline hardware designs.
  • Fig. 17 & 20 (Overall Performance & Efficiency):
    • Fig. 17 (Representative): Confirm that Uni-STC achieves the highest values in speedup, energy reduction, and area efficiency.
    • Fig. 20 (Full Dataset): Confirm that these performance gains are consistent across the full SuiteSparse collection (2,800+ matrices).
  • Fig. 18 (Energy Breakdown): Verify that Uni-STC achieves the lowest total energy consumption, and that the consumption is balanced (similar values) across the three internal operations (Fetch, Schedule, Compute).
  • Fig. 19 (Traffic & Network Scale): Verify that Uni-STC incurs the lowest data traffic among the compared architectures, and confirm that it supports the required network scale as depicted in the figure.
  • Fig. 21 (AMG Solver): Uni-STC should exhibit a higher speedup ratio than the other baseline hardware designs.
  • Fig. 22 (Scalability - EED): Compare the Energy Efficiency Density (EED) between Uni-STC(8) and Uni-STC(4): For SpMV / SpMSpV, Uni-STC(8) is slightly lower than Uni-STC(4). For SpMM / SpGEMM, Uni-STC(8) is higher than Uni-STC(4).
