SPARTA

SPARse acceleration on Tensor Architecture

The project investigates new data structures and compression algorithms that exploit hardware capabilities designed specifically for deep learning, in order to accelerate sparse and irregular applications such as graph analytics and arbitrary sparse DNNs and GCNs. SPARTA also addresses productivity and performance portability across different AI accelerators by providing an abstraction layer.

The repository contains stable code for reordering and compressing sparse matrices into dense block data-structures. The reordering algorithm matches rows (or columns) with similar patterns and builds dense blocks. The similarity of patterns is first determined with a hash function, and then refined with a tunable algorithm, which matches patterns with high cosine similarity.
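The two stages described above can be illustrated with a minimal Python sketch. It is not SPARTA's implementation: the function names `bucket_by_pattern` and `cosine_similarity` are hypothetical, rows are represented simply as lists of nonzero column indices, and the hashing step is delegated to Python's dictionary.

```python
from collections import defaultdict
from math import sqrt


def bucket_by_pattern(rows):
    """Group row indices whose sets of nonzero columns are identical.

    The dictionary hashes each pattern, so identical rows end up in the
    same bucket without any pairwise comparison.
    """
    buckets = defaultdict(list)
    for i, cols in enumerate(rows):
        buckets[frozenset(cols)].append(i)
    return buckets


def cosine_similarity(a, b):
    """Cosine similarity of two binary patterns given as sets of column indices."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Each row is the list of its nonzero column indices (illustrative data).
rows = [[0, 1, 2], [0, 1, 2], [0, 1], [5, 6]]
buckets = bucket_by_pattern(rows)
groups = sorted(buckets.values())                 # rows 0 and 1 share a pattern
similar = cosine_similarity({0, 1, 2}, {0, 1})    # high overlap
unrelated = cosine_similarity({0, 1}, {5, 6})     # no shared columns
```

In this sketch the hash stage only catches exact duplicates; the cosine score is what a tunable refinement stage would compare against a threshold.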

The repository also contains code for sparse-dense matrix-matrix multiplication that exploits the dense block data-structure.
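As a rough illustration of why dense blocks help, the sketch below multiplies a block-sparse matrix by a dense matrix, touching only the stored blocks. The storage layout (a dictionary keyed by block coordinates) and the name `block_spmm` are assumptions made for this example; the actual SPARTA kernels use CUDA libraries and their own data structure.

```python
def block_spmm(blocks, row_heights, col_widths, dense_b):
    """Compute C = A * B, where A is stored as a set of dense blocks.

    blocks:      dict mapping (block_row, block_col) -> dense block (list of lists)
    row_heights: height of each block row (may vary)
    col_widths:  width of each block column
    dense_b:     dense right-hand matrix as a list of rows
    """
    # Prefix sums give the first row/column covered by each block.
    row_start = [0]
    for h in row_heights:
        row_start.append(row_start[-1] + h)
    col_start = [0]
    for w in col_widths:
        col_start.append(col_start[-1] + w)

    n_out_cols = len(dense_b[0])
    c = [[0.0] * n_out_cols for _ in range(row_start[-1])]

    # Only stored (nonzero) blocks contribute; each one is a small dense product.
    for (bi, bj), block in blocks.items():
        r0, c0 = row_start[bi], col_start[bj]
        for i, block_row in enumerate(block):
            out = c[r0 + i]
            for k, a in enumerate(block_row):
                b_row = dense_b[c0 + k]
                for j in range(n_out_cols):
                    out[j] += a * b_row[j]
    return c


# A is 3x4 with two stored blocks: a 2x2 block and a 1x2 block.
blocks = {(0, 0): [[1, 2], [3, 4]], (1, 1): [[5, 6]]}
dense_b = [[1, 0], [0, 1], [1, 1], [2, 0]]
result = block_spmm(blocks, [2, 1], [2, 2], dense_b)
```

Each stored block corresponds to one small dense multiplication, which is the kind of work dense GEMM hardware handles well.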

Input sparse matrices are stored in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format. Block-sparse matrices are stored in a variant of the Variable Block Compressed Sparse Row (or Column) format, referred to as VBR below.
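For readers unfamiliar with CSR, the following Python sketch builds the three standard CSR arrays from a small dense matrix. The function name `dense_to_csr` is chosen for this example only and does not correspond to a SPARTA routine.

```python
def dense_to_csr(matrix):
    """Return (values, col_idx, row_ptr) for a dense matrix given as a list of rows.

    values[k]  : the k-th nonzero, in row-major order
    col_idx[k] : the column of the k-th nonzero
    row_ptr[i] : index in values where row i starts; row_ptr[-1] is the nonzero count
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


matrix = [
    [5, 0, 0],
    [0, 0, 7],
    [1, 2, 0],
]
values, col_idx, row_ptr = dense_to_csr(matrix)
```

CSC is the same idea applied column by column, with row indices stored instead of column indices.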

SPARTA requires CUDA 10.0 or later.

STRUCTURE

The files are organized as follows:

SPARTA

include
obj
programs
src
test

Each folder contains:

general: files needed by all versions
cuda: files needed by the cuda version
mkl: files needed by the mkl version

USAGE

Compile with

'''make serial''' to compile without CUDA,
or '''make all''' to also compile the CUDA test.

Run '''./programs/general/TEST_blocking_VBR''' to see an example of blocking.

For example, run ./programs/general/TEST_blocking_VBR -b 3 -t 0.6 to produce a blocking of a test matrix, fixing the column block size at 3 (-b 3) and the threshold distance tau at 0.6 (-t 0.6).

Run it again with ./programs/general/TEST_blocking_VBR -b 3 -t 0.6 -F 1 -B 3 to force fixed-height blocks (-F 1) of height 3 (-B 3).

Add the option -f PATH/TO/MATRIX.el to load a matrix. Some small matrices are available for testing in data/. You can also use your own matrices, provided they are stored as an edgelist with space-separated, ordered values.
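To make the edgelist format concrete, here is a minimal Python sketch of a reader for such a file. It is an illustration, not the parser used by SPARTA: the name `parse_edgelist` and its parameters are hypothetical, indices are assumed to be integers, an optional third field is assumed to be a weight, and lines with fewer than two fields are skipped.

```python
def parse_edgelist(lines, col_first=False, weighted=False):
    """Parse space-separated edgelist lines into sorted (row, col, weight) tuples.

    col_first: if True, each line is read as "col row" instead of "row col"
    weighted:  if True, a third field is read as the weight; otherwise
               every entry gets weight 1.0
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: blank or malformed lines are ignored
        first, second = int(fields[0]), int(fields[1])
        row, col = (second, first) if col_first else (first, second)
        weight = float(fields[2]) if weighted and len(fields) > 2 else 1.0
        entries.append((row, col, weight))
    entries.sort()  # keep entries ordered by row, then column
    return entries


sample = ["0 1", "0 2", "1 0 3.5"]
unweighted = parse_edgelist(sample)
with_weights = parse_edgelist(sample, weighted=True)
swapped = parse_edgelist(["0 1"], col_first=True)
```

The `col_first` and `weighted` parameters loosely mirror the -R and -P options described in the options list.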

Find all the options below:

OPTIONS:

-a: blocking algorithm selection: 0: iterative, 1: iterative_structured, 2: fixed_size, 3: iterative_clocked, 4: iterative_queue, 5: iterative_max_size (BEST fixed block)

-b: column block size

-B: row block size (only for fixed-size blockings)

-c: number of columns in the matrix B (only used when running AB multiplication)

-f: filename of an edgelist to be read from memory

-F: force fixed size: 0: false. The blocking algorithm may create blocks of uneven height. 1: true. Whatever the result of the blocking algorithm, a fixed-size grid (see -b, -B) is superimposed on it.

-g: use group size when calculating similarity. 0: false 1: true

-o: filename where to save the results of blocking and multiplication

-p: usage of "pattern" when calculating similarities: 0: do not use pattern. similarities are calculated between a candidate row and the seed row. 1: use patterns. similarities are calculated between a candidate and the entire cluster

-P: treat the matrix as weighted or not 0: weights are ignored when reading a matrix from edgelist and during processing 1: weights are loaded, stored, and processed

-m: similarity measure: 0: Hamming 1: Jaccard (default)

-M: SpMM multiplication algorithm. Blocking must be appropriate to the chosen algorithm. 0: no multiplication 1: cuBLAS GEMM (blocking is ignored) 2: cuSparse CSR (blocking is ignored) 3: cuSparse BELLPACK (blocks should be fixed-size and square) 4: cuBLAS VBR (any blocking allowed)

-n: name of the experiment

-r: reorder the CSR matrix before processing/blocking/multiplying 0: do nothing (default) 1: reorder rows by nonzero count (descending) 2: scramble rows

-R: matrix format (what each line in the edgelist looks like) 0: row col (default) 1: col row

-s: random seed

-S: number of CUDA streams to be used in the VBR multiplication (default: 16)

-t: the distance threshold for merging similar rows: 0.0: merge only identical rows 0.x: only merge when distance < 0.x 1.0: merge any nonzero row

-v: verbose 0: print minimum 1: print infos, but not matrices 2: print matrices

-w: warmup repetitions

-x: multiplication repetitions
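Several of the options above (-m, -t and -p) interact during blocking. The Python sketch below shows one plausible way they could fit together; it is an illustration under stated assumptions, not the algorithm implemented in SPARTA. In particular, the function names are hypothetical, the cluster "pattern" is assumed to be the union of the nonzero columns of its rows, the threshold test is assumed to be inclusive so that a threshold of 0.0 merges identical rows, Hamming distance is shown unnormalized, and empty rows are not treated specially.

```python
def hamming_distance(a, b):
    """Number of columns in which two binary patterns differ (unnormalized)."""
    return len(a ^ b)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 for identical patterns, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_rows(rows, tau, use_pattern=True):
    """Greedily group rows whose Jaccard distance to a reference is at most tau.

    use_pattern=False: the reference is always the seed row of the cluster.
    use_pattern=True:  the reference grows to the union of all merged rows
                       (an assumed definition of the cluster pattern).
    """
    patterns = [set(r) for r in rows]
    assigned = [False] * len(rows)
    clusters = []
    for seed in range(len(rows)):
        if assigned[seed]:
            continue
        assigned[seed] = True
        cluster = [seed]
        reference = set(patterns[seed])
        for cand in range(seed + 1, len(rows)):
            if assigned[cand]:
                continue
            # Inclusive comparison (assumption): tau = 0.0 merges identical rows.
            if jaccard_distance(reference, patterns[cand]) <= tau:
                assigned[cand] = True
                cluster.append(cand)
                if use_pattern:
                    reference |= patterns[cand]
        clusters.append(cluster)
    return clusters


rows = [[0, 1, 2], [0, 1, 2], [0, 1, 3], [7, 8]]
strict = cluster_rows(rows, 0.0)     # only identical rows are merged
relaxed = cluster_rows(rows, 0.5)    # row 2 joins the first cluster
chain = [[0, 1], [0, 1, 2, 3], [2, 3]]
by_pattern = cluster_rows(chain, 0.5, use_pattern=True)
by_seed = cluster_rows(chain, 0.5, use_pattern=False)
```

In the `chain` example, comparing against the growing pattern lets the third row join the cluster, while comparing against the seed row alone leaves it out.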
