HicrestLaboratory/SPARTA


SPARTA

SPARse acceleration on Tensor Architecture

The project investigates new data structures and compression algorithms that exploit architectural capabilities designed for deep learning in order to accelerate sparse and irregular applications, such as graph analytics and arbitrary sparse DNNs and GCNs. SPARTA also addresses productivity and performance portability across different AI accelerators by providing an abstraction layer.

The repository contains stable code for reordering and compressing sparse matrices into dense block data-structures. The reordering algorithm matches rows (or columns) with similar patterns and builds dense blocks. The similarity of patterns is first determined with a hash function, and then refined with a tunable algorithm, which matches patterns with high cosine similarity.
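The two-stage matching described above can be illustrated with a minimal Python sketch. This is not SPARTA's code: the function names are hypothetical, each row is assumed to be given as a list of its nonzero column indices, and Python's built-in hashing stands in for whatever hash function the repository actually uses.

```python
import math
from collections import defaultdict

def bucket_rows_by_pattern(rows):
    """Stage 1 (hypothetical): group row indices with identical nonzero patterns.

    Using the frozenset as a dict key lets Python hash the pattern and
    resolve collisions, so only truly identical patterns share a bucket.
    """
    buckets = defaultdict(list)
    for i, cols in enumerate(rows):
        buckets[frozenset(cols)].append(i)
    return list(buckets.values())

def cosine_similarity(a, b):
    """Stage 2 (hypothetical): cosine similarity of two binary patterns."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

rows = [[0, 2], [1, 3], [0, 2], [0, 2, 3]]
print(bucket_rows_by_pattern(rows))        # rows 0 and 2 share a bucket
print(cosine_similarity(rows[0], rows[3])) # high similarity, candidate for merging
```

Identical rows are matched cheaply in the first stage, so the more expensive similarity comparison only has to run between bucket representatives.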

The repository also contains code for sparse-dense matrix-matrix multiplication that exploits the dense block data-structure.
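To show why dense blocks help, here is a minimal pure-Python sketch of a block-wise sparse-dense product. The layout used here (a dictionary mapping the top-left corner of each block to a small dense matrix) is an assumption made for illustration and is not SPARTA's VBR layout; `block_spmm` is a hypothetical name.

```python
def block_spmm(blocks, n_rows, B):
    """Compute C = A * B where A is given as a set of dense blocks.

    blocks: dict mapping (row_start, col_start) -> dense block (list of lists).
    n_rows: number of rows of A.
    B:      dense matrix as a list of rows.
    Regions of A not covered by a block are treated as zero and never visited.
    """
    n_cols_B = len(B[0])
    C = [[0.0] * n_cols_B for _ in range(n_rows)]
    for (r0, c0), block in blocks.items():
        # Each block contributes a small dense product with a slice of B.
        for i, block_row in enumerate(block):
            for k, a in enumerate(block_row):
                for j in range(n_cols_B):
                    C[r0 + i][j] += a * B[c0 + k][j]
    return C

blocks = {(0, 0): [[1, 2], [3, 4]], (2, 2): [[5, 0], [0, 6]]}
B = [[1, 0], [0, 1], [1, 1], [2, 0]]
print(block_spmm(blocks, 4, B))
```

On a GPU, each inner block product would typically be handed to a dense GEMM routine rather than computed with explicit loops.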

Input sparse matrices are stored in Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) format. A variant of the Variable Block Compressed Sparse Row (or Column) format is used to store block-sparse matrices.
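As a reminder of what CSR looks like, the following sketch converts a small dense matrix into the three standard CSR arrays. The function and array names are illustrative and are not taken from the SPARTA sources.

```python
def dense_to_csr(matrix):
    """Return (row_ptr, col_idx, values) for a dense matrix given as a list of rows.

    row_ptr[i]..row_ptr[i+1] delimits the entries of row i inside
    col_idx and values.
    """
    row_ptr, col_idx, values = [0], [], []
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values

A = [[5, 0, 0],
     [0, 0, 0],
     [0, 3, 7]]
print(dense_to_csr(A))  # ([0, 1, 1, 3], [0, 1, 2], [5, 3, 7])
```

CSC is the same construction applied to the columns of the matrix instead of its rows.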

SPARTA requires CUDA >=10.0

STRUCTURE

The repository is organized as follows:

SPARTA

  • include
  • obj
  • programs
  • src
  • test

Each folder contains:

  • general: files needed by all versions
  • cuda: files needed by the CUDA version
  • mkl: files needed by the MKL version

USAGE

Compile with

`make serial` to compile without CUDA,
or `make all` to also compile the CUDA tests.

Run `./programs/general/TEST_blocking_VBR` to see an example of blocking.

For example, run `./programs/general/TEST_blocking_VBR -b 3 -t 0.6` to produce a blocking of a test matrix, fixing the column block size at 3 (`-b 3`) and the distance threshold tau at 0.6 (`-t 0.6`).

Run again with `./programs/general/TEST_blocking_VBR -b 3 -t 0.6 -F 1 -B 3` to force fixed-height blocks (`-F 1`) of height 3 (`-B 3`).

Add the option `-f PATH/TO/MATRIX.el` to load a matrix. Some small matrices are available for testing in `data/`. You can also use your own matrices, provided they are stored as an edgelist with space-separated, ordered values.
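If you need to prepare or inspect your own edgelist, the sketch below parses one in Python. It is a hypothetical helper, not SPARTA's reader, and it assumes zero-based indices, one `row col [weight]` entry per line, and blank lines that can be skipped. The `weighted` and `col_first` parameters loosely mirror the `-P` and `-R` options.

```python
def read_edgelist(text, weighted=False, col_first=False):
    """Parse a space-separated edgelist into per-row lists of (col, weight).

    text:      contents of the edgelist file.
    weighted:  if True, read a third field as the weight; otherwise use 1.0.
    col_first: if True, each line is "col row" instead of "row col".
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        a, b = int(fields[0]), int(fields[1])
        row, col = (b, a) if col_first else (a, b)
        w = float(fields[2]) if weighted and len(fields) > 2 else 1.0
        entries.append((row, col, w))

    n_rows = max((r for r, _, _ in entries), default=-1) + 1
    rows = [[] for _ in range(n_rows)]
    for r, c, w in sorted(entries):
        rows[r].append((c, w))
    return rows

sample = "0 1\n0 3\n2 2\n"
print(read_edgelist(sample))
```

To parse a file on disk, pass its contents, for example `read_edgelist(open(path).read())`.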

Find all the options below:

OPTIONS:

-a: blocking algorithm selection:
  • 0: iterative
  • 1: iterative_structured
  • 2: fixed_size
  • 3: iterative_clocked
  • 4: iterative_queue
  • 5: iterative_max_size (BEST fixed block)

-b: column block size

-B: row block size (only for fixed-size blockings)

-c: number of columns in the matrix B (only used when running AB multiplication)

-f: filename of an edgelist to be read from memory

-F: force fixed size:
  • 0: false. The blocking algorithm may create blocks of uneven height.
  • 1: true. Whatever the result of the blocking algorithm, a fixed-size grid (see -b, -B) is superimposed on it.

-g: use group sizes when calculating similarity:
  • 0: false
  • 1: true

-o: filename where to save the results of blocking and multiplication

-p: use of a "pattern" when calculating similarities:
  • 0: do not use patterns. Similarities are calculated between a candidate row and the seed row.
  • 1: use patterns. Similarities are calculated between a candidate row and the entire cluster.

-P: whether to treat the matrix as weighted:
  • 0: weights are ignored when reading a matrix from an edgelist and during processing
  • 1: weights are loaded, stored, and processed

-m: similarity measure:
  • 0: Hamming
  • 1: Jaccard (default)

-M: SpMM multiplication algorithm. The blocking must be appropriate to the chosen algorithm.
  • 0: no multiplication
  • 1: cuBLAS GEMM (blocking is ignored)
  • 2: cuSPARSE CSR (blocking is ignored)
  • 3: cuSPARSE BELLPACK (blocks should be fixed-size and square)
  • 4: cuBLAS VBR (any blocking allowed)

-n: name of the experiment

-r: reorder the CSR matrix before processing/blocking/multiplying:
  • 0: do nothing (default)
  • 1: reorder rows by nonzero count (descending)
  • 2: scramble rows

-R: edgelist format (what each line of the edgelist looks like):
  • 0: row col (default)
  • 1: col row

-s: random seed

-S: number of CUDA streams used in the VBR multiplication (default: 16)

-t: the distance threshold for merging similar rows:
  • 0.0: merge only identical rows
  • 0.x: only merge when distance < 0.x
  • 1.0: merge any nonzero row

-v: verbosity:
  • 0: print the minimum
  • 1: print information, but not matrices
  • 2: print matrices

-w: warmup repetitions

-x: multiplication repetitions
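To give an intuition for how the similarity measure (-m) and the threshold (-t) interact, here is a simplified Python sketch of greedy, seed-based row merging. It is not SPARTA's blocking algorithm: all names are hypothetical, rows are given as lists of nonzero column indices, and the comparison uses `<=` as an assumption so that a threshold of 0.0 merges identical rows and 1.0 merges everything. How SPARTA normalizes the Hamming distance is not stated here, so the sketch returns a raw count for it.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets of column indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty rows are treated as identical
    return 1.0 - len(a & b) / len(union)

def hamming_distance(a, b):
    """Number of columns where exactly one of the two rows is nonzero."""
    return len(set(a) ^ set(b))

def greedy_row_blocking(rows, tau):
    """Group rows whose Jaccard distance to a seed row is at most tau.

    Mimics the seed-based comparison (as with -p 0): each unassigned row
    starts a new cluster and absorbs later rows that are close enough.
    """
    assigned = [False] * len(rows)
    clusters = []
    for seed in range(len(rows)):
        if assigned[seed]:
            continue
        assigned[seed] = True
        cluster = [seed]
        for cand in range(seed + 1, len(rows)):
            if not assigned[cand] and jaccard_distance(rows[seed], rows[cand]) <= tau:
                assigned[cand] = True
                cluster.append(cand)
        clusters.append(cluster)
    return clusters

rows = [[0, 1, 2], [5, 6], [0, 1], [0, 1, 2], [5, 6, 7]]
for tau in (0.0, 0.4, 1.0):
    print(tau, greedy_row_blocking(rows, tau))
```

A small threshold yields many small, very dense clusters, while a large one yields fewer, taller clusters that contain more explicit zeros.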
