PageRank algorithm implementation in C++ that uses CUDA to exploit the parallel computing capabilities of NVIDIA GPUs.
- CUDA-enabled GPU device with compute capability 3.5 or higher
- CUDA toolkit 9.x
- Python 3 and the tqdm library
- g++ 6.x
The repository provides three bash scripts that simplify running the whole procedure. If you prefer a custom, step-by-step execution, you can follow the four steps below instead.
- `BashBase.sh` is a base bash script that automates all the phases, given the input parameters:
  `sh BashBase.sh -n testName -v vertexPath -e edgePath -t thresholdValue -d dampingFactor [-a test1Path] [-b test2Path] [-c cpuCommand] [-g gpuCommand]`
  - `-n testName`: name of the test. It is used to name the final and intermediate `.csv` files.
  - `-v vertexPath`: vertices file path.
  - `-e edgePath`: edges file path.
  - `-t thresholdValue`: custom precision error threshold.
  - `-d dampingFactor`: custom damping factor value.
  - `-a test1Path`: true PageRank file path to be compared with the computed one.
  - `-b test2Path`: vertices file path to be compared with the computed one.
  - `-c cpuCommand`: optional CPU command.
  - `-g gpuCommand`: optional GPU command.
- `BashSmall.sh` is a bash script that simplifies the execution on the small dataset:
  `sh BashSmall.sh [-l]`
  It assumes that the folder from which it is run contains a `pagerank_contest_edgelists` subfolder with the files `graph_small_e.edgelist` and `graph_small_v.edgelist`, which are the `.csv` files for edges and vertices respectively. The script also uses two files, `small_directed_truth_string` and `small_undirected_truth_string`, located in the `pagerank_truth_values` folder, which hold the "truth" PageRank values against which the computed results are compared.
  - `-l`: run the procedure locally instead of on a remote cluster (which must support the Slurm workload manager). If the flag is absent, `cpuCommand` defaults to `srun -w slurm-cuda-master` and `gpuCommand` defaults to `srun -N1 --gres=gpu:1`.
- `BashFull.sh` works like `BashSmall.sh`, but uses the full dataset and its corresponding files.
- Download the dataset with `sh download_edgelists.sh`. The script creates a `pagerank_contest_edgelists` subdirectory in the current directory.
- Process the dataset with `python3 ElaborateDataset.py [-v vertexPath -e edgePath|-s|-f] [-o outputPath]`
  - `-v vertexPath`: vertices file path.
  - `-e edgePath`: edges file path.
  - `-s`: use the default files `graph_small_e.edgelist` and `graph_small_v.edgelist` in the folder extracted at step 1 as input, and save the processed dataset as `data_small.csv` in the current directory.
  - `-f`: use the default files `graph_full_e.edgelist` and `graph_full_v.edgelist` in the folder extracted at step 1 as input, and save the processed dataset as `data_full.csv` in the current directory.
  - `-o outputPath`: custom target file for the dataset output.
- Compile the sources with `nvcc -arch=sm_35 -rdc=true -lcudadevrt main.cu handleDataset.cpp -o pagerank -use_fast_math -std=c++11`. If your GPU does not support compute capability 3.5 or higher, the compilation will fail. This capability level is required in order to exploit relocatable device code.
- Run the algorithm computation with `./pagerank [-i inputPath|-s|-f] [-o outputPath] [-d] [-t]`
  - `-i inputPath`: input CSV dataset file.
  - `-s`: use `data_small.csv` as input and `pk_data_small.csv` as output.
  - `-f`: use `data_full.csv` as input and `pk_data_full.csv` as output.
  - `-o outputPath`: custom target file for the results output.
  - `-d`: custom damping value. Defaults to 0.85.
  - `-t`: custom precision error threshold. Defaults to 10e-6.
- Process the result with `python3 GenerateResult.py [-v vertexPath -o pageRankPath -p pageRankPath|-s|-f] [-o]`
  - `-v vertexPath`: vertices file path.
  - `-s`: use the default dataset `pk_data_small.csv` and the file `graph_small_v.edgelist`, and save in the current directory a `result_data_small.csv` file containing the processed PageRank result associated with each vertex name.
  - `-f`: use the default dataset `pk_data_full.csv` and the file `graph_full_v.edgelist`, and save in the current directory a `result_data_full.csv` file containing the processed PageRank result associated with each vertex name.
  - `-p pageRankPath`: path of the PageRank file computed in the previous phase.
  - `-o outputPath`: custom target file for the dataset output.
- Download the PageRank truth values with `sh download_truth_values.sh`. The script creates a `pagerank_truth_values` subdirectory in the current directory containing the "truth" PageRank values.
- Check the result: compile the checker with `c++ checker.cpp -o checker` to generate the binary that checks the result.
- Run the checker with `./checker -c checkerPath -t truthPath [-s]`
  - `-c checkerPath`: path of the file to be checked.
  - `-t truthPath`: path of the truth file.
  - `-s`: indicates that the indices of the two files are strings.
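
As an illustration of what the checking step amounts to, here is a minimal C++ sketch. It is not the repository's `checker.cpp`: it assumes both files have already been loaded into a map from vertex identifier to PageRank value, and it counts the entries that are missing or differ by more than an absolute tolerance. The names `RankMap` and `countMismatches`, as well as the choice of an absolute tolerance, are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical in-memory form of a result file:
// vertex identifier -> PageRank value.
using RankMap = std::unordered_map<std::string, double>;

// Counts the entries of `truth` that are either missing from `computed`
// or whose value differs by more than `tolerance` (absolute difference).
std::size_t countMismatches(const RankMap& computed, const RankMap& truth,
                            double tolerance) {
    std::size_t mismatches = 0;
    for (const auto& entry : truth) {
        const auto it = computed.find(entry.first);
        if (it == computed.end() ||
            std::fabs(it->second - entry.second) > tolerance) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero means every truth entry was matched within the tolerance. How the real checker parses its input files and which tolerance it applies may differ.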
The computation time is measured from the first kernel call to the completion of the copy of the final PageRank values from GPU memory to main memory. The damping factor selected for the tests is 0.85 and the precision threshold is 0.000001.
- DataSet "Small": 22 iterations, time to convergence: 0.045 s.
- DataSet "Full": 34 iterations, time to convergence: 6.402 s.
- DataSet "Small": 22 iterations, time to convergence: 0.034 s.
- DataSet "Full": 34 iterations, time to convergence: 3.666 s.
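
To make the role of the damping factor and the precision threshold concrete, here is a minimal sequential C++ sketch of the PageRank power iteration. It is a CPU reference under stated assumptions, not the CUDA kernel of this repository: the graph is given as a hypothetical edge list of (source, destination) pairs, the rank of vertices without outgoing edges is redistributed uniformly, and convergence is declared when the L1 change between two iterations drops below the threshold (the actual code may use a different criterion). The names `Edge`, `PageRankResult`, `pageRank`, and the `maxIterations` safety cap are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;  // (source, destination)

struct PageRankResult {
    std::vector<double> ranks;  // one value per vertex
    int iterations;             // iterations performed
};

PageRankResult pageRank(std::size_t n, const std::vector<Edge>& edges,
                        double damping = 0.85, double threshold = 1e-6,
                        int maxIterations = 1000) {
    if (n == 0) return {std::vector<double>(), 0};

    std::vector<std::size_t> outDegree(n, 0);
    for (const Edge& e : edges) ++outDegree[e.first];

    std::vector<double> rank(n, 1.0 / n), next(n);
    int iterations = 0;
    while (iterations < maxIterations) {
        // Rank currently held by vertices with no outgoing edges.
        double dangling = 0.0;
        for (std::size_t v = 0; v < n; ++v)
            if (outDegree[v] == 0) dangling += rank[v];

        // Teleportation term plus the uniformly spread dangling rank.
        const double base = (1.0 - damping) / n + damping * dangling / n;
        std::fill(next.begin(), next.end(), base);

        // Each vertex passes a damped, equal share of its rank along its edges.
        for (const Edge& e : edges)
            next[e.second] += damping * rank[e.first] / outDegree[e.first];

        // L1 distance between consecutive rank vectors.
        double delta = 0.0;
        for (std::size_t v = 0; v < n; ++v)
            delta += std::fabs(next[v] - rank[v]);

        rank.swap(next);
        ++iterations;
        if (delta < threshold) break;
    }
    return {rank, iterations};
}
```

For example, calling `pageRank(3, {{0, 1}, {1, 2}, {2, 0}})` on a three-vertex cycle yields a rank of 1/3 for every vertex. A smaller threshold generally requires more iterations before the loop stops.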