Introduction

ParaSync: Exploiting Fine-Grained Parallelism for Efficient File Synchronization

Experimental Setup

Build From Source

Requirements

  • gcc == 11.4.0
  • cmake >= 3.18
  • system packages (Debian/Ubuntu), installed with:
    sudo apt install git cmake autoconf pkg-config libtool libcurl4-openssl-dev libssl-dev libpopt-dev libbz2-dev libb2-dev doxygen nasm build-essential libaio-dev zlib1g-dev libext2fs-dev texinfo libevent-dev libev-dev libgflags-dev libprotobuf-dev libprotoc-dev protobuf-compiler libleveldb-dev libgoogle-perftools-dev hwloc libgtest-dev libgmock-dev libfuse-dev libgsasl7-dev

Build

git clone https://github.com/nicexlab/parasync
cd parasync
git submodule update --init --recursive
sudo ./thirdparty/deps_install.sh

The project is built with CMake. Generate the protobuf sources for each module first, then configure and compile:

cd src/skysync-f && protoc -I=. --cpp_out=. skysync.proto && cd ../..
cd src/dsync && protoc -I=. --cpp_out=. dsync.proto && cd ../..
cd src/parasync && protoc -I=. --cpp_out=. parasync.proto && cd ../..

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)

After a successful build, all executables are located in the build/ directory.
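As a quick sanity check after the build, the sketch below lists which of the executables named in this README are missing from the build directory. The name lists are taken from the sections that follow; the helper names (`expected_binaries`, `missing_binaries`) are hypothetical, and the build may well produce additional targets that are not listed here.

```python
import os
from pathlib import Path

# Executable names as mentioned in this README (the build may produce others).
TEST_ALGOS = ["rsync", "dsync", "skysync_f", "skysync_c"]
HTTP_ALGOS = ["rsync", "dsync", "skysync_f", "skysync_c", "parasync", "pdsync"]


def expected_binaries():
    """Return the executable names this README refers to."""
    names = [f"{algo}_test" for algo in TEST_ALGOS]
    for algo in HTTP_ALGOS:
        names += [f"{algo}_http_server", f"{algo}_http_client"]
    return names


def missing_binaries(build_dir="build"):
    """Return the expected executables that are absent or not executable."""
    root = Path(build_dir)
    return [
        name for name in expected_binaries()
        if not (root / name).is_file() or not os.access(root / name, os.X_OK)
    ]
```

An empty list from `missing_binaries("build")` suggests that every executable used later in this README was produced.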

Run

Local Evaluation (Single-Machine)

Run the core logic of each algorithm on a single machine with the provided test executables: rsync_test, dsync_test, skysync_f_test, and skysync_c_test. These tests measure performance without network overhead.

# Arg 1: Path to the old/basis file
# Arg 2: Path to the new file
# Arg 3: 0 for software-only, 1 for hardware acceleration
./rsync_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./dsync_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./skysync_f_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./skysync_c_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
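The placeholders above suggest a 100MB basis file and a new file that is the same data with 8MB inserted. The repository's own test data is not described here, so the following is only a minimal sketch of one way to produce such a pair. It assumes random content, an insertion in the middle of the file, and MB meaning 2^20 bytes; `make_test_pair` and its parameters are hypothetical names.

```python
import os


def _write_random(n, outputs, block):
    """Write the same n random bytes to every file object in outputs."""
    while n > 0:
        buf = os.urandom(min(block, n))
        for f in outputs:
            f.write(buf)
        n -= len(buf)


def make_test_pair(old_path, new_path, size=100 * 2**20,
                   insert_size=8 * 2**20, insert_at=None, block=2**20):
    """Create an old file of `size` random bytes and a new file that is the
    same data with `insert_size` fresh random bytes spliced in at `insert_at`."""
    if insert_at is None:
        insert_at = size // 2  # assumption: insert in the middle
    if not 0 <= insert_at <= size:
        raise ValueError("insert_at must lie within the old file")
    with open(old_path, "wb") as old, open(new_path, "wb") as new:
        _write_random(insert_at, [old, new], block)         # shared prefix
        _write_random(insert_size, [new], block)            # inserted region
        _write_random(size - insert_at, [old, new], block)  # shared suffix
```

Calling `make_test_pair("old.bin", "new.bin")` with the defaults writes roughly 208MB to disk; the two paths can then be passed as the first two arguments of the test executables.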

Network Evaluation (Client-Server)

For an end-to-end evaluation, run the HTTP server on one machine and the client on another. On the machine acting as the server (which holds the old file version), start the HTTP server for the algorithm under test.

# Start the HTTP server
./rsync_http_server

The server listens on port 19876 by default. The available servers are rsync_http_server, dsync_http_server, skysync_f_http_server, skysync_c_http_server, parasync_http_server, and pdsync_http_server.

On the client machine (which holds the new file version), run the corresponding client to start the sync. Note: --basis_filename is the full path to the old (basis) file on the server, not a path on the client.

# Start the HTTP client to sync files.
./rsync_http_client --basis_filename=<old_file> --new_filename=<new_file> --server_ip=<ip> --server_port=19876 --hw=<0 or 1>

The available clients are rsync_http_client, dsync_http_client, skysync_f_http_client, skysync_c_http_client, parasync_http_client, and pdsync_http_client.

For ParaSync, additional parameters are available:

./parasync_http_client --basis_filename=<old_file> --new_filename=<new_file> --server_ip=<ip> --server_port=19876 --hw=<0 or 1> --thread_num=<4> --mode=<3>
  • --thread_num: Number of threads for parallel processing (default: 4, recommended: 4-8)
  • --mode: Parallel matching variant (0=pipeline v1, 1=pipeline v3, 2=flow-graph v2, 3=streaming; default: 3)
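To compare thread counts and matching modes, it can help to generate the client command lines programmatically. The sketch below only builds argument lists from the flags documented above; it does not run anything. The function names and the default sweep values are hypothetical.

```python
import itertools
import shlex


def parasync_client_commands(basis, new, server_ip, port=19876, hw=0,
                             thread_nums=(4, 8), modes=(0, 1, 2, 3)):
    """Build one parasync_http_client argv list per (thread_num, mode) pair."""
    commands = []
    for threads, mode in itertools.product(thread_nums, modes):
        commands.append([
            "./parasync_http_client",
            f"--basis_filename={basis}",
            f"--new_filename={new}",
            f"--server_ip={server_ip}",
            f"--server_port={port}",
            f"--hw={hw}",
            f"--thread_num={threads}",
            f"--mode={mode}",
        ])
    return commands


def render(command):
    """Return a shell-quoted string for display or logging (Python 3.8+)."""
    return shlex.join(command)
```

Each list can be passed to `subprocess.run` from the build/ directory while the matching server is running.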

ParaSync HTTP Client/Server Detailed Guide

Overview

ParaSync provides HTTP-based client and server implementations for high-performance parallel file synchronization over networks. The implementation supports:

  • 4-Stage Parallel Pipeline: CDC → Weak Match → Strong Match → Delta
  • Four Parallel Matching Algorithms: Two pipeline variants, a flow-graph variant, and a streaming variant
  • BLAKE3 Strong Hashing: 32-byte fixed-size hashes for exact verification
  • Absolute Offsets: Concurrent delta reconstruction with atomic writes
  • Multiple Modes: Hardware acceleration (hw=1) and software-only (hw=0)
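The four stages listed above can be illustrated with a single-threaded toy. This is a sketch under stated assumptions and not ParaSync's implementation: the gear-style chunker and its parameters are illustrative, CRC32 from `zlib` plays the weak hash, BLAKE2b with a 32-byte digest stands in for BLAKE3 (which is not in the Python standard library), and the delta command format is invented for the example.

```python
import hashlib
import random
import zlib

# Fixed pseudo-random table for a gear-style rolling hash (illustrative choice).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, mask=0x3FF, min_len=64, max_len=4096):
    """Stage 1 (CDC): yield (offset, chunk) with content-defined boundaries."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_len and (h & mask) == 0) or length >= max_len:
            yield start, data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield start, data[start:]


def strong_hash(chunk):
    """32-byte strong hash; BLAKE2b is a stand-in for BLAKE3 here."""
    return hashlib.blake2b(chunk, digest_size=32).digest()


def make_delta(old, new):
    """Stages 2-4: weak match, strong match, then emit delta commands that
    carry absolute offsets into the new file."""
    weak_index = {}
    for off, chunk in cdc_chunks(old):
        weak_index.setdefault(zlib.crc32(chunk), (off, chunk))
    delta = []
    for off, chunk in cdc_chunks(new):
        hit = weak_index.get(zlib.crc32(chunk))                 # weak match
        if hit and strong_hash(hit[1]) == strong_hash(chunk):   # strong match
            delta.append(("copy", off, hit[0], len(chunk)))
        else:
            delta.append(("literal", off, chunk))
    return delta


def apply_delta(old, delta, new_len):
    """Rebuild the new file; commands may be applied in any order."""
    out = bytearray(new_len)
    for cmd in delta:
        if cmd[0] == "copy":
            _, new_off, old_off, n = cmd
            out[new_off:new_off + n] = old[old_off:old_off + n]
        else:
            _, new_off, literal = cmd
            out[new_off:new_off + len(literal)] = literal
    return bytes(out)
```

Because every command names its destination offset, `apply_delta` produces the same result for any ordering of the commands, which is the property that makes concurrent reconstruction possible.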

HTTP Server Usage

Starting the Server

# Basic usage (default port 19876)
./parasync_http_server

# Custom port
./parasync_http_server --port=8080

The server will:

  1. Listen on the specified port for HTTP connections
  2. Handle three endpoints: /csums_queue, /ack, and /patch
  3. Perform parallel CDC and weak matching on the old file
  4. Reconstruct the new file from delta commands
  5. Create output file as <basis_filename>.new

Server Endpoints

1. /csums_queue (POST)

  • Purpose: Client sends CRC32 hashes of new file chunks
  • Query Parameters:
    • basis_filename: Path to old file on server
    • hw: Hardware acceleration mode (0=software, 1=hardware)
    • thread_num: Number of threads for parallel processing
    • mode: Parallel matching algorithm (0/1/2/3)
  • Response: BLAKE3 hashes of weak-matched chunks (protobuf)

2. /ack (POST)

  • Purpose: Client acknowledges receipt of the /csums_queue response; the server uses the acknowledgment to measure round-trip time (RTT)
  • Body: Request key from previous /csums_queue response
  • Response: "ACK_RECV" confirmation

3. /patch (POST)

  • Purpose: Client sends delta commands to reconstruct new file
  • Query Parameters:
    • basis_filename: Path to old file on server
  • Body: Binary delta data (commands + literal data)
  • Response: "OK" on success
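The three endpoints are called in the order listed: /csums_queue, then /ack, then /patch. The sketch below only assembles the request URLs from the documented query parameters. The request bodies (protobuf checksums, the request key, binary delta data) are produced by the client and are not reproduced here. URL-encoding of the path is an assumption, and `endpoint_url` and `sync_urls` are hypothetical helper names.

```python
from urllib.parse import urlencode


def endpoint_url(server_ip, port, endpoint, **query):
    """Build http://<ip>:<port>/<endpoint>?<query> for one request."""
    qs = urlencode(query)
    return f"http://{server_ip}:{port}/{endpoint}" + (f"?{qs}" if qs else "")


def sync_urls(server_ip, basis_filename, port=19876, hw=0, thread_num=4, mode=3):
    """Return the URLs of one sync round in the order they are POSTed."""
    return [
        endpoint_url(server_ip, port, "csums_queue",
                     basis_filename=basis_filename, hw=hw,
                     thread_num=thread_num, mode=mode),
        endpoint_url(server_ip, port, "ack"),
        endpoint_url(server_ip, port, "patch", basis_filename=basis_filename),
    ]
```

Printing `sync_urls("127.0.0.1", "/data/file.dat")` can help when matching client requests against server-side debug logs.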

HTTP Client Usage

Command-Line Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --server_ip | string | 127.0.0.1 | Server IP address |
| --server_port | int | 19876 | Server port number |
| --basis_filename | string | required | Path to old file on server |
| --new_filename | string | required | Path to new file on client |
| --hw | int | 0 | Hardware acceleration (0=SW, 1=HW) |
| --thread_num | int | 4 | Number of parallel threads |
| --mode | int | 3 | Matching algorithm variant |

Parallel Matching Modes

| Mode | Name | Description | Use Case |
|------|------|-------------|----------|
| 0 | Pipeline v1 | Batch processing pipeline | General purpose |
| 1 | Pipeline v3 | Optimized with token-based parallelism | Balanced performance |
| 2 | Flow Graph v2 | TBB flow graph with task nodes | Complex workloads |
| 3 | Streaming | Continuous processing, memory-efficient | Default; large files |

Usage Examples

Basic Sync (Localhost)

./parasync_http_client \
  --server_ip=127.0.0.1 \
  --server_port=19876 \
  --basis_filename=/path/to/old/file \
  --new_filename=/path/to/new/file

Remote Sync with Custom Settings

./parasync_http_client \
  --server_ip=192.168.1.100 \
  --server_port=19876 \
  --basis_filename=/server/data/file.dat \
  --new_filename=./local/file.dat \
  --hw=0 \
  --thread_num=8 \
  --mode=3

Hardware Acceleration

./parasync_http_client \
  --server_ip=192.168.1.100 \
  --basis_filename=/data/file.dat \
  --new_filename=./file.dat \
  --hw=1 \
  --thread_num=4

Debug Mode

To enable verbose logging for troubleshooting:

# Set log level to DEBUG
export ALOG_LOG_LEVEL=0

# Run client with debug output
./parasync_http_client --server_ip=127.0.0.1 ... 2>&1 | tee debug.log

# Run server with debug output
./parasync_http_server --port=19876 2>&1 | tee server_debug.log

PDSync HTTP Client/Server Detailed Guide

Overview

PDSync provides HTTP-based client and server implementations for parallel file synchronization using relative offsets (sequential writes). It serves as a baseline for comparing sequential delta reconstruction against ParaSync's absolute-offset approach.

Key features:

  • 3-Stage Parallel Pipeline: CDC → Weak Match → Strong Match
  • BLAKE3 Strong Hashing: 32-byte fixed-size hashes for exact verification
  • Relative Offsets: Sequential delta reconstruction using write() and sendfile()
  • Single Matching Algorithm: One optimized parallel_smatcher() implementation

HTTP Server Usage

Starting the Server

# Basic usage (default port 19876)
./pdsync_http_server

# Custom port
./pdsync_http_server --port=8080

The server will:

  1. Listen on the specified port for HTTP connections
  2. Handle three endpoints: /csums_queue, /ack, and /patch
  3. Perform parallel CDC and weak matching on the old file
  4. Reconstruct the new file from delta commands using sequential writes
  5. Create output file as <basis_filename>.new

HTTP Client Usage

Command-Line Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| -server_ip | string | 127.0.0.1 | Server IP address |
| -server_port | uint64 | 19876 | Server port number |
| -basis_filename | string | required | Path to old file on server |
| -new_filename | string | required | Path to new file on client |
| -hw | bool | false | Hardware acceleration |
| -thread_num | uint32 | 4 | Number of parallel threads |

Usage Examples

Basic Sync (Localhost)

./pdsync_http_client \
  -server_ip=127.0.0.1 \
  -server_port=19876 \
  -basis_filename=/path/to/old/file \
  -new_filename=/path/to/new/file

Remote Sync with Custom Settings

./pdsync_http_client \
  -server_ip=192.168.1.100 \
  -server_port=19876 \
  -basis_filename=/server/data/file.dat \
  -new_filename=./local/file.dat \
  -hw=false \
  -thread_num=8

Hardware Acceleration

./pdsync_http_client \
  -server_ip=192.168.1.100 \
  -basis_filename=/data/file.dat \
  -new_filename=./file.dat \
  -hw=true \
  -thread_num=4
