ParaSync: Exploiting Fine-Grained Parallelism for Efficient File Synchronization
- gcc == 11.4.0
- cmake >= 3.18
sudo apt install git cmake autoconf pkg-config libtool libcurl4-openssl-dev libssl-dev libpopt-dev libbz2-dev libb2-dev doxygen nasm build-essential libaio-dev zlib1g-dev libext2fs-dev texinfo libevent-dev libev-dev libgflags-dev libprotobuf-dev libprotoc-dev protobuf-compiler libleveldb-dev libgoogle-perftools-dev hwloc libgtest-dev libgmock-dev libfuse-dev libgsasl7-dev
git clone https://github.com/nicexlab/parasync
cd parasync
git submodule update --init --recursive
sudo ./thirdparty/deps_install.sh
The whole project is built using CMake. You can build it by running the following commands:
cd src/skysync-f && protoc -I=. --cpp_out=. skysync.proto && cd ../..
cd src/dsync && protoc -I=. --cpp_out=. dsync.proto && cd ../..
cd src/parasync && protoc -I=. --cpp_out=. parasync.proto && cd ../..
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)
Upon successful compilation, all executables will be located in the build/ directory.
First, you can run the core logic of each algorithm on a single machine using the provided test executables: rsync_test, dsync_test, skysync_f_test, and skysync_c_test. These tests measure performance without network overhead.
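The placeholders in the commands that follow suggest a 100 MB basis file and a copy of it with 8 MB inserted. The repository may ship its own data or generator; the sketch below is only one assumed way to produce such a pair, and the function name `make_test_pair` and the insertion position are hypothetical.

```python
import os

def make_test_pair(old_path, new_path, size, insert_size, insert_at):
    """Write `size` random bytes to old_path, and the same bytes with
    `insert_size` extra random bytes spliced in at `insert_at` to new_path."""
    if not 0 <= insert_at <= size:
        raise ValueError("insert_at must lie within the old file")
    old = os.urandom(size)
    inserted = os.urandom(insert_size)
    with open(old_path, "wb") as f:
        f.write(old)
    with open(new_path, "wb") as f:
        f.write(old[:insert_at])   # unchanged prefix
        f.write(inserted)          # the inserted region
        f.write(old[insert_at:])   # unchanged suffix, shifted by insert_size
```

Calling `make_test_pair("old.bin", "new.bin", 100 * 2**20, 8 * 2**20, 50 * 2**20)` would write a 100 MiB basis file and a 108 MiB new file whose insertion starts at the midpoint.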
# Arg 1: Path to the old/basis file
# Arg 2: Path to the new file
# Arg 3: 0 for software-only, 1 for hardware acceleration
./rsync_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./dsync_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./skysync_f_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
./skysync_c_test <old-file: 100MB> <new-file: 100MB-insert-8MB> <0 for software, 1 for hardware acceleration>
Next, you can run the HTTP server on one machine and the client on another. On the machine acting as the server (which holds the old file version), start the appropriate HTTP server.
# Start the HTTP server
./rsync_http_server
The server will listen on port 19876 by default. The available servers are rsync_http_server, dsync_http_server, skysync_f_http_server, skysync_c_http_server, parasync_http_server, and pdsync_http_server.
On the client machine (which holds the new file version), run the corresponding client to initiate sync. Note: The --basis_filename argument specifies the full path to the target file on the server.
# Start the HTTP client to sync files.
./rsync_http_client --basis_filename=<old_file> --new_filename=<new_file> --server_ip=<ip> --server_port=19876 --hw=<0 or 1>
The available clients are rsync_http_client, dsync_http_client, skysync_f_http_client, skysync_c_http_client, parasync_http_client, and pdsync_http_client.
For ParaSync, additional parameters are available:
./parasync_http_client --basis_filename=<old_file> --new_filename=<new_file> --server_ip=<ip> --server_port=19876 --hw=<0 or 1> --thread_num=<4> --mode=<3>
- --thread_num: Number of threads for parallel processing (default: 4, recommended: 4-8)
- --mode: Parallel matching variant (0=pipeline v1, 1=pipeline v3, 2=flow-graph v2, 3=streaming, default=3)
ParaSync provides HTTP-based client and server implementations for high-performance parallel file synchronization over networks. The implementation supports:
- 4-Stage Parallel Pipeline: CDC → Weak Match → Strong Match → Delta
- Four Parallel Matching Algorithms: Pipeline variants, flow-graph, and streaming
- BLAKE3 Strong Hashing: 32-byte fixed-size hashes for exact verification
- Absolute Offsets: Concurrent delta reconstruction with atomic writes
- Multiple Modes: Hardware acceleration (hw=1) and software-only (hw=0)
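To make the four stages concrete, here is a single-threaded sketch of CDC, weak matching, strong matching, and delta generation. It is not ParaSync's implementation: the chunker is a toy rolling hash, `hashlib.blake2b` with a 32-byte digest stands in for BLAKE3 (which is not in the Python standard library), and the `("copy", ...)` / `("literal", ...)` command tuples are a hypothetical format. In the real system the stages are split between client and server and run in parallel.

```python
import hashlib
import zlib

def cdc_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Toy CDC: cut a chunk where the low bits of a rolling hash are zero.
    Returns a list of (offset, length) pairs covering `data`."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append((start, length))
            start, h = i + 1, 0
    if start < len(data):
        chunks.append((start, len(data) - start))
    return chunks

def weak_hash(chunk):
    return zlib.crc32(chunk)

def strong_hash(chunk):
    # Stand-in only: the README names BLAKE3; blake2b is used here for a 32-byte digest.
    return hashlib.blake2b(chunk, digest_size=32).digest()

def compute_delta(old, new):
    # Stage 1 (CDC): chunk both versions.
    old_chunks = cdc_chunks(old)
    new_chunks = cdc_chunks(new)
    # Stage 2 (weak match): index old chunks by CRC32.
    weak_index = {}
    for off, length in old_chunks:
        weak_index.setdefault(weak_hash(old[off:off + length]), (off, length))
    delta = []
    for off, length in new_chunks:
        piece = new[off:off + length]
        candidate = weak_index.get(weak_hash(piece))
        # Stage 3 (strong match): confirm the weak candidate with the 32-byte hash.
        if candidate is not None:
            src_off, src_len = candidate
            if src_len == length and strong_hash(old[src_off:src_off + src_len]) == strong_hash(piece):
                delta.append(("copy", off, src_off, length))
                continue
        # Stage 4 (delta): unmatched data travels as a literal.
        delta.append(("literal", off, piece))
    return delta

def apply_delta(old, delta, new_size):
    """Rebuild the new version; each command names its own destination offset."""
    out = bytearray(new_size)
    for command in delta:
        if command[0] == "copy":
            _, dst_off, src_off, length = command
            out[dst_off:dst_off + length] = old[src_off:src_off + length]
        else:
            _, dst_off, data = command
            out[dst_off:dst_off + len(data)] = data
    return bytes(out)
```

Because every command carries the absolute offset at which its bytes belong in the new file, the loop in `apply_delta` could process the commands in any order and still produce the same output.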
# Basic usage (default port 19876)
./parasync_http_server
# Custom port
./parasync_http_server --port=8080
The server will:
- Listen on the specified port for HTTP connections
- Handle three endpoints: /csums_queue, /ack, and /patch
- Perform parallel CDC and weak matching on the old file
- Reconstruct the new file from delta commands
- Create output file as <basis_filename>.new
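Reconstruction from absolute offsets can be pictured as follows: since each command states where its bytes land in the output, worker threads can write independently with positional I/O. This is a sketch under assumptions, not the server's code. The command tuples `("copy", dst_off, src_off, length)` and `("literal", dst_off, data)` are a hypothetical format, and `os.pread`/`os.pwrite` are available on POSIX systems only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def apply_absolute(old_path, out_path, commands, new_size, threads=4):
    """Apply offset-addressed delta commands concurrently (POSIX only)."""
    src = os.open(old_path, os.O_RDONLY)
    dst = os.open(out_path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dst, new_size)  # size the output once, up front

        def run(command):
            if command[0] == "copy":
                _, dst_off, src_off, length = command
                data = os.pread(src, length, src_off)
                if len(data) != length:
                    raise IOError("short read from basis file")
            else:
                _, dst_off, data = command
            # pwrite takes an explicit offset, so threads share no file cursor.
            written = os.pwrite(dst, data, dst_off)
            if written != len(data):
                raise IOError("short write to output file")

        with ThreadPoolExecutor(max_workers=threads) as pool:
            list(pool.map(run, commands))  # list() surfaces worker exceptions
    finally:
        os.close(src)
        os.close(dst)
```

The commands may arrive in any order; the result depends only on their offsets, provided the destination ranges do not overlap.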
1. /csums_queue (POST)
- Purpose: Client sends CRC32 hashes of new file chunks
- Query Parameters:
  - basis_filename: Path to old file on server
  - hw: Hardware acceleration mode (0=software, 1=hardware)
  - thread_num: Number of threads for parallel processing
  - mode: Parallel matching algorithm (0/1/2/3)
- Response: BLAKE3 hashes of weak-matched chunks (protobuf)
2. /ack (POST)
- Purpose: Client acknowledges receipt, server measures RTT
- Body: Request key from previous /csums_queue response
- Response: "ACK_RECV" confirmation
3. /patch (POST)
- Purpose: Client sends delta commands to reconstruct new file
- Query Parameters:
  - basis_filename: Path to old file on server
- Body: Binary delta data (commands + literal data)
- Response: "OK" on success
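The three endpoints are used in sequence during one sync. The sketch below only assembles the request URLs in that order so the exchange is easier to follow. It sends nothing, and the percent-encoding of query values is an assumption about how a client might encode them, not a description of what parasync_http_client does. The helper names `endpoint_url` and `sync_request_plan` are hypothetical.

```python
from urllib.parse import urlencode

def endpoint_url(ip, port, endpoint, **params):
    """Build http://ip:port/endpoint, with a query string if params are given."""
    query = urlencode(params)
    base = f"http://{ip}:{port}/{endpoint}"
    return f"{base}?{query}" if query else base

def sync_request_plan(ip, port, basis_filename, hw=0, thread_num=4, mode=3):
    """Return the three POST requests of one sync, in the order they are issued."""
    return [
        # 1. Send weak hashes of the new file's chunks; receive strong hashes back.
        ("POST", endpoint_url(ip, port, "csums_queue", basis_filename=basis_filename,
                              hw=hw, thread_num=thread_num, mode=mode)),
        # 2. Acknowledge receipt (the request key goes in the body).
        ("POST", endpoint_url(ip, port, "ack")),
        # 3. Send the binary delta so the server can rebuild the new file.
        ("POST", endpoint_url(ip, port, "patch", basis_filename=basis_filename)),
    ]
```

For example, `sync_request_plan("127.0.0.1", 19876, "/data/file.dat")` yields the /csums_queue, /ack, and /patch URLs for the default port.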
| Flag | Type | Default | Description |
|---|---|---|---|
| --server_ip | string | 127.0.0.1 | Server IP address |
| --server_port | int | 19876 | Server port number |
| --basis_filename | string | required | Path to old file on server |
| --new_filename | string | required | Path to new file on client |
| --hw | int | 0 | Hardware acceleration (0=SW, 1=HW) |
| --thread_num | int | 4 | Number of parallel threads |
| --mode | int | 3 | Matching algorithm variant |
| Mode | Name | Description | Use Case |
|---|---|---|---|
| 0 | Pipeline v1 | Batch processing pipeline | General purpose |
| 1 | Pipeline v3 | Optimized with token-based parallelism | Balanced performance |
| 2 | Flow Graph v2 | TBB flow graph with task nodes | Complex workloads |
| 3 | Streaming | Continuous processing, memory-efficient | DEFAULT, large files |
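When comparing the four modes on the same file pair, it can help to generate one client invocation per mode. The following is a small sketch that builds argument lists from the flags documented above. The helper names `parasync_client_argv` and `mode_sweep` are hypothetical, and nothing here launches a process.

```python
MODE_NAMES = {0: "Pipeline v1", 1: "Pipeline v3", 2: "Flow Graph v2", 3: "Streaming"}

def parasync_client_argv(server_ip, basis_filename, new_filename,
                         server_port=19876, hw=0, thread_num=4, mode=3):
    """Build the argument list for one parasync_http_client run."""
    if mode not in MODE_NAMES:
        raise ValueError("mode must be 0, 1, 2, or 3")
    return [
        "./parasync_http_client",
        f"--server_ip={server_ip}",
        f"--server_port={server_port}",
        f"--basis_filename={basis_filename}",
        f"--new_filename={new_filename}",
        f"--hw={hw}",
        f"--thread_num={thread_num}",
        f"--mode={mode}",
    ]

def mode_sweep(server_ip, basis_filename, new_filename, **settings):
    """Map each mode name to the argument list that selects it."""
    return {
        name: parasync_client_argv(server_ip, basis_filename, new_filename,
                                   mode=mode, **settings)
        for mode, name in MODE_NAMES.items()
    }
```

Each list can be passed to `subprocess.run(argv, check=True)` from the build directory, with the server already running on the other machine.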
Basic Sync (Localhost)
./parasync_http_client \
--server_ip=127.0.0.1 \
--server_port=19876 \
--basis_filename=/path/to/old/file \
--new_filename=/path/to/new/file
Remote Sync with Custom Settings
./parasync_http_client \
--server_ip=192.168.1.100 \
--server_port=19876 \
--basis_filename=/server/data/file.dat \
--new_filename=./local/file.dat \
--hw=0 \
--thread_num=8 \
--mode=3
Hardware Acceleration
./parasync_http_client \
--server_ip=192.168.1.100 \
--basis_filename=/data/file.dat \
--new_filename=./file.dat \
--hw=1 \
--thread_num=4
To enable verbose logging for troubleshooting:
# Set log level to DEBUG
export ALOG_LOG_LEVEL=0
# Run client with debug output
./parasync_http_client --server_ip=127.0.0.1 ... 2>&1 | tee debug.log
# Run server with debug output
./parasync_http_server --port=19876 2>&1 | tee server_debug.log
PDSync provides HTTP-based client and server implementations for parallel file synchronization using relative offsets (sequential writes). This is a baseline implementation for comparing delta reconstruction strategies with ParaSync's absolute offset approach.
Key features:
- 3-Stage Parallel Pipeline: CDC → Weak Match → Strong Match
- BLAKE3 Strong Hashing: 32-byte fixed-size hashes for exact verification
- Relative Offsets: Sequential delta reconstruction using write() and sendfile()
- Single Matching Algorithm: One optimized parallel_smatcher() implementation
# Basic usage (default port 19876)
./pdsync_http_server
# Custom port
./pdsync_http_server --port=8080
The server will:
- Listen on the specified port for HTTP connections
- Handle three endpoints: /csums_queue, /ack, and /patch
- Perform parallel CDC and weak matching on the old file
- Reconstruct the new file from delta commands using sequential writes
- Create output file as <basis_filename>.new
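With relative offsets, a delta command does not say where its bytes land in the output; the destination is implied by everything written before it, so the commands have to be applied strictly in order. This is a sketch under assumptions, not PDSync's code. The command tuples `("copy", src_off, length)` and `("literal", data)` are a hypothetical format, and ordinary buffered file I/O is used here where the README mentions write() and sendfile().

```python
def apply_relative(old_path, out_path, commands):
    """Apply delta commands strictly in order; returns the number of bytes written."""
    with open(old_path, "rb") as src, open(out_path, "wb") as out:
        for command in commands:
            if command[0] == "copy":
                _, src_off, length = command
                src.seek(src_off)
                data = src.read(length)
                if len(data) != length:
                    raise IOError("copy range extends past end of basis file")
            else:
                _, data = command
            # The destination is wherever the previous command stopped writing.
            out.write(data)
        return out.tell()
```

Reordering the command list changes the output, which is why this strategy serves as the sequential baseline against ParaSync's absolute-offset approach.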
| Flag | Type | Default | Description |
|---|---|---|---|
| -server_ip | string | 127.0.0.1 | Server IP address |
| -server_port | uint64 | 19876 | Server port number |
| -basis_filename | string | required | Path to old file on server |
| -new_filename | string | required | Path to new file on client |
| -hw | bool | false | Hardware acceleration |
| -thread_num | uint32 | 4 | Number of parallel threads |
Basic Sync (Localhost)
./pdsync_http_client \
-server_ip=127.0.0.1 \
-server_port=19876 \
-basis_filename=/path/to/old/file \
-new_filename=/path/to/new/file
Remote Sync with Custom Settings
./pdsync_http_client \
-server_ip=192.168.1.100 \
-server_port=19876 \
-basis_filename=/server/data/file.dat \
-new_filename=./local/file.dat \
-hw=false \
-thread_num=8
Hardware Acceleration
./pdsync_http_client \
-server_ip=192.168.1.100 \
-basis_filename=/data/file.dat \
-new_filename=./file.dat \
-hw=true \
-thread_num=4