#set rect(
width: 100%,
height: 100%,
inset: 4pt,
)
#set page(
numbering: none,
)
#set text(
font: "Libertinus Serif",
size: 10.5pt,
)
#set heading(numbering: "1.1")
#align(center)[
#v(3cm)
#text(size: 20pt, weight: "bold")[
Patchwork:
A P2P Version Control System
]
#v(1cm)
#text(size: 14pt)[
Seminar Computer Science
New Trends for Local and Global Interconnects for P2P Applications
]
#v(2cm)
#text(size: 12pt)[
Robin Gökcen,
Boran Gökcen,
Istref Uka,
Benedikt Karli
University of Basel
]
#v(1cm)
#text(size: 12pt)[
January 2026
]
]
#pagebreak()
#set page(numbering: "1")
#set page(
header: align(right)[Patchwork: A P2P Version Control System],
numbering: "1"
)
#outline()
#pagebreak()
= Abstract
This project implements a peer-to-peer (P2P) version control prototype inspired by Git, written in Rust.
Instead of relying on a central server, repositories synchronize directly between peers over an encrypted networking layer built on iroh.
Local changes are represented as content-addressed objects: file contents are stored as blobs, directory states as trees, and snapshots as commits that reference a root tree and an optional parent commit.
Every object is identified by its hash, which enables deduplication, integrity verification, and efficient transfer.
The system provides core workflows such as initialization, adding files to a staging area, committing, cloning a repository’s metadata, and exchanging updates via pull/push protocols.
To minimize bandwidth, synchronization is incremental: a peer first exchanges the latest commit hash, then transfers only missing commits and their dependent objects (trees and blobs) until a common ancestor is found.
The result is a lightweight, decentralized collaboration mechanism that demonstrates how Git-like data structures and P2P networking can be combined to support distributed development without centralized infrastructure.
#pagebreak()
= Introduction
Modern software development relies heavily on version control systems to track changes, collaborate efficiently, and maintain a reliable history of a project.
Git itself is distributed, but most teams collaborate through hosting platforms that still depend on centralized infrastructure for discovery, authentication, and synchronization.
While central services simplify collaboration, they can also create single points of failure, introduce privacy concerns, and require users to trust and depend on third-party providers.
This project explores an alternative approach by building a peer-to-peer (P2P) version control prototype in Rust.
The goal is to enable direct repository synchronization between collaborators without requiring a central server.
To achieve this, the system implements a Git-inspired object model based on content addressing. File contents are stored as blobs, directory snapshots are represented as trees, and repository history is captured through commits that reference a root tree and an optional parent commit.
Each object is identified by a hash, which enables integrity checks and natural deduplication—identical content is stored and transferred only once.
For communication between peers, the prototype integrates iroh as the networking layer.
The repository can be initialized locally, files can be staged and committed, and peers can connect to exchange metadata and updates.
Synchronization is designed to be incremental and bandwidth-efficient: a pushing peer first announces its latest commit hash, and objects are transferred only when the receiving peer does not already possess them.
By iterating backwards through parent commits until a shared history point is found, the system avoids sending redundant data and transfers only the portion of history that differs.
Overall, the project demonstrates how Git-like data structures combined with P2P networking can form the foundation of decentralized collaboration.
The implementation focuses on correctness and a clear protocol structure, providing a basis for extending features such as conflict handling, branching, and richer synchronization strategies.
= System Design
Patchwork is designed as a lightweight peer-to-peer version control system that borrows the core data model of Git while removing the need for centralized hosting. Each participant runs a peer that stores repository objects locally and synchronizes directly with other peers over an encrypted networking layer built on iroh. The design is centered around two pillars:
- a content-addressed object database for immutable repository data, and
- a protocol-driven P2P layer for exchanging those objects incrementally.
== Repository Layout and Local State
A Patchwork repository is identified by the presence of a `.patch/` directory in the project root. Commands can be executed from any subdirectory inside the project because the CLI locates the repository by walking upwards until `.patch/` is found (`walk_backwards`).
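The upward search can be sketched with the standard library alone. This is a minimal illustration, not the project's actual `walk_backwards`: the function name `find_repo_root` and its signature are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Walks from `start` towards the filesystem root and returns the first
/// directory that contains a `.patch/` subdirectory, if any.
/// (Hypothetical stand-in for the `walk_backwards` helper named in the text.)
fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start`, its parent, its grandparent, ...
        .find(|dir| dir.join(".patch").is_dir())
        .map(Path::to_path_buf)
}
```

Returning `None` lets a caller report that the current directory is not inside a Patchwork repository.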
Inside `.patch/`, Patchwork stores:
- `patchwork.toml`: persistent configuration and repository metadata.
- `objects/`: a content-addressed object store split into:
  - `objects/blobs/` for file contents,
  - `objects/trees/` for directory snapshots,
  - `objects/commits/` for history snapshots.
The configuration file is the single source of truth for local state. It stores the peer identity (`secret`), the peer endpoint ID (`endpoint`), the list of known collaborators (`collaborators`), the staging map (`stage`), and the current head pointer (`last_commit`). This design keeps the working directory independent of versioning metadata while ensuring that repository history can be reconstructed from `.patch/` alone.
== Content-Addressed Object Model
All versioned data in Patchwork is represented as immutable objects addressed by a SHA-256 hash. Hash computation follows a Git-inspired convention: the hashed input begins with a small header containing the object type and the payload length, followed by a null byte and the raw payload:
- `"<type> <len>\0" + <payload-bytes>`
This design yields three key properties:
- *Integrity*: objects can be verified by recomputing the hash.
- *Deduplication*: identical content maps to the same hash and is stored once.
- *Transfer efficiency*: peers can identify which objects are missing and transfer only those.
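The hashed input described above can be assembled as follows. The sketch builds only the byte string that is fed to the hash function. The SHA-256 step itself is left out because the Rust standard library does not provide it, and which hashing crate Patchwork uses is not stated here. The name `hash_preimage` is hypothetical.

```rust
/// Builds the byte string that is hashed: `"<type> <len>\0"` followed by
/// the raw payload. Applying SHA-256 to the result is left to a hashing
/// library (assumption: not shown here).
fn hash_preimage(object_type: &str, payload: &[u8]) -> Vec<u8> {
    let mut input = format!("{} {}\0", object_type, payload.len()).into_bytes();
    input.extend_from_slice(payload);
    input
}
```

Because the object type and length are part of the input, a blob and a tree with identical payload bytes still receive different hashes.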
=== Blob Objects
A blob stores the raw content of a file. When adding a file, Patchwork reads the file bytes, hashes them with a `"blob"` header, and writes the blob to:
- `.patch/objects/blobs/<hash>`
If a blob file already exists, it is not rewritten, implementing deduplication at the storage level. The staging area records a mapping from relative path to blob hash.
=== Tree Objects
A tree encodes a directory snapshot as a set of entries. Entries reference either blobs (files) or other trees (subdirectories) using a line-based format:
- `blob <name> <hash>`
- `tree <name> <hash>`
Trees are built from a nested `HashMap<String, Entry>` structure and are written deterministically by sorting entry names. Deterministic serialization is essential so that independent peers create identical hashes for identical directory states. Each tree is stored at:
- `.patch/objects/trees/<hash>`
Tree reading is recursive: a tree can be loaded into an in-memory structure and traversed to reconstruct a snapshot.
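Deterministic serialization can be illustrated with a small sketch. It uses a `BTreeMap`, which iterates in key order, instead of sorting the keys of a `HashMap` as the implementation is described to do. The type `EntryRef`, the function name, and the trailing newline after each entry are assumptions.

```rust
use std::collections::BTreeMap;

/// One entry of a tree: a file (blob) or a subdirectory (tree), each
/// identified by the hex hash of the referenced object.
enum EntryRef {
    Blob(String),
    Tree(String),
}

/// Serializes entries as `blob <name> <hash>` / `tree <name> <hash>` lines.
/// A `BTreeMap` iterates in key order, so the output does not depend on
/// the order in which entries were inserted.
fn serialize_tree(entries: &BTreeMap<String, EntryRef>) -> String {
    let mut out = String::new();
    for (name, entry) in entries {
        let (kind, hash) = match entry {
            EntryRef::Blob(h) => ("blob", h),
            EntryRef::Tree(h) => ("tree", h),
        };
        out.push_str(&format!("{} {} {}\n", kind, name, hash));
    }
    out
}
```

Two peers that stage the same files in a different order therefore produce byte-identical tree objects and, consequently, identical hashes.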
=== Commit Objects
A commit references a root tree and optionally a parent commit, forming a parent-linked history. Commits also contain metadata (author, timestamp, message). The serialized format is a readable header followed by a blank line and the commit message:
- `tree <root-tree-hash>`
- `parent <parent-hash>` (optional)
- `author <name>`
- `time <unix-seconds>`
- blank line
- commit message
Commits are stored at:
- `.patch/objects/commits/<hash>`
The repository head pointer is `Config.last_commit`, which identifies the current snapshot of the repository state.
Together, parent references form a singly linked history that can be walked backwards from the head, one commit at a time.
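A minimal sketch of the commit format listed above follows. The struct layout, field types, and function names are assumptions. Only the line format is taken from the description.

```rust
/// Fields of a commit as described in the text (types are assumed).
struct Commit {
    tree: String,
    parent: Option<String>,
    author: String,
    time: u64,
    message: String,
}

/// Writes the header lines, a blank line, and then the commit message.
fn serialize_commit(c: &Commit) -> String {
    let mut out = format!("tree {}\n", c.tree);
    if let Some(parent) = &c.parent {
        out.push_str(&format!("parent {}\n", parent));
    }
    out.push_str(&format!("author {}\ntime {}\n\n{}", c.author, c.time, c.message));
    out
}

/// Extracts the parent hash from a serialized commit, if present.
/// Only the header (everything before the first blank line) is searched,
/// so a message line starting with "parent " is never misread.
fn parent_of(serialized: &str) -> Option<&str> {
    serialized
        .lines()
        .take_while(|line| !line.is_empty())
        .find_map(|line| line.strip_prefix("parent "))
}
```

Walking the history then amounts to calling `parent_of` repeatedly until it returns `None`, which marks the first commit.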
== Local Workflows
Patchwork exposes a Git-like workflow through a small set of CLI commands. A repository counts as initialized as soon as a `.patch/` directory exists in the project root, and because the CLI discovers that root via `walk_backwards`, every command can be invoked from any subdirectory of the project. This makes the tool convenient to use in larger projects.
The local workflow is centered around four operations—initialization, staging, committing, and checkout—and is complemented by metadata bootstrapping (`clone`) and incremental synchronization (`pull`/`push`).
=== Initialization (`init`)
`patchwork init <name>` creates the repository metadata directory and initializes the local peer identity. Concretely, it:
- creates `.patch/` and the object store subdirectories:
  - `.patch/objects/blobs/`
  - `.patch/objects/trees/`
  - `.patch/objects/commits/`
- generates a new iroh secret key and stores its raw bytes in the configuration (`secret`)
- derives a public endpoint identifier from the endpoint’s public key and stores it as `endpoint`
- writes the initial configuration to `.patch/patchwork.toml`
The configuration file acts as the persistent state of the repository. Besides identity and collaborator metadata, it stores the staging area and the current head pointer (`last_commit`). This design keeps the working directory free of internal metadata, while ensuring that the entire repository state can be reconstructed from `.patch/` alone.
=== Staging Files (`add`)
`patchwork add <paths...>` stages changes by converting file contents into blob objects and updating the staging map in the configuration.
For each path provided, Patchwork:
1. reads the file as raw bytes,
2. computes a content hash using a Git-inspired header: `"blob <len>\0" + <file-bytes>`,
3. stores the blob at `.patch/objects/blobs/<hash>` if it does not already exist (deduplication),
4. records the staged update as a mapping from relative path to blob hash: `Config.stage[relative_path] = blob_hash`.
Relative paths are computed against the project root (the parent directory of `.patch/`). This ensures that snapshots are portable and can be checked out consistently on other machines. At the moment, staging supports files only; directories are rejected with an explicit message.
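Step 4 and the relative-path rule can be sketched as follows. The function name `stage_path` and the use of a `HashMap<String, String>` for the staging map are assumptions. The blob hash is taken as given.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Records `file` in the staging map under its path relative to `root`.
/// Returns `false` (and stages nothing) if `file` lies outside `root`.
fn stage_path(
    stage: &mut HashMap<String, String>,
    root: &Path,
    file: &Path,
    blob_hash: &str,
) -> bool {
    match file.strip_prefix(root) {
        Ok(rel) => {
            // Store the path relative to the project root, not the absolute one.
            stage.insert(rel.to_string_lossy().into_owned(), blob_hash.to_string());
            true
        }
        Err(_) => false,
    }
}
```

Keying the map by relative path means that staging the same file twice simply overwrites the earlier blob hash.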
=== Creating Snapshots (`commit`)
`patchwork commit "<message>"` creates a new snapshot of the repository state. If the staging area is empty, the command terminates early ("nothing to commit"), mirroring Git’s behavior.
Commit creation proceeds in three steps:
1. *Base snapshot selection.*
   If `last_commit` exists, Patchwork loads the parent commit and uses its root tree as the baseline snapshot. Otherwise, the baseline is an empty tree (first commit).
2. *Tree construction via overlay.*
   The staging map is applied onto the baseline tree by inserting or overwriting the staged paths. This yields a new root tree representing the complete snapshot after the staged modifications. Tree objects are written deterministically (sorted entries) so that identical directory states lead to identical hashes on different peers.
3. *Commit object creation.*
   A new commit is written that references the new root tree and the optional parent commit. The commit also stores the author name from the configuration, a UNIX timestamp, and the commit message. The commit is hashed and stored under `.patch/objects/commits/<hash>`.
After writing the commit, Patchwork updates `Config.last_commit` to the new commit hash and clears `Config.stage`, making the commit the new head of the repository.
This approach models commits as complete snapshots, while still benefiting from incremental storage: unchanged files and trees are reused by hash and are neither duplicated nor rewritten.
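The overlay step can be sketched as a recursive insert into a nested map. The `Node` type and the function `overlay` are hypothetical, and a `BTreeMap` is used here for ordered iteration. How Patchwork handles a file being replaced by a directory is not described, so the behavior below is an assumption.

```rust
use std::collections::BTreeMap;

/// In-memory snapshot node: a file (blob hash) or a directory.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Blob(String),
    Dir(BTreeMap<String, Node>),
}

/// Inserts or overwrites `path` (components separated by '/') in `dir`,
/// creating intermediate directories as needed.
fn overlay(dir: &mut BTreeMap<String, Node>, path: &str, blob_hash: &str) {
    match path.split_once('/') {
        None => {
            dir.insert(path.to_string(), Node::Blob(blob_hash.to_string()));
        }
        Some((head, rest)) => {
            let child = dir
                .entry(head.to_string())
                .or_insert_with(|| Node::Dir(BTreeMap::new()));
            // Assumption: a file that is shadowed by a directory is replaced.
            if let Node::Blob(_) = child {
                *child = Node::Dir(BTreeMap::new());
            }
            if let Node::Dir(sub) = child {
                overlay(sub, rest, blob_hash);
            }
        }
    }
}
```

Entries that the staging map does not mention stay untouched, which is why unchanged files and subtrees keep their hashes across commits.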
=== Restoring a Snapshot (`checkout`)
`patchwork checkout [commit_hash]` reconstructs the working directory from a commit snapshot. If no hash is provided, Patchwork uses the current head (`last_commit`). If the requested commit does not exist locally, the command fails without modifying the working directory.
Checkout proceeds in three steps:
1. It reads the commit object and extracts its `tree <hash>` line.
2. It recursively loads the referenced root tree into an in-memory structure.
3. It materializes the tree into the working directory by:
   - creating subdirectories as needed,
   - reading each referenced blob from `.patch/objects/blobs/<hash>`,
   - writing or overwriting files at their snapshot paths.
The current implementation is conservative: it creates and overwrites files but does not delete files that are present in the working directory and absent from the target snapshot. As a result, checkout guarantees that all files tracked by the snapshot exist with the correct contents, but it does not guarantee a "clean" directory in the presence of extra untracked files.
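The traversal behind materialization can be sketched as a function that flattens a loaded tree into the list of files to write. The `Node` type and the name `files_to_write` are hypothetical, and the actual file I/O is omitted.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// In-memory snapshot node: a file (blob hash) or a directory.
enum Node {
    Blob(String),
    Dir(BTreeMap<String, Node>),
}

/// Collects `(relative path, blob hash)` pairs for every file in the snapshot.
fn files_to_write(
    dir: &BTreeMap<String, Node>,
    prefix: &Path,
    out: &mut Vec<(PathBuf, String)>,
) {
    for (name, node) in dir {
        let path = prefix.join(name);
        match node {
            Node::Blob(hash) => out.push((path, hash.clone())),
            Node::Dir(sub) => files_to_write(sub, &path, out),
        }
    }
}
```

The resulting list contains only paths that exist in the snapshot. A checkout that merely writes these paths never touches other files, which matches the conservative behavior described above.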
=== Bootstrapping Collaboration (`clone`)
`patchwork clone <pub_key>` is used to join an existing collaboration setup. Unlike Git clone, Patchwork’s clone currently focuses on exchanging metadata rather than immediately transferring repository objects.
When cloning, Patchwork initializes a local `.patch/` structure and then connects to the remote peer via iroh using the provided public identifier. The remote sends a *redacted configuration* that includes only collaboration-relevant metadata (e.g., peer name and collaborator list) and excludes secrets. The client acknowledges with `ACK` and stores the received collaborator information locally. This step enables discovery of peers and subsequent synchronization via pull/push.
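One way to make sure that secrets never leave the peer is to give the redacted configuration its own type. The sketch below illustrates that idea and does not show Patchwork's actual data structures. The field `name` and all field types are assumptions, and the staging map is omitted for brevity.

```rust
/// Local configuration. Field names follow the report where available;
/// `name` and all types are assumptions.
#[allow(dead_code)]
struct Config {
    name: String,
    secret: Vec<u8>,
    endpoint: String,
    collaborators: Vec<String>,
    last_commit: Option<String>,
}

/// The subset of the configuration that is sent to a cloning peer.
/// It has no `secret` field, so the key cannot be leaked by accident.
#[derive(Debug, PartialEq)]
struct RedactedConfig {
    name: String,
    collaborators: Vec<String>,
}

fn redact(config: &Config) -> RedactedConfig {
    RedactedConfig {
        name: config.name.clone(),
        collaborators: config.collaborators.clone(),
    }
}
```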
== Incremental Synchronization via Push and Pull
Patchwork synchronizes repositories using an incremental strategy based on commit ancestry. Instead of transferring entire snapshots, peers first identify the divergence point by exchanging commit hashes and walking backwards through parents until a shared ancestor is found. Only the missing part of the history is then transferred.
This approach leverages content addressing: if a commit, tree, or blob already exists locally, it does not need to be transferred again.
=== Divergence Detection (yay/nay)
Both push and pull use a small control exchange:
- The sender first transmits its tip commit hash (32 bytes).
- The receiver replies with:
  - `yay` if it already has that commit object, or
  - `nay` if it does not.
If the receiver answers `nay`, the sender continues by sending parent commit hashes one at a time. The first parent that the receiver acknowledges with `yay` is the common ancestor. If the sender reaches the root commit without receiving a `yay`, it sends an all-zero marker (`[0; 32]`).
The number of `nay` replies corresponds to the number of missing commits that must be transferred.
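The exchange can be simulated locally without any networking. In this sketch the sender's history is a slice ordered from tip to root and the receiver's object store is a set of hashes. The function name and both representations are assumptions.

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

/// Simulates the yay/nay exchange. `sender_chain` lists the sender's commits
/// from tip to root; `receiver_has` is the set of commits the receiver stores.
/// Returns the common ancestor (if any) and the commits the receiver lacks.
fn find_divergence(
    sender_chain: &[Hash],
    receiver_has: &HashSet<Hash>,
) -> (Option<Hash>, Vec<Hash>) {
    let mut missing = Vec::new();
    for hash in sender_chain {
        if receiver_has.contains(hash) {
            return (Some(*hash), missing); // receiver answered `yay`
        }
        missing.push(*hash); // receiver answered `nay`
    }
    // Root reached without a `yay`: the sender would now send `[0; 32]`.
    (None, missing)
}
```

The length of the returned list equals the number of `nay` replies, and the list is ordered from newest to oldest commit.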
=== Object Transfer Unit: Commit + Trees + Blobs
Once the set of missing commits is determined, transfer proceeds commit-by-commit. For each commit, the sender transfers:
- the commit object bytes,
- the serialized set of tree objects required to reconstruct the commit's root tree (including subtrees),
- all referenced blobs.
Trees are serialized in a stream as repeated blocks where each block begins with a tree hash line, followed by that tree's entries. The receiver reconstructs the tree files and collects the referenced blob hashes. Blobs are then requested and transferred by hash, allowing the receiver to fetch exactly what it needs.
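Collecting the referenced blob hashes from a received tree can be sketched as follows. It relies only on the `blob <name> <hash>` line format introduced earlier. The function name is hypothetical, and taking the hash as the last space-separated field is an assumption made so that file names containing spaces do not break parsing.

```rust
/// Returns the hashes of all `blob <name> <hash>` entries in a serialized tree.
/// `tree` entries are skipped because subtrees arrive as their own blocks.
fn blob_hashes(tree_text: &str) -> Vec<&str> {
    tree_text
        .lines()
        .filter(|line| line.starts_with("blob "))
        .filter_map(|line| line.rsplit_once(' '))
        .map(|(_, hash)| hash)
        .collect()
}
```

A receiver could filter this list against its local `objects/blobs/` directory and request only the hashes it does not yet store.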
=== Pull
In the pull direction, the remote peer acts as the sender of missing history and objects. The local peer performs divergence detection and then receives commits, trees, and blobs. Each received commit and tree is written into the local object store, and blobs are stored under `objects/blobs/`.
=== Push
In the push direction, the local peer acts as the sender. The receiver runs the push protocol handler, performs the same divergence detection, and then accepts commits, trees, and blobs. After successful transfer, the receiver updates `last_commit` to the pushed tip and persists the configuration.
== Design Rationale
Patchwork intentionally adopts Git-like immutability and content addressing because it simplifies both storage and synchronization:
- immutable objects are naturally cacheable and easy to verify,
- hashes provide a compact way to test set membership (do we have this object?),
- incremental synchronization becomes a matter of exchanging identifiers and only transferring missing objects.
The protocol architecture mirrors this: clone bootstraps metadata only, handshake checks liveness, and pull/push synchronize by exchanging a small number of control messages followed by object payload streams.
== Current Limitations and Extension Points
The current design provides a working baseline but leaves several features open:
- Checkout reconstructs tracked files but does not remove paths that are absent from the target snapshot.
- Input validation and error handling are limited.
- Branching is not supported yet.
- Merge commits and conflict handling are not implemented.
- Object transfer uses a simple textual tree encoding; future work could adopt more compact packing and streaming strategies for larger repositories.
= Experience with Iroh
In the seminar, we examined several alternative technologies for the networking stack of applications. Among these, iroh stood out as the most compelling option. It appeared straightforward to set up, and although it is peer-to-peer under the hood, from an application development perspective it does not differ significantly from more traditional networking technologies.
While working on our project, we were able to confirm that iroh is highly versatile. Its API is straightforward and, in our assessment, well designed. The API documentation is extensive and generally of high quality. Occasionally, however, documentation links had moved or content was no longer available, which is likely a consequence of the project being under active development.
The main challenges we encountered stemmed from the Rust programming language itself, since most members of the team were using it for the first time.
= Summary
The implementation combines a deterministic Git-like object store with a minimal, protocol-driven P2P synchronization layer. Storage is based on immutable content-addressed objects that can be verified and deduplicated. Networking uses iroh endpoints with ALPN-based routing to separate protocol concerns (clone, pull, push, handshake). Synchronization is incremental by design: peers exchange commit identifiers first, then transfer only missing commits and their dependent trees/blobs, minimizing redundant bandwidth and enabling decentralized collaboration without central infrastructure.
= Additional Information
The source code of Patchwork is available in the public GitHub repository:
https://github.com/bkarli/patchwork.
Large language models were used to improve grammar and stylistic writing in the report.
*Instructions for installing and using Patchwork*, including example commands and further notes, are provided in the project’s `README.md`.