ByTech.Raft


A .NET 10 implementation of the Raft consensus protocol built around a pure effect-driven core. All protocol logic lives in a single deterministic, synchronous function that takes an input and returns an ordered list of effects and events — with zero I/O, zero async, and zero side effects inside the core. The hosting layer executes those effects against pluggable storage, transport, and telemetry backends.

This separation makes the core fully unit-testable without mocks and enables deterministic simulation of multi-node clusters.


What is Raft?

Raft is a consensus algorithm for managing a replicated log across a cluster of servers. It guarantees that all servers agree on the same sequence of state transitions, even when some servers crash or network partitions occur.

The key insight behind Raft is decomposing consensus into three independent subproblems:

Leader Election — Raft uses randomized election timeouts to elect a single leader per term. A term is a logical clock that increases monotonically. Every server starts as a Follower. If a follower receives no heartbeat before its election timeout fires, it becomes a Candidate, increments its term, votes for itself, and requests votes from all peers. A candidate that receives votes from a majority becomes the Leader. At most one leader exists per term (election safety).

Log Replication — The leader accepts client commands and appends them to its log. It then replicates entries to followers via AppendEntries RPCs. When an entry is stored on a majority of servers, it is committed and safe to apply to the state machine. Raft guarantees that committed entries are never lost (leader completeness) and that if two logs contain an entry with the same index and term, all preceding entries are identical (log matching).
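The majority commit rule can be sketched as a standalone helper (illustrative only, not this library's API): given the highest replicated index on each server, including the leader itself, the committed index is the quorum-th highest. Note that the full Raft rule additionally requires the entry at that index to belong to the leader's current term.

```csharp
using System;
using System.Linq;

static class CommitRule
{
    // Highest log index stored on a majority of servers: sort the
    // per-server match indexes descending and take the (N/2 + 1)-th.
    public static long MajorityCommitIndex(long[] matchIndexes)
    {
        var sorted = matchIndexes.OrderByDescending(i => i).ToArray();
        int quorum = matchIndexes.Length / 2 + 1;
        return sorted[quorum - 1];
    }
}
```

With match indexes {7, 5, 3} on a 3-node cluster, the quorum is 2 and the commit index is 5: the entry at index 5 is on two of three servers.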

Safety — Raft maintains five safety properties:

  • Election Safety: at most one leader per term
  • Leader Append-Only: a leader never overwrites or deletes entries in its own log
  • Log Matching: if two entries in different logs have the same index and term, the logs are identical up to that point
  • Leader Completeness: if an entry is committed in a given term, it will be present in the logs of all leaders in higher terms
  • State Machine Safety: if a server has applied a log entry at a given index, no other server will ever apply a different entry for that index

Beyond the core algorithm, Raft also specifies:

  • Snapshotting — log compaction via state machine snapshots, with snapshot transfer to lagging followers
  • Membership changes — safe cluster reconfiguration via joint consensus (two-phase config transitions)
  • Linearizable reads — read-only queries that are verified against the quorum before being served, ensuring clients always see the latest committed state

How this library solves consensus challenges

Pure effect-driven core

The central architectural property is that the protocol engine (RaftNodeState.Process) is a pure function:

RaftInput  -->  [ RaftNodeState.Process ]  -->  RaftStepResult { Effects, Events }
  • Input: a single protocol event — a timer tick, a received message, a client command, or a persistence completion signal
  • Output: an ordered list of effects (persist metadata, append log entries, send a message, apply committed entries, etc.) and protocol events (role changed, leader elected, commit advanced, etc.)

The core never performs I/O, never calls async, never logs, and never depends on external services. This means:

  1. Deterministic testing — feed inputs, assert effects. No mocks, no timing sensitivity.
  2. Simulation — run entire multi-node clusters in a single thread with deterministic message delivery and fault injection.
  3. Property testing — randomise inputs and verify protocol invariants hold across thousands of scenarios.
  4. Separation of concerns — the hosting layer handles real I/O, the core handles protocol correctness.
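The pattern can be illustrated with a toy model (these are not the library's real types — `ToyState`, `Effect`, and `OnElectionTimeout` are invented for illustration): the step function is pure, returning a new state plus effects as plain data for the host to execute.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the effect-driven pattern: pure step, effects as data.
enum Role { Follower, Candidate }

record ToyState(Role Role, long Term);
abstract record Effect;
record PersistMeta(long Term) : Effect;
record BroadcastVoteRequest(long Term) : Effect;

static class ToyCore
{
    // Election timeout fired: become candidate, bump term, emit effects.
    // No I/O happens here — the host interprets the returned effects.
    public static (ToyState Next, List<Effect> Effects) OnElectionTimeout(ToyState s)
    {
        var next = new ToyState(Role.Candidate, s.Term + 1);
        var effects = new List<Effect>
        {
            new PersistMeta(next.Term),          // host must fsync before sending
            new BroadcastVoteRequest(next.Term), // host sends RequestVote to peers
        };
        return (next, effects);
    }
}
```

A test then needs no mocks: call the function, assert on the returned state and effect list.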

Pluggable architecture

Every infrastructure dependency is behind a contract interface:

Concern Interface What you provide
Log persistence IRaftLogStore Durable ordered log with conflict repair
Meta persistence IRaftMetaStore Durable term + votedFor (must fsync)
Snapshots IRaftSnapshotStore Snapshot save/load with atomic promotion
Transport IRaftTransport Send/receive typed Raft messages between nodes
State machine IApplicationStateMachine Your application logic: apply commands, export/import snapshots
Peer directory IRaftPeerDirectory Map logical node IDs to network endpoints
Telemetry IRaftEventSink, IRaftMetricSink Structured events and metrics

The library ships production-ready implementations for all of these, but every component can be swapped independently.

Single event loop per node

All protocol state mutation is serialised on one async event loop per node. No locks are needed inside the core. State transitions are easy to reason about, and crash/restart recovery is straightforward: restore persistent state, start the event loop, and resume.
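The single-loop pattern can be sketched with System.Threading.Channels (an illustrative shape, not the library's actual RaftNode): one bounded mailbox, one consumer task, so mutation of loop-owned state needs no locks.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// One bounded mailbox, one consumer: all state mutation happens on the
// loop, so no locks are needed. Illustrative only.
class EventLoop
{
    private readonly Channel<string> _mailbox =
        Channel.CreateBounded<string>(capacity: 4096);

    public int Processed { get; private set; }

    public ValueTask PostAsync(string input) => _mailbox.Writer.WriteAsync(input);
    public void Complete() => _mailbox.Writer.Complete();

    public async Task RunAsync()
    {
        await foreach (var input in _mailbox.Reader.ReadAllAsync())
            Processed++; // single consumer: safe without synchronisation
    }
}
```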


Project structure

Project Layer Purpose
Raft.Abstractions Contracts All public types: messages, effects, events, inputs, storage/transport/telemetry interfaces
Raft.Core Protocol core RaftNodeState — the deterministic state machine implementing IRaftCore.Process
Raft.Hosting Runtime RaftNode — async event loop, mailbox, timers, effect executor; RaftNodeBuilder; CommandResult
Raft.Storage.Memory Storage In-memory IRaftLogStore, IRaftMetaStore, IRaftSnapshotStore (volatile, for testing)
Raft.Storage.Wal Storage WalLogStore (append-only WAL with CRC + fsync) and DurableMetaStore (atomic file writes)
Raft.Storage.Snapshots.File Storage FileSnapshotStore — file-backed snapshots with temp + fsync + atomic rename
Raft.Transport.InMemory Transport Zero-copy in-process routing with network partition fault injection
Raft.Transport.Tcp Transport TCP transport with lazy connect, reconnect, framed messages, TLS, and Raft handshake
Raft.Transport.Codecs Transport BinaryRaftCodec — wire serialisation for all 6 Raft message types
Raft.Telemetry.Abstractions Telemetry Extended sink interfaces with batch and dimensional tag support
Raft.Telemetry.NoOp Telemetry No-op sinks + capturing sinks for test assertions
Raft.Telemetry.OpenTelemetry Telemetry Production OTel adapters: OpenTelemetryEventSink, OpenTelemetryMetricSink
Raft.Extensions.DependencyInjection Hosting AddRaftNode extension for IServiceCollection; RaftNodeHostedService
Raft.Extensions.AspNetCore Hosting HTTP endpoints: GET /raft/status, GET /raft/log, POST /raft/command
Raft.Harness Testing Deterministic simulation cluster with invariant checking and fault injection
Raft.Benchmarks Perf BenchmarkDotNet: core throughput, in-memory cluster, TCP cluster, WAL cluster

Quick start

Run the in-memory demo

dotnet run --project samples/Raft.Demo.InMemory

Demonstrates leader election, log replication, client command submission, follower redirect, and state queries — all in-process with no files or network.

Run the TCP demo

dotnet run --project samples/Raft.Demo.Tcp

Demonstrates a 3-node cluster over localhost TCP with WAL-backed durable storage, wire codec serialisation, and cluster restart with WAL recovery.

Run the Docker dashboard demo

cd samples/Raft.Demo.Docker
docker compose up --build

Then open http://localhost:8080 to see the live Blazor dashboard.

This is the most complete sample — a 3-node Raft cluster running as Docker containers with a fourth container providing a YARP reverse proxy, Blazor Server dashboard, and cluster control plane:

  • Distributed cache API — PUT/GET/DELETE /cache/{key} through the YARP gateway, which routes writes to the leader and reads round-robin across healthy nodes
  • Live cluster dashboard — real-time node status cards (color-coded by role: green=Leader, blue=Follower, orange=Candidate, red=Offline), term and commit progress, event log
  • Cache explorer — browse, add, and delete key-value entries through the UI
  • Cluster control plane — stop, start, kill, and restart individual node containers via buttons in the UI or REST API (POST /api/cluster/nodes/{id}/stop|start|kill|restart)

Try fault tolerance live:

# Stop the current leader — watch re-election happen in the dashboard
curl -X POST http://localhost:8080/api/cluster/nodes/node2/stop

# Write through the new leader — YARP detects the leadership change
curl -X PUT http://localhost:8080/cache/hello -d "world"

# Restart the stopped node — it rejoins as follower and catches up
curl -X POST http://localhost:8080/api/cluster/nodes/node2/start

The dashboard uses Docker socket access to manage containers and polls each node's /status endpoint every 500ms to detect role changes, elections, and commit progress.


Cluster sizing

Raft requires a majority (quorum) of nodes to make progress. The library supports any odd cluster size, but practical deployments typically use 3, 5, or 7 nodes.

Cluster size Quorum Tolerates Typical use case
3 2 1 failure Development, small services, edge deployments
5 3 2 failures Production systems, databases, coordination services
7 4 3 failures High-availability critical infrastructure
9 5 4 failures Extreme fault tolerance (diminishing returns)
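The quorum arithmetic behind this table is simple enough to write down (a trivial helper, not part of the library):

```csharp
using System;

static class ClusterMath
{
    // Majority quorum for an N-node cluster, and the number of
    // simultaneous failures it tolerates while staying available.
    public static int Quorum(int n) => n / 2 + 1;
    public static int TolerableFailures(int n) => n - Quorum(n); // = (n - 1) / 2
}
```

This also shows why even cluster sizes are poor value: 4 nodes need a quorum of 3 and tolerate only 1 failure, the same as 3 nodes.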

What changes with more nodes

Election: More nodes means more votes needed for quorum. With good network conditions and randomised election timeouts, elections typically complete in 1-2 rounds regardless of cluster size. Split votes are rare with properly tuned timeouts.

Replication latency: The leader waits for a majority of acknowledgements before committing. With 5 nodes, it waits for 3 (itself + 2 fastest followers); with 7 nodes, it waits for 4. Commit latency is determined by the (N/2+1)-th fastest follower, not the slowest — slow or partitioned nodes don't block progress.

Heartbeat overhead: The leader sends heartbeats to every follower. With 7 nodes this is 6 heartbeat RPCs per interval — negligible for typical heartbeat intervals (50-150ms). The protocol is leader-centric: followers only respond, never initiate replication traffic.

Snapshot transfers: When a new or lagging node joins, the leader sends a snapshot. Larger clusters make it slightly more likely that one node is behind, but snapshots are only sent when a follower's log has been compacted past the point it needs.

Recommendations

  • Start with 3 for development and most services
  • Use 5 when you need to tolerate 2 simultaneous failures (rolling upgrades, zone failures)
  • Use 7 only when your SLA requires tolerating 3 failures — the extra replication cost is measurable
  • Avoid 9+ unless you have a specific reason — the marginal availability gain is small, and write latency increases because the leader must wait for more acknowledgements

Tested configurations

All cluster sizes from 3 to 9 are covered by automated tests:

Size Simulation Hosted (real timers) TCP (real sockets)
3 election, partition, crash, replication, step-down regression election, partition, step-down, re-election election, replication, re-election, minority failure
5 election+replication, minority failure (2 crash), partition, crash+restart election, partition minority (2 down) election, replication, minority failure (2 stop)
7 election+replication, minority failure (3 crash), partition, crash+restart election, partition minority (3 down) election + replication
9 election+replication, minority failure (4 crash), election safety election + command commit

Usage

1. Implement your state machine

The only code you must write is your application's state machine. It receives committed commands and manages snapshots:

using System.Text;
using System.Text.Json;
using Raft.Abstractions;

public class KeyValueStore : IApplicationStateMachine
{
    private readonly Dictionary<string, string> _data = new();

    public Task<ReadOnlyMemory<byte>> ApplyAsync(
        LogIndex index,
        ReadOnlyMemory<byte> command,
        ClientCommandIdentity? identity = null,
        CancellationToken ct = default)
    {
        // Decode command, mutate state, return result
        var cmd = JsonSerializer.Deserialize<KvCommand>(command.Span);

        switch (cmd?.Op)
        {
            case "SET":
                _data[cmd.Key] = cmd.Value;
                return Task.FromResult<ReadOnlyMemory<byte>>("OK"u8.ToArray());

            case "GET":
                var found = _data.TryGetValue(cmd.Key, out var val);
                var result = found ? Encoding.UTF8.GetBytes(val) : "NOT_FOUND"u8.ToArray();
                return Task.FromResult<ReadOnlyMemory<byte>>(result);

            default:
                return Task.FromResult<ReadOnlyMemory<byte>>("ERR"u8.ToArray());
        }
    }

    public Task<Stream> ExportSnapshotAsync(CancellationToken ct = default)
    {
        var ms = new MemoryStream();
        JsonSerializer.Serialize(ms, _data);
        ms.Position = 0;
        return Task.FromResult<Stream>(ms);
    }

    public async Task ImportSnapshotAsync(Stream payloadStream, CancellationToken ct = default)
    {
        _data.Clear();
        var snapshot = await JsonSerializer.DeserializeAsync<Dictionary<string, string>>(payloadStream, cancellationToken: ct);
        if (snapshot is not null)
            foreach (var (k, v) in snapshot)
                _data[k] = v;
    }
}

2. Build and start a node

using Raft.Abstractions;
using Raft.Abstractions.Transport;
using Raft.Hosting;
using Raft.Transport.Tcp;
using Raft.Transport.Codecs;

var nodeId    = new RaftNodeId("node-1");
var peer2     = new RaftNodeId("node-2");
var peer3     = new RaftNodeId("node-3");
var peers     = new[] { peer2, peer3 };

var endpoints = new List<RaftPeerEndpoint>
{
    new(peer2, "10.0.0.2", 7000),
    new(peer3, "10.0.0.3", 7000),
};

var codec     = new BinaryRaftCodec();
var transport = new TcpRaftTransport(nodeId, listenPort: 7000, codec);
var peerDir   = new StaticPeerDirectory(endpoints);

// For first-time creation (fresh cluster, no prior state):
var node = new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage("/var/lib/raft")        // durable WAL + meta + snapshots
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore())
    .Build();

// For restart (recovers persisted term, vote, log, and snapshot boundary):
var node = await new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage("/var/lib/raft")
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore())
    .BuildAsync();

await node.StartAsync();

3. Submit commands

var payload = JsonSerializer.SerializeToUtf8Bytes(new KvCommand("SET", "user:1", "Alice"));

var result = await node.SubmitCommandAsync(payload);

switch (result)
{
    case CommandResult.Committed c:
        Console.WriteLine($"Committed at index {c.Index}: {Encoding.UTF8.GetString(c.ResultPayload.Span)}");
        break;

    case CommandResult.Redirected r:
        Console.WriteLine($"Not the leader. Redirect to {r.LeaderId} at {r.LeaderEndpoint}");
        break;

    case CommandResult.TimedOut:
        Console.WriteLine("Timed out waiting for commit");
        break;

    case CommandResult.Cancelled:
        Console.WriteLine("Request cancelled");
        break;
}

4. Stop gracefully

// DisposeAsync stops the node and releases all resources (WAL file handles, TCP listeners)
await node.DisposeAsync();

Configuration

RaftNodeOptions

var options = new RaftNodeOptions
{
    ElectionTimeoutMinMs  = 150,   // min election timeout (randomised between min and max)
    ElectionTimeoutMaxMs  = 300,   // max election timeout
    HeartbeatIntervalMs   = 50,    // leader heartbeat frequency
    MailboxCapacity       = 4096,  // event queue bounded capacity
    ClientRequestTimeoutMs = 5000, // default SubmitCommandAsync timeout
    ReadOnlyQuorumTimeoutMs = 100, // linearizable read quorum verification timeout
    SnapshotThreshold     = 1000,  // log entries before automatic snapshot (0 = disabled)
};

var node = new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithOptions(options)
    // ... other dependencies ...
    .Build();

Tuning guidelines:

  • HeartbeatIntervalMs must be significantly smaller than ElectionTimeoutMinMs (typically 3-10× smaller), so a healthy leader sends several heartbeats within every follower's timeout window
  • Widen the election timeout range on high-latency networks to reduce split votes
  • Increase SnapshotThreshold for write-heavy workloads to amortise snapshot cost
  • Set SnapshotThreshold = 0 to disable automatic snapshots entirely
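The first rule can be enforced defensively at startup. This is a hypothetical helper mirroring the guideline above, not part of the library:

```csharp
using System;

// Sanity-check Raft timing options before starting a node (illustrative).
static class RaftTimingCheck
{
    public static void Validate(int heartbeatMs, int electionMinMs, int electionMaxMs)
    {
        if (electionMinMs >= electionMaxMs)
            throw new ArgumentException("Election timeout range must be non-empty.");
        // Heartbeats must fire several times within the smallest election
        // timeout, or healthy followers will start spurious elections.
        if (heartbeatMs * 3 > electionMinMs)
            throw new ArgumentException(
                "HeartbeatIntervalMs should be at most 1/3 of ElectionTimeoutMinMs.");
    }
}
```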

TCP transport options

// Basic TCP transport
var transport = new TcpRaftTransport(nodeId, listenPort: 7000, codec);

// With TLS (mutual authentication)
var tlsOptions = new RaftTlsOptions(
    LocalCertificate: myCert,
    RequireClientCertificate: true,
    RemoteCertificateValidationCallback: (sender, cert, chain, errors) => /* validate */,
    ServerName: "raft-cluster.internal"     // SNI override (optional)
);

var transport = new TcpRaftTransport(nodeId, 7000, codec, tlsOptions: tlsOptions);

// With Raft handshake (cluster identity + peer validation)
var handshakeOptions = new RaftHandshakeOptions(
    ClusterId: "prod-cluster-01",
    IsKnownPeer: peerId => allowedPeers.Contains(peerId),
    HandshakeTimeoutMs: 5_000
);

var transport = new TcpRaftTransport(nodeId, 7000, codec,
    handshakeOptions: handshakeOptions, tlsOptions: tlsOptions);

TLS and handshake are both opt-in. When both are enabled, TLS negotiation happens first, then the Raft handshake runs over the encrypted stream.


Storage options

In-memory (testing / development)

builder.WithInMemoryStorage();  // volatile — data lost on restart

Durable WAL (production)

builder.WithWalStorage("/var/lib/raft");
// Creates per-node subdirectory: /var/lib/raft/{nodeId}/
// Contains: WAL segments (log), meta file (term + votedFor), snapshot files

The WAL store uses:

  • Append-only log segments with per-entry CRC-32 integrity checks
  • fsync after every append for durability
  • Atomic file rename for meta and snapshot persistence (crash-safe)
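The temp + fsync + atomic-rename pattern used for meta and snapshot persistence can be sketched as a standalone helper (illustrative, not the library's DurableMetaStore):

```csharp
using System;
using System.IO;

static class AtomicFile
{
    // Crash-safe write: readers observe either the old file or the new
    // one in full, never a partial write.
    public static void Write(string path, byte[] data)
    {
        var tmp = path + ".tmp";
        using (var fs = new FileStream(tmp, FileMode.Create, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
            fs.Flush(flushToDisk: true); // fsync: durable before the rename
        }
        File.Move(tmp, path, overwrite: true); // atomic replace on the same volume
    }
}
```

A crash before the rename leaves the old file intact; a crash after it leaves the new file fully written.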

Custom storage

Implement any combination of IRaftMetaStore, IRaftLogStore, and IRaftSnapshotStore individually:

builder
    .WithMetaStore(myMetaStore)
    .WithLogStore(myLogStore)
    .WithSnapshotStore(mySnapshotStore);

Telemetry

Protocol events

All protocol state transitions emit structured RaftProtocolEvent records through IRaftEventSink:

  • Role transitions: RoleChanged, TermChanged
  • Elections: ElectionStarted, VoteGranted, VoteRejected, LeaderElected
  • Replication: HeartbeatCycleStarted, AppendRejectedDueToMismatch, CommitAdvanced
  • Snapshots: SnapshotTriggered, SnapshotSaved, SnapshotInstalled, FollowerSnapshotRequired
  • Membership: ConfigurationChangeStarted, ConfigurationChanged, ConfigurationFinalized
  • Client: ClientRedirected, ReadOnlyQuorumVerificationStarted, ReadOnlyQuorumVerificationCompleted
  • Transport: SendFailed (target unreachable, includes message type and error detail)
  • Diagnostics: DiagnosticEvent

Metrics

Standard metric names are defined in RaftMetricNames:

Metric Type Meaning
raft.leader_elections Counter Total elections started
raft.commit_index Gauge Current commit index
raft.follower_snapshots_required Counter Snapshot transfers triggered for lagging followers
raft.snapshots_saved Counter Local snapshots saved
raft.snapshots_installed Counter Remote snapshots installed
raft.configuration_changes Counter Membership changes applied
raft.send_failures Counter Transport send failures (peer unreachable)
raft.mailbox_depth Gauge Event loop queue depth

Wiring telemetry

// No-op (default when not specified)
builder.WithEventSink(NoOpEventSink.Instance);
builder.WithMetricSink(NoOpMetricSink.Instance);

// Capturing (for test assertions)
var events = new CapturingEventSink();
var metrics = new CapturingMetricSink();
builder.WithEventSink(events).WithMetricSink(metrics);
// After test: events.Events, metrics.Counters, metrics.Gauges

// OpenTelemetry (production)
builder.WithEventSink(new OpenTelemetryEventSink(activitySource));
builder.WithMetricSink(new OpenTelemetryMetricSink(meter));

ASP.NET Core integration

Dependency injection

// In Program.cs or Startup
services.AddRaftNode(builder => builder
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage(dataDir)
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore()));

// RaftNode is registered as singleton + IHostedService
// Automatically started and stopped with the generic host lifecycle

HTTP endpoints

app.MapRaftEndpoints("/raft");

Endpoint Method Description
/raft/status GET Node status: NodeId, Role, Term, CommitIndex, LeaderId
/raft/log?limit=N GET Log info: first/last index, commit index, recent entries
/raft/command POST Submit command; returns Committed, Redirected, TimedOut, or Cancelled

Observability via RaftNode

// Point-in-time diagnostics (thread-safe, no event-loop contention)
var role     = node.CurrentRole;      // Follower | Candidate | Leader
var term     = node.CurrentTerm;      // Current Raft term
var commit   = node.CommitIndex;      // Highest committed log index
var leaderId = node.CurrentLeaderId;  // Known leader (null if unknown)
var leaderEp = node.CurrentLeaderEndpoint;  // Leader's network endpoint

// Log inspection
var logInfo = node.GetLogInfo(recentEntryLimit: 20);
// logInfo.FirstRetainedIndex, logInfo.LastIndex, logInfo.CommitIndex
// logInfo.RecentEntries — list of { Index, Term, Kind, PayloadSizeBytes }

Testing

The library ships with comprehensive testing infrastructure across seven test projects (245 tests total):

Unit tests (124 tests)

Core state machine correctness: elections, replication, log management, snapshot logic, conflict repair, commitment rules, membership changes.

Integration tests (56 tests)

End-to-end multi-node RaftNode clusters with real async event loops (3/5/7/9 nodes). TCP transport tests with handshake validation, TLS mutual auth, and 5/7-node TCP clusters. Partition tolerance, minority failure, and leader step-down regression tests.

Simulation tests (25 tests)

Deterministic harness scenarios: network partitions, crash-restart, snapshotting, membership changes, 5/7/9-node clusters. Invariant checking at every step.

Fault injection tests (24 tests)

Injected transport failures, persistence errors (meta, log, snapshot), and recovery paths. Exercises crash windows and verifies the node recovers correctly.

Property tests (8 tests)

Randomised core inputs with invariant checks across thousands of scenarios: no-two-leaders-per-term, commit-never-retracts, snapshot-boundary-monotone, LastApplied-never-exceeds-CommitIndex.
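The flavour of these checks can be illustrated with a tiny model (not the library's harness): if each follower's match index only ever advances, the majority commit index — the quorum-th highest match — never retracts, and a randomised run verifies it.

```csharp
using System;
using System.Linq;

// Property-style check: commit-never-retracts on a randomised model.
static class CommitMonotoneProperty
{
    public static void Run(int nodes, int steps, int seed)
    {
        var rng = new Random(seed);
        var match = new long[nodes]; // per-follower highest replicated index
        long commit = 0;
        for (int s = 0; s < steps; s++)
        {
            match[rng.Next(nodes)] += rng.Next(1, 10); // replication progress
            long newCommit = match.OrderByDescending(i => i)
                                  .ElementAt(nodes / 2); // quorum-th highest
            if (newCommit < commit)
                throw new Exception($"Invariant violated at step {s}: commit retracted");
            commit = newCommit;
        }
    }
}
```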

DI tests (5 tests)

AddRaftNode registration, IHostedService lifecycle, singleton identity.

ASP.NET Core tests (3 tests)

HTTP endpoint tests via TestServer: status, log, command submission.

Using the simulation harness

// Deterministic 5-node cluster (no async, no real timers)
var cluster = SimulationCluster.Create(nodeCount: 5);

// Advance time until a leader is elected
cluster.AdvanceUntilLeaderElected(maxTicks: 1000);

// Submit a command through the leader
var leader = cluster.FindLeader();
cluster.ProposeCommand(leader.NodeId, commandPayload);
cluster.DeliverAllMessages();

// Inject a network partition
cluster.Router.SetPartition(node1, node2);
cluster.Router.SetPartition(node1, node3);

// Verify all protocol invariants still hold
cluster.Invariants.CheckAll();

// Crash and restart a node
cluster.CrashNode(node2);
cluster.RestartNode(node2);
cluster.DeliverAllMessages();
cluster.Invariants.CheckAll();

Built-in invariant checks:

  • Election Safety — at most one leader per term
  • Leader Append-Only — leader log never shrinks
  • Log Matching — matching index + term implies identical prefix
  • Leader Completeness — committed entries survive leader changes
  • State Machine Safety — no two nodes apply different entries at the same index
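The first of these, Election Safety, is straightforward to turn into an executable check (an illustrative sketch, not the harness's actual checker): record each observed (term, leader) pair and flag any term with two distinct leaders.

```csharp
using System;
using System.Collections.Generic;

// Election Safety invariant: at most one leader per term.
class ElectionSafetyChecker
{
    private readonly Dictionary<long, string> _leaderByTerm = new();

    public void ObserveLeader(long term, string nodeId)
    {
        if (_leaderByTerm.TryGetValue(term, out var existing) && existing != nodeId)
            throw new InvalidOperationException(
                $"Two leaders in term {term}: {existing} and {nodeId}");
        _leaderByTerm[term] = nodeId; // re-observing the same leader is fine
    }
}
```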

Benchmarks

dotnet run --project src/Raft.Benchmarks -c Release

Benchmark What it measures
CoreBenchmarks Pure RaftNodeState.Process throughput (no I/O, no hosting)
InMemoryClusterBenchmarks 3-node command-to-commit with in-memory everything
TcpClusterBenchmarks 3-node over localhost TCP with BinaryRaftCodec
WalClusterBenchmarks 3-node with WAL-backed durable storage

Core protocol types

Messages

Six sealed record types inheriting from RaftMessage:

Type Direction Purpose
RequestVoteRequest Candidate -> All Request vote during election
RequestVoteResponse Peer -> Candidate Grant or deny vote
AppendEntriesRequest Leader -> Follower Replicate log entries / heartbeat
AppendEntriesResponse Follower -> Leader Confirm replication / report conflict
InstallSnapshotRequest Leader -> Follower Transfer snapshot to lagging follower
InstallSnapshotResponse Follower -> Leader Acknowledge snapshot receipt

Inputs (RaftInput variants)

Protocol inputs fed into RaftNodeState.Process:

  • Timers: Tick, ElectionTimeoutElapsed, HeartbeatTimeoutElapsed
  • Messages: RequestVoteReceived, RequestVoteResponseReceived, AppendEntriesReceived, AppendEntriesResponseReceived, InstallSnapshotReceived, InstallSnapshotResponseReceived
  • Client: ClientCommandReceived, ReadOnlyRequestReceived
  • Persistence signals: MetaPersisted, MetaPersistFailed, LogAppended, LogAppendFailed, SnapshotSaved, SnapshotSaveFailed, SnapshotInstallFailed
  • Configuration: NodeConfigurationCommand

Effects (RaftEffect variants)

Side effects emitted by the core for the hosting layer to execute:

  • Persistence: PersistMeta, AppendLogEntries, TruncateLogSuffix, CompactLogPrefix
  • Messaging: SendMessage
  • Snapshots: SaveSnapshot, InstallSnapshot, SendSnapshot
  • Apply: ApplyCommittedEntries
  • Client responses: RedirectClient, RespondToClient, ConfirmReadOnly
  • Telemetry: PublishEvent, PublishMetric

Value types

Type Description
RaftNodeId Logical node identity (string-backed, comparable)
Term Election term (long-backed, monotonically increasing)
LogIndex Log entry index (long-backed, prevents confusion with Term)
LogEntry Index + Term + Kind + Payload + optional CorrelationId and CommandIdentity
RaftMeta Persistent election state: CurrentTerm + VotedFor
SnapshotMetadata Snapshot boundary: LastIncludedIndex + LastIncludedTerm + configuration + size
ClusterConfiguration Current members + optional new members (for joint consensus transitions)
ClientCommandIdentity Session ID + serial number for exactly-once deduplication
CorrelationId Guid-backed trace identifier for end-to-end observability
RaftPeerEndpoint NodeId + Host + Port
TransportSendResult Outcome + optional CorrelationId + detail + elapsed time
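The value-type pattern behind Term and LogIndex can be sketched in a few lines (a minimal illustration, not the library's actual definitions): wrapping a long in a distinct struct means the compiler rejects passing a term where a log index is expected.

```csharp
using System;

// Strongly-typed long wrapper in the style of LogIndex/Term.
readonly record struct Index(long Value) : IComparable<Index>
{
    public int CompareTo(Index other) => Value.CompareTo(other.Value);
    public Index Next() => new(Value + 1);
    public override string ToString() => Value.ToString();
}
```

Records give value equality and comparability for free, so these wrappers cost nothing at call sites.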

Design principles

Pure effect-driven core — Protocol core emits side effects as data structures. No direct I/O, async, logging, or dependency calls. Enables deterministic testing, simulation, and property-based verification.

Single event loop per node — All protocol state mutation serialised on one loop. No locks inside the core. Easier to reason about state transitions and crash recovery.

Message-level transport — Transport works with typed RaftMessage objects, not raw bytes. Codecs are factored separately. Easy to swap TCP for in-memory without touching protocol logic.

Split storage contracts — Separate interfaces for meta, log, and snapshots. Different durability requirements, write patterns, and failure modes. Easier to test crash windows and benchmark each layer independently.

Structured observability — Protocol events and metrics flow through sink abstractions. Testing sinks capture for assertions, production sinks forward to OpenTelemetry. Fully decoupled from the core.

Fault injection first — In-memory transport supports network partitions, message drops, and reordering. Simulation harness with fault injection and invariant checking built in from the start.


Version

Current: v1.3.0 — 13 milestones delivered, 245 tests passing.


License

Copyright 2026 ByTech. Licensed under the Apache License, Version 2.0.
