ByTech.Raft


A .NET 10 implementation of the Raft consensus protocol built around a pure effect-driven core. All protocol logic lives in a single deterministic, synchronous function that takes an input and returns an ordered list of effects and events — with zero I/O, zero async, and zero side effects inside the core. The hosting layer executes those effects against pluggable storage, transport, and telemetry backends.

This separation makes the core fully unit-testable without mocks and enables deterministic simulation of multi-node clusters.


What is Raft?

Raft is a consensus algorithm for managing a replicated log across a cluster of servers. It guarantees that all servers agree on the same sequence of state transitions, even when some servers crash or network partitions occur.

The key insight behind Raft is decomposing consensus into three independent subproblems:

Leader Election — Raft uses randomized election timeouts to elect a single leader per term. A term is a logical clock that increases monotonically. Every server starts as a Follower. If a follower receives no heartbeat before its election timeout fires, it becomes a Candidate, increments its term, votes for itself, and requests votes from all peers. A candidate that receives votes from a majority becomes the Leader. At most one leader exists per term (election safety).

Log Replication — The leader accepts client commands and appends them to its log. It then replicates entries to followers via AppendEntries RPCs. When an entry is stored on a majority of servers, it is committed and safe to apply to the state machine. Raft guarantees that committed entries are never lost (leader completeness) and that if two logs contain an entry with the same index and term, all preceding entries are identical (log matching).
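The majority commit rule can be sketched as a standalone helper (illustrative only, not this library's API): given the highest replicated index on each server, including the leader itself, the committed index is the quorum-th highest. Note that the full Raft rule additionally requires the entry at that index to belong to the leader's current term.

```csharp
using System;
using System.Linq;

static class CommitRule
{
    // Highest log index stored on a majority of servers: sort the
    // per-server match indexes descending and take the (N/2 + 1)-th.
    public static long MajorityCommitIndex(long[] matchIndexes)
    {
        var sorted = matchIndexes.OrderByDescending(i => i).ToArray();
        int quorum = matchIndexes.Length / 2 + 1;
        return sorted[quorum - 1];
    }
}
```

With match indexes {7, 5, 3} on a 3-node cluster, the quorum is 2 and the commit index is 5: the entry at index 5 is on two of three servers.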

Safety — Raft maintains five safety properties:

  • Election Safety: at most one leader per term
  • Leader Append-Only: a leader never overwrites or deletes entries in its own log
  • Log Matching: if two entries in different logs have the same index and term, the logs are identical up to that point
  • Leader Completeness: if an entry is committed in a given term, it will be present in the logs of all leaders in higher terms
  • State Machine Safety: if a server has applied a log entry at a given index, no other server will ever apply a different entry for that index

Beyond the core algorithm, Raft also specifies:

  • Snapshotting — log compaction via state machine snapshots, with snapshot transfer to lagging followers
  • Membership changes — safe cluster reconfiguration via joint consensus (two-phase config transitions)
  • Linearizable reads — read-only queries that are verified against the quorum before being served, ensuring clients always see the latest committed state

How this library solves consensus challenges

Pure effect-driven core

The central architectural property is that the protocol engine (RaftNodeState.Process) is a pure function:

RaftInput  -->  [ RaftNodeState.Process ]  -->  RaftStepResult { Effects, Events }
  • Input: a single protocol event — a timer tick, a received message, a client command, or a persistence completion signal
  • Output: an ordered list of effects (persist metadata, append log entries, send a message, apply committed entries, etc.) and protocol events (role changed, leader elected, commit advanced, etc.)

The core never performs I/O, never calls async, never logs, and never depends on external services. This means:

  1. Deterministic testing — feed inputs, assert effects. No mocks, no timing sensitivity.
  2. Simulation — run entire multi-node clusters in a single thread with deterministic message delivery and fault injection.
  3. Property testing — randomise inputs and verify protocol invariants hold across thousands of scenarios.
  4. Separation of concerns — the hosting layer handles real I/O, the core handles protocol correctness.
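The pattern can be illustrated with a toy model (these are not the library's real types — `ToyState`, `Effect`, and `OnElectionTimeout` are invented for illustration): the step function is pure, returning a new state plus effects as plain data for the host to execute.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the effect-driven pattern: pure step, effects as data.
enum Role { Follower, Candidate }

record ToyState(Role Role, long Term);
abstract record Effect;
record PersistMeta(long Term) : Effect;
record BroadcastVoteRequest(long Term) : Effect;

static class ToyCore
{
    // Election timeout fired: become candidate, bump term, emit effects.
    // No I/O happens here — the host interprets the returned effects.
    public static (ToyState Next, List<Effect> Effects) OnElectionTimeout(ToyState s)
    {
        var next = new ToyState(Role.Candidate, s.Term + 1);
        var effects = new List<Effect>
        {
            new PersistMeta(next.Term),          // host must fsync before sending
            new BroadcastVoteRequest(next.Term), // host sends RequestVote to peers
        };
        return (next, effects);
    }
}
```

A test then needs no mocks: call the function, assert on the returned state and effect list.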

Pluggable architecture

Every infrastructure dependency is behind a contract interface:

Concern Interface What you provide
Log persistence IRaftLogStore Durable ordered log with conflict repair
Meta persistence IRaftMetaStore Durable term + votedFor (must fsync)
Snapshots IRaftSnapshotStore Snapshot save/load with atomic promotion
Transport IRaftTransport Send/receive typed Raft messages between nodes
State machine IApplicationStateMachine Your application logic: apply commands, export/import snapshots
Peer directory IRaftPeerDirectory Map logical node IDs to network endpoints
Telemetry IRaftEventSink, IRaftMetricSink Structured events and metrics

The library ships production-ready implementations for all of these, but every component can be swapped independently.

Single event loop per node

All protocol state mutation is serialised on one async event loop per node. No locks are needed inside the core. State transitions are easy to reason about, and crash/restart recovery is straightforward: restore persistent state, start the event loop, and resume.
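The single-loop pattern can be sketched with System.Threading.Channels (an illustrative shape, not the library's actual RaftNode): one bounded mailbox, one consumer task, so mutation of loop-owned state needs no locks.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// One bounded mailbox, one consumer: all state mutation happens on the
// loop, so no locks are needed. Illustrative only.
class EventLoop
{
    private readonly Channel<string> _mailbox =
        Channel.CreateBounded<string>(capacity: 4096);

    public int Processed { get; private set; }

    public ValueTask PostAsync(string input) => _mailbox.Writer.WriteAsync(input);
    public void Complete() => _mailbox.Writer.Complete();

    public async Task RunAsync()
    {
        await foreach (var input in _mailbox.Reader.ReadAllAsync())
            Processed++; // single consumer: safe without synchronisation
    }
}
```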


Project structure

Project Layer Purpose
Raft.Abstractions Contracts All public types: messages, effects, events, inputs, storage/transport/telemetry interfaces
Raft.Core Protocol core RaftNodeState — the deterministic state machine implementing IRaftCore.Process
Raft.Hosting Runtime RaftNode — async event loop, mailbox, timers, effect executor; RaftNodeBuilder; CommandResult
Raft.Storage.Memory Storage In-memory IRaftLogStore, IRaftMetaStore, IRaftSnapshotStore (volatile, for testing)
Raft.Storage.Wal Storage WalLogStore (append-only WAL with CRC + fsync) and DurableMetaStore (atomic file writes)
Raft.Storage.Snapshots.File Storage FileSnapshotStore — file-backed snapshots with temp + fsync + atomic rename
Raft.Transport.InMemory Transport Zero-copy in-process routing with network partition fault injection
Raft.Transport.Tcp Transport TCP transport with lazy connect, reconnect, framed messages, TLS, and Raft handshake
Raft.Transport.Codecs Transport BinaryRaftCodec — wire serialisation for all 6 Raft message types
Raft.Telemetry.Abstractions Telemetry Extended sink interfaces with batch and dimensional tag support
Raft.Telemetry.NoOp Telemetry No-op sinks + capturing sinks for test assertions
Raft.Telemetry.OpenTelemetry Telemetry Production OTel adapters: OpenTelemetryEventSink, OpenTelemetryMetricSink
Raft.Extensions.DependencyInjection Hosting AddRaftNode extension for IServiceCollection; RaftNodeHostedService
Raft.Extensions.AspNetCore Hosting HTTP endpoints: GET /raft/status, GET /raft/log, POST /raft/command
Raft.Harness Testing Deterministic simulation cluster with invariant checking and fault injection
Raft.Benchmarks Perf BenchmarkDotNet: core throughput, in-memory cluster, TCP cluster, WAL cluster

Quick start

Run the in-memory demo

dotnet run --project samples/Raft.Demo.InMemory

Demonstrates leader election, log replication, client command submission, follower redirect, and state queries — all in-process with no files or network.

Run the TCP demo

dotnet run --project samples/Raft.Demo.Tcp

Demonstrates a 3-node cluster over localhost TCP with WAL-backed durable storage, wire codec serialisation, and cluster restart with WAL recovery.

Run the Docker dashboard demo

cd samples/Raft.Demo.Docker
docker compose up --build

Then open http://localhost:8080 to see the live Blazor dashboard.

This is the most complete sample — a 3-node Raft cluster running as Docker containers with a fourth container providing a YARP reverse proxy, Blazor Server dashboard, and cluster control plane:

  • Distributed cache API — PUT/GET/DELETE /cache/{key} through the YARP gateway, which routes writes to the leader and reads round-robin across healthy nodes
  • Live cluster dashboard — real-time node status cards (color-coded by role: green=Leader, blue=Follower, orange=Candidate, red=Offline), term and commit progress, event log
  • Cache explorer — browse, add, and delete key-value entries through the UI
  • Cluster control plane — stop, start, kill, and restart individual node containers via buttons in the UI or REST API (POST /api/cluster/nodes/{id}/stop|start|kill|restart)

Try fault tolerance live:

# Stop the current leader — watch re-election happen in the dashboard
curl -X POST http://localhost:8080/api/cluster/nodes/node2/stop

# Write through the new leader — YARP detects the leadership change
curl -X PUT http://localhost:8080/cache/hello -d "world"

# Restart the stopped node — it rejoins as follower and catches up
curl -X POST http://localhost:8080/api/cluster/nodes/node2/start

The dashboard uses Docker socket access to manage containers and polls each node's /status endpoint every 500ms to detect role changes, elections, and commit progress.


Cluster sizing

Raft requires a majority (quorum) of nodes to make progress. The library supports any odd cluster size, but practical deployments typically use 3, 5, or 7 nodes.

Cluster size Quorum Tolerates Typical use case
3 2 1 failure Development, small services, edge deployments
5 3 2 failures Production systems, databases, coordination services
7 4 3 failures High-availability critical infrastructure
9 5 4 failures Extreme fault tolerance (diminishing returns)
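The quorum arithmetic behind this table is simple enough to write down (a trivial helper, not part of the library):

```csharp
using System;

static class ClusterMath
{
    // Majority quorum for an N-node cluster, and the number of
    // simultaneous failures it tolerates while staying available.
    public static int Quorum(int n) => n / 2 + 1;
    public static int TolerableFailures(int n) => n - Quorum(n); // = (n - 1) / 2
}
```

This also shows why even cluster sizes are poor value: 4 nodes need a quorum of 3 and tolerate only 1 failure, the same as 3 nodes.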

What changes with more nodes

Election: More nodes means more votes needed for quorum. With good network conditions and randomised election timeouts, elections typically complete in 1-2 rounds regardless of cluster size. Split votes are rare with properly tuned timeouts.

Replication latency: The leader waits for a majority of acknowledgements before committing. With 5 nodes, it waits for 3 (itself + 2 fastest followers); with 7 nodes, it waits for 4. Commit latency is determined by the (N/2+1)-th fastest follower, not the slowest — slow or partitioned nodes don't block progress.

Heartbeat overhead: The leader sends heartbeats to every follower. With 7 nodes this is 6 heartbeat RPCs per interval — negligible for typical heartbeat intervals (50-150ms). The protocol is leader-centric: followers only respond, never initiate replication traffic.

Snapshot transfers: When a new or lagging node joins, the leader sends a snapshot. Larger clusters make it slightly more likely that one node is behind, but snapshots are only sent when a follower's log has been compacted past the point it needs.

Recommendations

  • Start with 3 for development and most services
  • Use 5 when you need to tolerate 2 simultaneous failures (rolling upgrades, zone failures)
  • Use 7 only when your SLA requires tolerating 3 failures — the extra replication cost is measurable
  • Avoid 9+ unless you have a specific reason — the marginal availability gain is small, and write latency increases because the leader must wait for more acknowledgements

Tested configurations

All cluster sizes from 3 to 9 are covered by automated tests:

Size Simulation Hosted (real timers) TCP (real sockets)
3 election, partition, crash, replication, step-down regression election, partition, step-down, re-election election, replication, re-election, minority failure
5 election+replication, minority failure (2 crash), partition, crash+restart election, partition minority (2 down) election, replication, minority failure (2 stop)
7 election+replication, minority failure (3 crash), partition, crash+restart election, partition minority (3 down) election + replication
9 election+replication, minority failure (4 crash), election safety election + command commit

Usage

1. Implement your state machine

The only code you must write is your application's state machine. It receives committed commands and manages snapshots:

using System.Text;
using System.Text.Json;
using Raft.Abstractions;

public class KeyValueStore : IApplicationStateMachine
{
    private readonly Dictionary<string, string> _data = new();

    public Task<ReadOnlyMemory<byte>> ApplyAsync(
        LogIndex index,
        ReadOnlyMemory<byte> command,
        ClientCommandIdentity? identity = null,
        CancellationToken ct = default)
    {
        // Decode command, mutate state, return result
        var cmd = JsonSerializer.Deserialize<KvCommand>(command.Span);

        switch (cmd?.Op)
        {
            case "SET":
                _data[cmd.Key] = cmd.Value;
                return Task.FromResult<ReadOnlyMemory<byte>>("OK"u8.ToArray());

            case "GET":
                var found = _data.TryGetValue(cmd.Key, out var val);
                var result = found ? Encoding.UTF8.GetBytes(val) : "NOT_FOUND"u8.ToArray();
                return Task.FromResult<ReadOnlyMemory<byte>>(result);

            default:
                return Task.FromResult<ReadOnlyMemory<byte>>("ERR"u8.ToArray());
        }
    }

    public Task<Stream> ExportSnapshotAsync(CancellationToken ct = default)
    {
        var ms = new MemoryStream();
        JsonSerializer.Serialize(ms, _data);
        ms.Position = 0;
        return Task.FromResult<Stream>(ms);
    }

    public async Task ImportSnapshotAsync(Stream payloadStream, CancellationToken ct = default)
    {
        _data.Clear();
        var snapshot = await JsonSerializer.DeserializeAsync<Dictionary<string, string>>(payloadStream, cancellationToken: ct);
        if (snapshot is not null)
            foreach (var (k, v) in snapshot)
                _data[k] = v;
    }
}

2. Build and start a node

using Raft.Abstractions;
using Raft.Abstractions.Transport;
using Raft.Hosting;
using Raft.Transport.Tcp;
using Raft.Transport.Codecs;

var nodeId    = new RaftNodeId("node-1");
var peer2     = new RaftNodeId("node-2");
var peer3     = new RaftNodeId("node-3");
var peers     = new[] { peer2, peer3 };

var endpoints = new List<RaftPeerEndpoint>
{
    new(peer2, "10.0.0.2", 7000),
    new(peer3, "10.0.0.3", 7000),
};

var codec     = new BinaryRaftCodec();
var transport = new TcpRaftTransport(nodeId, listenPort: 7000, codec);
var peerDir   = new StaticPeerDirectory(endpoints);

// For first-time creation (fresh cluster, no prior state):
var node = new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage("/var/lib/raft")        // durable WAL + meta + snapshots
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore())
    .Build();

// For restart (recovers persisted term, vote, log, and snapshot boundary):
var node = await new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage("/var/lib/raft")
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore())
    .BuildAsync();

await node.StartAsync();

3. Submit commands

var payload = JsonSerializer.SerializeToUtf8Bytes(new KvCommand("SET", "user:1", "Alice"));

var result = await node.SubmitCommandAsync(payload);

switch (result)
{
    case CommandResult.Committed c:
        Console.WriteLine($"Committed at index {c.Index}: {Encoding.UTF8.GetString(c.ResultPayload.Span)}");
        break;

    case CommandResult.Redirected r:
        Console.WriteLine($"Not the leader. Redirect to {r.LeaderId} at {r.LeaderEndpoint}");
        break;

    case CommandResult.TimedOut:
        Console.WriteLine("Timed out waiting for commit");
        break;

    case CommandResult.Cancelled:
        Console.WriteLine("Request cancelled");
        break;
}

4. Stop gracefully

// DisposeAsync stops the node and releases all resources (WAL file handles, TCP listeners)
await node.DisposeAsync();

Configuration

RaftNodeOptions

var options = new RaftNodeOptions
{
    ElectionTimeoutMinMs  = 150,   // min election timeout (randomised between min and max)
    ElectionTimeoutMaxMs  = 300,   // max election timeout
    HeartbeatIntervalMs   = 50,    // leader heartbeat frequency
    MailboxCapacity       = 4096,  // event queue bounded capacity
    ClientRequestTimeoutMs = 5000, // default SubmitCommandAsync timeout
    ReadOnlyQuorumTimeoutMs = 100, // linearizable read quorum verification timeout
    SnapshotThreshold     = 1000,  // log entries before automatic snapshot (0 = disabled)
};

var node = new RaftNodeBuilder()
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithOptions(options)
    // ... other dependencies ...
    .Build();

Tuning guidelines:

  • HeartbeatIntervalMs must be significantly smaller than ElectionTimeoutMinMs (typically 3-10× smaller), so a healthy leader sends several heartbeats within every follower's timeout window
  • Widen the election timeout range on high-latency networks to reduce split votes
  • Increase SnapshotThreshold for write-heavy workloads to amortise snapshot cost
  • Set SnapshotThreshold = 0 to disable automatic snapshots entirely
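The first rule can be enforced defensively at startup. This is a hypothetical helper mirroring the guideline above, not part of the library:

```csharp
using System;

// Sanity-check Raft timing options before starting a node (illustrative).
static class RaftTimingCheck
{
    public static void Validate(int heartbeatMs, int electionMinMs, int electionMaxMs)
    {
        if (electionMinMs >= electionMaxMs)
            throw new ArgumentException("Election timeout range must be non-empty.");
        // Heartbeats must fire several times within the smallest election
        // timeout, or healthy followers will start spurious elections.
        if (heartbeatMs * 3 > electionMinMs)
            throw new ArgumentException(
                "HeartbeatIntervalMs should be at most 1/3 of ElectionTimeoutMinMs.");
    }
}
```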

TCP transport options

// Basic TCP transport
var transport = new TcpRaftTransport(nodeId, listenPort: 7000, codec);

// With TLS (mutual authentication)
var tlsOptions = new RaftTlsOptions(
    LocalCertificate: myCert,
    RequireClientCertificate: true,
    RemoteCertificateValidationCallback: (sender, cert, chain, errors) => /* validate */,
    ServerName: "raft-cluster.internal"     // SNI override (optional)
);

var transport = new TcpRaftTransport(nodeId, 7000, codec, tlsOptions: tlsOptions);

// With Raft handshake (cluster identity + peer validation)
var handshakeOptions = new RaftHandshakeOptions(
    ClusterId: "prod-cluster-01",
    IsKnownPeer: peerId => allowedPeers.Contains(peerId),
    HandshakeTimeoutMs: 5_000
);

var transport = new TcpRaftTransport(nodeId, 7000, codec,
    handshakeOptions: handshakeOptions, tlsOptions: tlsOptions);

TLS and handshake are both opt-in. When both are enabled, TLS negotiation happens first, then the Raft handshake runs over the encrypted stream.


Storage options

In-memory (testing / development)

builder.WithInMemoryStorage();  // volatile — data lost on restart

Durable WAL (production)

builder.WithWalStorage("/var/lib/raft");
// Creates per-node subdirectory: /var/lib/raft/{nodeId}/
// Contains: WAL segments (log), meta file (term + votedFor), snapshot files

The WAL store uses:

  • Append-only log segments with per-entry CRC-32 integrity checks
  • fsync after every append for durability
  • Atomic file rename for meta and snapshot persistence (crash-safe)
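The temp + fsync + atomic-rename pattern used for meta and snapshot persistence can be sketched as a standalone helper (illustrative, not the library's DurableMetaStore):

```csharp
using System;
using System.IO;

static class AtomicFile
{
    // Crash-safe write: readers observe either the old file or the new
    // one in full, never a partial write.
    public static void Write(string path, byte[] data)
    {
        var tmp = path + ".tmp";
        using (var fs = new FileStream(tmp, FileMode.Create, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
            fs.Flush(flushToDisk: true); // fsync: durable before the rename
        }
        File.Move(tmp, path, overwrite: true); // atomic replace on the same volume
    }
}
```

A crash before the rename leaves the old file intact; a crash after it leaves the new file fully written.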

Custom storage

Implement any combination of IRaftMetaStore, IRaftLogStore, and IRaftSnapshotStore individually:

builder
    .WithMetaStore(myMetaStore)
    .WithLogStore(myLogStore)
    .WithSnapshotStore(mySnapshotStore);

Telemetry

Protocol events

All protocol state transitions emit structured RaftProtocolEvent records through IRaftEventSink:

  • Role transitions: RoleChanged, TermChanged
  • Elections: ElectionStarted, VoteGranted, VoteRejected, LeaderElected
  • Replication: HeartbeatCycleStarted, AppendRejectedDueToMismatch, CommitAdvanced
  • Snapshots: SnapshotTriggered, SnapshotSaved, SnapshotInstalled, FollowerSnapshotRequired
  • Membership: ConfigurationChangeStarted, ConfigurationChanged, ConfigurationFinalized
  • Client: ClientRedirected, ReadOnlyQuorumVerificationStarted, ReadOnlyQuorumVerificationCompleted
  • Transport: SendFailed (target unreachable, includes message type and error detail)
  • Diagnostics: DiagnosticEvent

Metrics

Standard metric names are defined in RaftMetricNames:

Metric Type Meaning
raft.leader_elections Counter Total elections started
raft.commit_index Gauge Current commit index
raft.follower_snapshots_required Counter Snapshot transfers triggered for lagging followers
raft.snapshots_saved Counter Local snapshots saved
raft.snapshots_installed Counter Remote snapshots installed
raft.configuration_changes Counter Membership changes applied
raft.send_failures Counter Transport send failures (peer unreachable)
raft.mailbox_depth Gauge Event loop queue depth

Wiring telemetry

// No-op (default when not specified)
builder.WithEventSink(NoOpEventSink.Instance);
builder.WithMetricSink(NoOpMetricSink.Instance);

// Capturing (for test assertions)
var events = new CapturingEventSink();
var metrics = new CapturingMetricSink();
builder.WithEventSink(events).WithMetricSink(metrics);
// After test: events.Events, metrics.Counters, metrics.Gauges

// OpenTelemetry (production)
builder.WithEventSink(new OpenTelemetryEventSink(activitySource));
builder.WithMetricSink(new OpenTelemetryMetricSink(meter));

ASP.NET Core integration

Dependency injection

// In Program.cs or Startup
services.AddRaftNode(builder => builder
    .WithNodeId(nodeId)
    .WithPeers(peers)
    .WithWalStorage(dataDir)
    .WithTransport(transport)
    .WithPeerDirectory(peerDir)
    .WithStateMachine(new KeyValueStore()));

// RaftNode is registered as singleton + IHostedService
// Automatically started and stopped with the generic host lifecycle

HTTP endpoints

app.MapRaftEndpoints("/raft");

Endpoint Method Description
/raft/status GET Node status: NodeId, Role, Term, CommitIndex, LeaderId
/raft/log?limit=N GET Log info: first/last index, commit index, recent entries
/raft/command POST Submit command; returns Committed, Redirected, TimedOut, or Cancelled

Observability via RaftNode

// Point-in-time diagnostics (thread-safe, no event-loop contention)
var role     = node.CurrentRole;      // Follower | Candidate | Leader
var term     = node.CurrentTerm;      // Current Raft term
var commit   = node.CommitIndex;      // Highest committed log index
var leaderId = node.CurrentLeaderId;  // Known leader (null if unknown)
var leaderEp = node.CurrentLeaderEndpoint;  // Leader's network endpoint

// Log inspection
var logInfo = node.GetLogInfo(recentEntryLimit: 20);
// logInfo.FirstRetainedIndex, logInfo.LastIndex, logInfo.CommitIndex
// logInfo.RecentEntries — list of { Index, Term, Kind, PayloadSizeBytes }

Testing

The library ships with comprehensive testing infrastructure across seven test projects (245 tests total):

Unit tests (124 tests)

Core state machine correctness: elections, replication, log management, snapshot logic, conflict repair, commitment rules, membership changes.

Integration tests (56 tests)

End-to-end multi-node RaftNode clusters with real async event loops (3/5/7/9 nodes). TCP transport tests with handshake validation, TLS mutual auth, and 5/7-node TCP clusters. Partition tolerance, minority failure, and leader step-down regression tests.

Simulation tests (25 tests)

Deterministic harness scenarios: network partitions, crash-restart, snapshotting, membership changes, 5/7/9-node clusters. Invariant checking at every step.

Fault injection tests (24 tests)

Injected transport failures, persistence errors (meta, log, snapshot), and recovery paths. Exercises crash windows and verifies the node recovers correctly.

Property tests (8 tests)

Randomised core inputs with invariant checks across thousands of scenarios: no-two-leaders-per-term, commit-never-retracts, snapshot-boundary-monotone, LastApplied-never-exceeds-CommitIndex.
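The flavour of these checks can be illustrated with a tiny model (not the library's harness): if each follower's match index only ever advances, the majority commit index — the quorum-th highest match — never retracts, and a randomised run verifies it.

```csharp
using System;
using System.Linq;

// Property-style check: commit-never-retracts on a randomised model.
static class CommitMonotoneProperty
{
    public static void Run(int nodes, int steps, int seed)
    {
        var rng = new Random(seed);
        var match = new long[nodes]; // per-follower highest replicated index
        long commit = 0;
        for (int s = 0; s < steps; s++)
        {
            match[rng.Next(nodes)] += rng.Next(1, 10); // replication progress
            long newCommit = match.OrderByDescending(i => i)
                                  .ElementAt(nodes / 2); // quorum-th highest
            if (newCommit < commit)
                throw new Exception($"Invariant violated at step {s}: commit retracted");
            commit = newCommit;
        }
    }
}
```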

DI tests (5 tests)

AddRaftNode registration, IHostedService lifecycle, singleton identity.

ASP.NET Core tests (3 tests)

HTTP endpoint tests via TestServer: status, log, command submission.

Using the simulation harness

// Deterministic 5-node cluster (no async, no real timers)
var cluster = SimulationCluster.Create(nodeCount: 5);

// Advance time until a leader is elected
cluster.AdvanceUntilLeaderElected(maxTicks: 1000);

// Submit a command through the leader
var leader = cluster.FindLeader();
cluster.ProposeCommand(leader.NodeId, commandPayload);
cluster.DeliverAllMessages();

// Inject a network partition
cluster.Router.SetPartition(node1, node2);
cluster.Router.SetPartition(node1, node3);

// Verify all protocol invariants still hold
cluster.Invariants.CheckAll();

// Crash and restart a node
cluster.CrashNode(node2);
cluster.RestartNode(node2);
cluster.DeliverAllMessages();
cluster.Invariants.CheckAll();

Built-in invariant checks:

  • Election Safety — at most one leader per term
  • Leader Append-Only — leader log never shrinks
  • Log Matching — matching index + term implies identical prefix
  • Leader Completeness — committed entries survive leader changes
  • State Machine Safety — no two nodes apply different entries at the same index
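The first of these, Election Safety, is straightforward to turn into an executable check (an illustrative sketch, not the harness's actual checker): record each observed (term, leader) pair and flag any term with two distinct leaders.

```csharp
using System;
using System.Collections.Generic;

// Election Safety invariant: at most one leader per term.
class ElectionSafetyChecker
{
    private readonly Dictionary<long, string> _leaderByTerm = new();

    public void ObserveLeader(long term, string nodeId)
    {
        if (_leaderByTerm.TryGetValue(term, out var existing) && existing != nodeId)
            throw new InvalidOperationException(
                $"Two leaders in term {term}: {existing} and {nodeId}");
        _leaderByTerm[term] = nodeId; // re-observing the same leader is fine
    }
}
```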

Benchmarks

dotnet run --project src/Raft.Benchmarks -c Release

Benchmark What it measures
CoreBenchmarks Pure RaftNodeState.Process throughput (no I/O, no hosting)
InMemoryClusterBenchmarks 3-node command-to-commit with in-memory everything
TcpClusterBenchmarks 3-node over localhost TCP with BinaryRaftCodec
WalClusterBenchmarks 3-node with WAL-backed durable storage

Core protocol types

Messages

Six sealed record types inheriting from RaftMessage:

Type Direction Purpose
RequestVoteRequest Candidate -> All Request vote during election
RequestVoteResponse Peer -> Candidate Grant or deny vote
AppendEntriesRequest Leader -> Follower Replicate log entries / heartbeat
AppendEntriesResponse Follower -> Leader Confirm replication / report conflict
InstallSnapshotRequest Leader -> Follower Transfer snapshot to lagging follower
InstallSnapshotResponse Follower -> Leader Acknowledge snapshot receipt

Inputs (RaftInput variants)

Protocol inputs fed into RaftNodeState.Process:

  • Timers: Tick, ElectionTimeoutElapsed, HeartbeatTimeoutElapsed
  • Messages: RequestVoteReceived, RequestVoteResponseReceived, AppendEntriesReceived, AppendEntriesResponseReceived, InstallSnapshotReceived, InstallSnapshotResponseReceived
  • Client: ClientCommandReceived, ReadOnlyRequestReceived
  • Persistence signals: MetaPersisted, MetaPersistFailed, LogAppended, LogAppendFailed, SnapshotSaved, SnapshotSaveFailed, SnapshotInstallFailed
  • Configuration: NodeConfigurationCommand

Effects (RaftEffect variants)

Side effects emitted by the core for the hosting layer to execute:

  • Persistence: PersistMeta, AppendLogEntries, TruncateLogSuffix, CompactLogPrefix
  • Messaging: SendMessage
  • Snapshots: SaveSnapshot, InstallSnapshot, SendSnapshot
  • Apply: ApplyCommittedEntries
  • Client responses: RedirectClient, RespondToClient, ConfirmReadOnly
  • Telemetry: PublishEvent, PublishMetric

Value types

Type Description
RaftNodeId Logical node identity (string-backed, comparable)
Term Election term (long-backed, monotonically increasing)
LogIndex Log entry index (long-backed, prevents confusion with Term)
LogEntry Index + Term + Kind + Payload + optional CorrelationId and CommandIdentity
RaftMeta Persistent election state: CurrentTerm + VotedFor
SnapshotMetadata Snapshot boundary: LastIncludedIndex + LastIncludedTerm + configuration + size
ClusterConfiguration Current members + optional new members (for joint consensus transitions)
ClientCommandIdentity Session ID + serial number for exactly-once deduplication
CorrelationId Guid-backed trace identifier for end-to-end observability
RaftPeerEndpoint NodeId + Host + Port
TransportSendResult Outcome + optional CorrelationId + detail + elapsed time
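The value-type pattern behind Term and LogIndex can be sketched in a few lines (a minimal illustration, not the library's actual definitions): wrapping a long in a distinct struct means the compiler rejects passing a term where a log index is expected.

```csharp
using System;

// Strongly-typed long wrapper in the style of LogIndex/Term.
readonly record struct Index(long Value) : IComparable<Index>
{
    public int CompareTo(Index other) => Value.CompareTo(other.Value);
    public Index Next() => new(Value + 1);
    public override string ToString() => Value.ToString();
}
```

Records give value equality and comparability for free, so these wrappers cost nothing at call sites.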

Design principles

Pure effect-driven core — Protocol core emits side effects as data structures. No direct I/O, async, logging, or dependency calls. Enables deterministic testing, simulation, and property-based verification.

Single event loop per node — All protocol state mutation serialised on one loop. No locks inside the core. Easier to reason about state transitions and crash recovery.

Message-level transport — Transport works with typed RaftMessage objects, not raw bytes. Codecs are factored separately. Easy to swap TCP for in-memory without touching protocol logic.

Split storage contracts — Separate interfaces for meta, log, and snapshots. Different durability requirements, write patterns, and failure modes. Easier to test crash windows and benchmark each layer independently.

Structured observability — Protocol events and metrics flow through sink abstractions. Testing sinks capture for assertions, production sinks forward to OpenTelemetry. Fully decoupled from the core.

Fault injection first — In-memory transport supports network partitions, message drops, and reordering. Simulation harness with fault injection and invariant checking built in from the start.


Version

Current: v1.3.0 — 13 milestones delivered, 245 tests passing.


License

Copyright 2026 ByTech. Licensed under the Apache License, Version 2.0.
