IMPROVEMENTS.md

Detailed improvement plan for each visualization topic. Covers new config options, additional scenarios, failure cases, and new subtopics to add.


1. Indexing

1.1 Bitcask

Current state: Writes, reads, file rotation, compaction. Config: maxFileSize, initialKeys.

New config options:

  • compactionStrategy (select): "merge-all" vs "merge-oldest-two" to show different compaction approaches
  • tombstoneEnabled (toggle): Show delete operations as tombstone markers instead of just overwrites. Demonstrates how deletes work in append-only systems
  • crashRecovery (toggle): Enable a crash scenario mid-write, then show recovery by replaying the data files to rebuild the keydir

New scenarios:

  • Delete with tombstones: Write a key, then delete it. Show the tombstone entry appended to the active file and keydir removal. During compaction, show how the tombstone and the original entry are both discarded
  • Crash recovery: Simulate a crash after appending to the data file but before updating the keydir. On recovery, scan all data files sequentially to rebuild the in-memory keydir. Highlight how this is O(n) in the number of entries, which is Bitcask's main limitation
  • Hint files: After compaction, generate a "hint file" alongside the data file that contains only (key, offset, file_id) tuples, allowing faster restart without scanning all data
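The recovery scan in these scenarios can be sketched in a few lines. The entry format here (per-file lists of key/value pairs, with None as a tombstone) is a simplification of Bitcask's real record layout:

```python
# Sketch of Bitcask crash recovery: rebuild the in-memory keydir by
# scanning every data file oldest-first. Later entries win because the
# log is append-only; a tombstone (value None) drops the key.
def rebuild_keydir(data_files):
    keydir = {}  # key -> (file_id, entry_index)
    for file_id, entries in enumerate(data_files):
        for idx, (key, value) in enumerate(entries):
            if value is None:
                keydir.pop(key, None)       # tombstone: forget the key
            else:
                keydir[key] = (file_id, idx)  # newer entry shadows older
    return keydir

files = [
    [("a", "1"), ("b", "2")],   # oldest data file
    [("a", "3"), ("b", None)],  # newer file: overwrite a, delete b
]
print(rebuild_keydir(files))    # {'a': (1, 0)}
```

The O(n) cost is visible here: every entry ever written must be touched once, which is exactly what hint files exist to avoid.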

Failure cases to add:

  • Crash mid-write (partial entry in data file): show CRC validation detecting corruption
  • Disk full during compaction: old files must be retained until compaction completes successfully
  • Large keydir exceeding RAM: show what happens when the key count grows beyond available memory (the fundamental Bitcask limitation)

1.2 Hash Map

Current state: Insert 10 keys, collision chaining, resize/rehash, lookups. Config: numBuckets, loadFactorThreshold.

New config options:

  • collisionStrategy (select): "Separate Chaining" vs "Open Addressing (Linear Probing)" vs "Open Addressing (Robin Hood)". Currently only chaining is shown
  • hashFunction (select): "Simple Modular" vs "Murmur-like" to demonstrate how hash function quality affects distribution
  • deleteEnabled (toggle): Show deletion with chaining (remove from linked list) vs open addressing (tombstone marker)

New scenarios:

  • Linear probing: Show probing sequence when slots are occupied. Demonstrate clustering effect where occupied slots tend to cluster together, degrading performance
  • Robin Hood hashing: Show stealing from the rich (entries with short probe distances) to give to the poor (entries with long probe distances). Display probe distances per entry to illustrate the variance reduction
  • Worst-case lookup: Craft a scenario where many keys hash to the same bucket, creating a long chain. Show O(n) lookup degradation
  • Delete + lookup interaction: In open addressing, show why you can't just empty a slot (it would break probe chains) and need tombstones instead
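The tombstone rule in the last scenario can be staged with a minimal linear-probing table (hypothetical slot layout; small integer keys so collisions are easy to arrange):

```python
# Minimal linear-probing table with tombstones. EMPTY stops a probe;
# TOMBSTONE does not, so lookups can step past deleted slots instead
# of wrongly reporting "not found".
EMPTY, TOMBSTONE = object(), object()

class LinearProbing:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def put(self, key, value):
        # Simplified: reuses the first free slot; a full implementation
        # must first check whether the key exists further down the chain.
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY or slot is TOMBSTONE or slot[0] == key:
                self.slots[i] = (key, value)
                return

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:        # a true hole ends the probe chain
                return None
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[i] = TOMBSTONE  # EMPTY here would break probes
                return

t = LinearProbing()
t.put(1, "a")
t.put(9, "b")    # 9 collides with 1 (both hash to slot 1 mod 8)
t.delete(1)
print(t.get(9))  # 'b' -- the probe steps over the tombstone
```

Replacing the tombstone with EMPTY in `delete` makes `get(9)` fail, which is precisely the bug the scenario should demonstrate.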

Failure cases to add:

  • Pathological hash function: All keys map to the same bucket, turning O(1) into O(n)
  • Resize under memory pressure: Show the temporary 2x memory spike during rehashing
  • Concurrent access hazard: Two threads inserting simultaneously cause a lost entry (conceptual, not animated in real-time, but show the interleaving)

1.3 LSM Tree

Current state: Writes to memtable, flush to L0, cascading compaction, reads across levels. Config: memtableSize, levelSizeMultiplier, numLevels.

New config options:

  • compactionStrategy (select): "Size-Tiered" vs "Leveled" to show the two major compaction strategies. Currently only leveled is shown
  • bloomFilterEnabled (toggle): Show Bloom filter checks before SSTable reads to skip levels that definitely don't contain a key
  • writeAheadLog (toggle): Show WAL writes before memtable writes for crash recovery

New scenarios:

  • Size-tiered compaction: Multiple SSTables per level, compaction triggers when count exceeds threshold. Contrast with leveled where each level has one sorted run
  • Bloom filter optimization: During reads, show the Bloom filter check per SSTable. Display false positive rate and how it reduces disk reads. Show a false positive case where the Bloom filter says "maybe" but the key isn't actually there
  • Range query: Instead of point lookups, show a range scan that must merge results across all levels using a merge iterator. Highlight the complexity of maintaining sorted order across levels
  • WAL recovery: Write to WAL, then memtable. Crash and lose memtable. Replay WAL to recover memtable contents
  • Space amplification: Show how multiple versions of the same key exist across levels until compaction cleans them up. Display the space amplification ratio
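The Bloom-filter read path could be modeled like this (a toy filter with three hash probes; the false-positive rate depends on the bit-array size, and the hash choice here is illustrative):

```python
# Sketch of the LSM read path with a per-SSTable Bloom filter.
# The filter answers "definitely absent" (skip the table) or "maybe
# present" (must read it); false positives cost one wasted read.
class Bloom:
    def __init__(self, bits=64):
        self.bits, self.n = 0, bits

    def _positions(self, key):
        return [hash((key, seed)) % self.n for seed in (1, 2, 3)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits & (1 << p) for p in self._positions(key))

def lsm_get(key, memtable, sstables):
    """sstables: newest-first list of (bloom, table_dict) pairs."""
    if key in memtable:
        return memtable[key], 0
    reads = 0
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue                  # level skipped without a disk read
        reads += 1
        if key in table:
            return table[key], reads
    return None, reads

bloom = Bloom()
bloom.add("k1")
sstables = [(bloom, {"k1": "v1"})]
print(lsm_get("k1", {}, sstables))   # ('v1', 1)
```

The second element of the return value is the read-amplification counter the "Read amplification measurement" failure case would display.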

Failure cases to add:

  • Write stall: Memtable fills faster than compaction can drain L0, causing back-pressure
  • Read amplification measurement: Count the number of SSTable reads per lookup across different tree depths
  • Crash before flush: Memtable data lost without WAL

1.4 B-Tree

Current state: Insertions (prime numbers), node splits, search. Config: order, initialKeys.

New config options:

  • deleteEnabled (toggle): Show B-tree deletion with merge and redistribution
  • customKeys (text input or preset select): "Sequential", "Random", "Reverse" to show how insertion order affects tree shape during construction
  • treeVariant (select): "B-Tree" vs "B+ Tree" to show the difference (B+ tree stores values only in leaves, internal nodes are pure routing)

New scenarios:

  • Deletion: Remove a key from a leaf. If the node underflows (fewer than ceil(order/2)-1 keys), show redistribution from a sibling or merge with a sibling. Show cascading merges up to the root
  • B+ Tree variant: Internal nodes contain only keys for routing. Leaf nodes form a linked list for range scans. Show a range query traversing the leaf chain without going back to internal nodes
  • Bulk loading: Instead of inserting one key at a time, show the bottom-up bulk loading algorithm that sorts keys first and builds leaves, then constructs internal nodes. Much more efficient than repeated single inserts
  • Sequential vs random insert: Show how sequential inserts create a right-leaning tree with many splits on the rightmost path, while random inserts distribute splits more evenly
  • Disk page mapping: Show how each node corresponds to a disk page (typically 4KB). Visualize how higher order means fewer levels and fewer disk seeks
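The order-to-height relationship in the disk-page-mapping scenario can be checked with a back-of-envelope calculation (worst case, every node roughly half full):

```python
import math

# Approximate B-tree height: with n keys and order m, nodes are
# guaranteed at least ceil(m/2) children, so the height is about
# log base ceil(m/2) of the key count.
def levels(n_keys, order):
    fanout = math.ceil(order / 2)
    return max(1, math.ceil(math.log(n_keys, fanout)))

# A billion keys: order 4 needs ~30 levels, order 512 only ~4,
# i.e. 4 disk seeks instead of 30 per lookup.
print(levels(10**9, 4), levels(10**9, 512))  # 30 4
```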

Failure cases to add:

  • Node underflow during deletion triggering merge cascade
  • Root demotion: When the root has only one child after a merge, remove the root level (tree shrinks)
  • Write-ahead logging for B-tree pages: Show how a page split requires WAL to avoid corruption on crash

1.5 New Subtopic: Skip List

Rationale: Skip lists are used as the memtable data structure in many LSM-tree implementations (LevelDB, RocksDB). They provide O(log n) search, insert, and delete with simpler implementation than balanced trees.

Config options:

  • maxLevel (slider, 2-6): Maximum number of levels
  • promotionProbability (slider, 0.25-0.75): Probability of promoting a node to the next level
  • numKeys (slider, 5-15): Number of keys to insert

Scenarios:

  • Insert keys and show probabilistic level promotion
  • Search showing how higher levels skip large portions of the list
  • Delete and pointer updates
  • Compare with linked list search (O(n)) to highlight the speedup
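The promotion coin-flip is the entire source of the skip list's balance; a sketch of the level generator the visualization would animate:

```python
import random

# Probabilistic level promotion: keep flipping a coin (probability p)
# until it fails or maxLevel is reached. With p=0.5 about half the
# nodes live only on level 1, a quarter reach level 2, and so on.
def random_level(p=0.5, max_level=6):
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level

random.seed(42)
counts = {}
for _ in range(10000):
    lvl = random_level()
    counts[lvl] = counts.get(lvl, 0) + 1
print(counts)  # roughly 5000 / 2500 / 1250 / ... per level
```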

1.6 New Subtopic: SSTable / Sorted String Table

Rationale: SSTables are the on-disk format used by LSM trees. A dedicated visualization would show the internal structure (data blocks, index block, Bloom filter block, metadata) in detail rather than treating them as opaque boxes.

Config options:

  • blockSize (slider): Number of entries per data block
  • numEntries (slider): Total entries in the SSTable

Scenarios:

  • Build an SSTable from sorted entries: show block boundaries, index creation
  • Point lookup: binary search on index block, then scan within data block
  • Block compression: show before/after sizes
  • Bloom filter integration
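The point-lookup scenario boils down to a binary search over a sparse index; a sketch (block and index layout simplified to Python lists):

```python
import bisect

# SSTable point lookup: binary-search a sparse index of each block's
# first key, then scan only within that one data block.
def build(entries, block_size):
    blocks = [entries[i:i + block_size]
              for i in range(0, len(entries), block_size)]
    index = [block[0][0] for block in blocks]  # first key of each block
    return index, blocks

def lookup(index, blocks, key):
    # Rightmost block whose first key is <= the search key
    b = bisect.bisect_right(index, key) - 1
    if b < 0:
        return None
    return dict(blocks[b]).get(key)

index, blocks = build([("a", 1), ("c", 2), ("f", 3), ("k", 4)], block_size=2)
print(lookup(index, blocks, "f"))  # 3 -- only the second block is scanned
```

Larger `blockSize` shrinks the index but lengthens the in-block scan, which is exactly the tradeoff the slider would expose.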

2. Encoding

2.1 JSON Detail

Current state: Token-by-token JSON serialization with overhead analysis. Config: dataSet (3 presets).

New config options:

  • formatting (select): "Pretty-printed" vs "Minified" to show the overhead difference of whitespace and indentation
  • encoding (select): "UTF-8" vs "UTF-16" to show how character encoding affects byte counts
  • nestedDepth (slider, 1-3): Control nesting depth to show how overhead scales with structure complexity
  • includeNulls (toggle): Show how JSON handles null values and the overhead of explicit null representation

New scenarios:

  • Minified vs pretty-printed comparison: Show the same object in both formats side by side. Display byte savings from removing whitespace
  • Deeply nested objects: Show how nesting multiplies overhead (more braces, more indentation). Compare overhead percentage at depth 1 vs depth 3
  • Array-heavy data: Show serialization of arrays with many small elements (e.g., coordinates, time series) where structural overhead dominates
  • Schema evolution: Show what happens when a new field is added (receivers with old parsers ignore it) vs when a field is removed (old data still has it, wasting space)
  • Unicode handling: Show how multi-byte UTF-8 characters (e.g., emoji, CJK characters) affect byte counts and how JSON escapes non-ASCII with \uXXXX
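The minified-vs-pretty comparison is easy to ground in real byte counts (sample object is illustrative):

```python
import json

# Byte-count comparison of pretty-printed vs minified JSON (UTF-8).
obj = {"user": {"id": 1, "name": "Alice", "tags": ["a", "b"]}}

pretty = json.dumps(obj, indent=2)
minified = json.dumps(obj, separators=(",", ":"))

# Whitespace alone accounts for the difference, and the savings grow
# with nesting depth because indentation repeats on every line.
print(len(pretty.encode("utf-8")), len(minified.encode("utf-8")))
```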

Failure cases to add:

  • Parsing ambiguity: numbers that look like strings, trailing commas (invalid JSON)
  • Precision loss: large integers (> 2^53) that JavaScript can't represent faithfully
  • Injection: show how unescaped strings can break JSON structure

2.2 Protobuf Detail

Current state: Tag computation, varint encoding, byte-level encoding. Config: dataSet.

New config options:

  • showUnknownFields (toggle): Demonstrate what happens when the decoder encounters a field number not in its schema (unknown fields are preserved)
  • fieldNumbering (select): "Sequential (1,2,3)" vs "Sparse (1,5,100)" to show how field numbers affect tag byte size (varint encoding means field 100 needs 2-byte tag)
  • oneofEnabled (toggle): Show protobuf oneof fields and how only one is encoded

New scenarios:

  • Large field numbers: Show that field_num > 15 requires 2-byte tags due to varint encoding, affecting overhead
  • Repeated fields (packed): Show how repeated numeric fields are packed into a single length-delimited chunk instead of one tag per element
  • Default values omitted: Protobuf 3 doesn't encode fields set to default values (0, "", false). Show the space savings and the ambiguity this creates (can't distinguish "not set" from "set to default")
  • Schema evolution: Add a field, remove a field, rename a field. Show backward and forward compatibility through field number stability
  • Nested messages: Show how embedded messages are encoded as length-delimited bytes, and the overhead of length prefixes at each nesting level
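The field-number/tag-size relationship in the first scenario follows directly from the wire format, and can be demonstrated with a few lines:

```python
# Protobuf tag = (field_number << 3) | wire_type, encoded as a varint.
# Field numbers 1-15 fit in one tag byte; 16 and above need two.
def varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def tag_bytes(field_number, wire_type=0):
    return varint((field_number << 3) | wire_type)

print(len(tag_bytes(1)), len(tag_bytes(15)),
      len(tag_bytes(16)), len(tag_bytes(100)))  # 1 1 2 2
```

Note the cutoff is 15, not 100: field 16 already pays the second tag byte, which is why hot fields should get the low numbers.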

Failure cases to add:

  • Field number reuse: reusing a deleted field number with a different type causes data corruption
  • Missing required fields (proto2): show what happens when a required field is absent
  • Wire type mismatch: show decoder error when wire type doesn't match expected field type

2.3 XML Detail

Current state: Token-by-token XML with tag duplication analysis. Config: dataSet.

New config options:

  • attributeMode (toggle): Show data as XML attributes (<user id="1" name="Alice"/>) instead of elements, and compare overhead
  • namespaceEnabled (toggle): Show XML namespace prefixes and how they add overhead (<ns:user xmlns:ns="...">)
  • cdataEnabled (toggle): Show CDATA sections for content that would otherwise need entity escaping

New scenarios:

  • Attributes vs elements: Same data encoded both ways, compare byte counts. Attributes eliminate closing tags but have their own quoting overhead
  • Mixed content: Show XML's unique ability to have text content interleaved with child elements (something JSON can't do naturally)
  • XML Schema validation: Show how an XSD defines types and constraints, and what invalid data looks like
  • Entity escaping: Show how special characters (&, <, >, ", ') require entity escaping (&amp;, &lt;, etc.) and the overhead this adds
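The escaping overhead can be measured directly with the standard library:

```python
from xml.sax.saxutils import escape

# The five XML special characters and their entity forms. escape()
# handles &, <, > by default; the quote characters need the extra map.
text = '5 < 6 & "quotes"'
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)                   # 5 &lt; 6 &amp; &quot;quotes&quot;
print(len(escaped) - len(text))  # 17 extra characters
```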

Failure cases to add:

  • XML injection / billion laughs attack: show how entity expansion can be exploited
  • Encoding mismatch: declaring UTF-8 but including raw Latin-1 bytes
  • Malformed XML: mismatched tags, missing closing tags

2.4 Avro Detail

Current state: Schema-driven encoding, zigzag varint, schema evolution demo. Config: dataSet.

New config options:

  • schemaEvolutionMode (select): "Add Field" vs "Remove Field" vs "Rename Field (alias)" vs "Change Type (int->long)" to explore different evolution scenarios
  • unionType (toggle): Show Avro union encoding (e.g., ["null", "string"]) and how the union index byte adds overhead but enables optional fields
  • logicalTypes (toggle): Show logical types (date, timestamp, decimal) built on top of primitive types

New scenarios:

  • Backward compatibility: New reader schema has extra field with default. Old data can be read. Show exactly which bytes are read and which defaults are filled
  • Forward compatibility: Old reader schema doesn't have a field that new data has. Show how schema resolution handles it: the reader decodes with the writer's schema (which Avro always requires), then drops fields that aren't in its own schema
  • Writer and reader schema resolution: Show the full resolution algorithm: match by field name, check type compatibility, apply defaults. This is Avro's killer feature
  • Schema Registry interaction: Show how schema IDs are prepended to Avro-encoded messages (Confluent wire format: magic byte + 4-byte schema ID + Avro bytes)
  • Container file format: Show the .avro file structure: header (magic, schema, sync marker) + data blocks (count + size + compressed data + sync)
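The zigzag-varint step that underlies all of these scenarios can be sketched as follows (assuming 64-bit two's-complement semantics, as the Avro spec does):

```python
# Avro encodes longs as zigzag + varint: small magnitudes, positive or
# negative, map to small unsigned values and fit in one byte.
def zigzag(n):
    # Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    return (n << 1) ^ (n >> 63)

def encode_long(n):
    u = zigzag(n)
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

for v in (0, -1, 1, -64, 64):
    print(v, encode_long(v).hex())
# -64 still fits in one byte (zigzag 127); 64 needs two (zigzag 128)
```

Without zigzag, -1 would encode as ten 0xFF-style varint bytes, which is the contrast the visualization can highlight.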

Failure cases to add:

  • Incompatible schema change: changing a field type from string to int without a default
  • Missing writer schema: Avro data is unreadable without the writer's schema (Protobuf, by contrast, embeds field numbers and wire types in its wire format, so a decoder can at least skip unknown data)
  • Union ambiguity: large unions make the index byte larger and complicate schema evolution

2.5 Comparison

Current state: Side-by-side 4-format comparison with progressive field encoding. Config: dataSet.

New scenarios:

  • Scaling analysis: Show how overhead percentage changes as the number of fields grows from 1 to 20. Plot a line chart showing bytes vs field count for all 4 formats
  • Repeated data: Encode an array of 10 identical objects. Show how each format handles repetition (JSON repeats all field names, Protobuf repeats tags, Avro repeats nothing)
  • Numeric-heavy data: Compare encoding of data with many small integers (sensor readings, counters) where varint encoding shines
  • Schema evolution comparison: Apply the same evolution (add a field) to all 4 formats and compare the impact

2.6 New Subtopic: MessagePack

Rationale: MessagePack is a binary JSON-like format that's widely used (Redis, Fluentd). It bridges JSON's schema-less flexibility with binary compactness.

Scenarios:

  • Show how MessagePack maps JSON types to compact binary representation
  • Compare byte counts with JSON for the same data
  • Show how it handles maps, arrays, and binary data
  • Demonstrate interoperability with JSON (lossless round-trip for most types)

2.7 New Subtopic: Thrift (TBinaryProtocol vs TCompactProtocol)

Rationale: Thrift was Facebook's alternative to Protobuf and shows interesting design differences (multiple serialization protocols, service definitions). TCompactProtocol uses delta encoding for field IDs which is an interesting optimization to visualize.


3. Replication

3.1 Single-Leader

Current state: 3 coordinated writes, stale read, failover. Config: numFollowers, replicationLag, operationType.

New config options:

  • replicationMode (select): "Synchronous" vs "Semi-Synchronous" vs "Asynchronous" to show the durability/latency tradeoff
  • followerFailure (toggle): Kill a follower during replication and show how the leader handles it (continues for async, blocks for sync)
  • readFromFollower (toggle): Route reads to followers and show consistency implications

New scenarios:

  • Synchronous replication: Leader waits for ALL followers to acknowledge before confirming to client. Show the latency cost (slowest follower determines response time) and the durability guarantee
  • Semi-synchronous: Leader waits for ONE follower (synchronous standby) before confirming. If the synchronous follower dies, promote another follower to synchronous. This is how PostgreSQL's synchronous_standby_names replication works (opt-in; PostgreSQL replicates asynchronously by default)
  • Replication lag measurement: Show a real-time lag counter per follower. Demonstrate how a slow follower falls behind and how reads from that follower return progressively staler data
  • Read-your-writes consistency: Client writes to leader, then reads from a follower. Without special handling, the read may not see the write. Show "read-your-writes" guarantee by routing the client's reads to the leader after a write
  • Monotonic reads: Client reads from follower A (sees value X), then reads from follower B (sees older value). Show how session stickiness to a single follower prevents this
  • Follower catch-up after restart: A follower goes offline, misses several writes, then comes back. Show the replay of the leader's log from the follower's last known position
  • Cascading replication: Followers replicate from other followers instead of the leader, reducing leader load. Show the added lag
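The catch-up scenario reduces to replaying a log from a remembered offset; a sketch (class and method names are illustrative):

```python
# Sketch of follower catch-up: the leader keeps an ordered log, each
# follower remembers the index of the last entry it applied, and on
# reconnect it replays everything after that position.
class Leader:
    def __init__(self):
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))

    def entries_after(self, offset):
        return self.log[offset:]

class Follower:
    def __init__(self):
        self.applied = 0
        self.data = {}

    def catch_up(self, leader):
        for key, value in leader.entries_after(self.applied):
            self.data[key] = value
            self.applied += 1

leader, follower = Leader(), Follower()
leader.write("x", 1)
follower.catch_up(leader)
leader.write("x", 2)       # follower is offline for these two writes
leader.write("y", 3)
follower.catch_up(leader)  # replay from offset 1
print(follower.data)       # {'x': 2, 'y': 3}
```

Replication lag is just `len(leader.log) - follower.applied`, which is the per-follower counter the lag-measurement scenario would display.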

Failure cases to add:

  • Split-brain during failover: old leader comes back after new leader is elected. Both think they're the leader. Show data divergence and the need for fencing tokens
  • Data loss during async failover: leader crashes with unreplicated writes. New leader doesn't have them. Client thinks writes succeeded
  • Replication lag causing inconsistent joins: application reads user from leader (exists) but reads user's orders from follower (not yet replicated). Gets empty result

3.2 Multi-Leader

Current state: Non-conflicting writes, conflicting writes, LWW/merge resolution. Config: numLeaders, resolutionStrategy.

New config options:

  • resolutionStrategy additions: Add "Custom Function" and "CRDT (LWW-Register)" to the existing "LWW" and "Merge"
  • networkPartition (toggle): Partition leaders into two groups that can't communicate. Show writes continuing independently and the conflict explosion when the partition heals
  • topologyMode (select): "All-to-All" vs "Star" vs "Ring" to show how replication topology affects conflict propagation speed and failure resilience

New scenarios:

  • Network partition and heal: Two leaders in separate partitions continue accepting writes. When the partition heals, show the avalanche of conflicts that need resolution. Contrast this with the steady-state conflict rate
  • Three-way conflict: Three leaders all write to the same key concurrently. Show how LWW picks one winner and discards two updates vs how merge must combine three values
  • Topology failure: In ring topology, one node failure breaks the ring. Show how messages can't propagate past the dead node. Compare with all-to-all where any single failure is tolerated
  • CRDT-based resolution: Show a G-Counter (grow-only counter) where each leader maintains its own counter and the merge is a max-per-leader operation. No conflicts possible by design
  • Conflict-free data types: Show how CRDTs for sets (add-wins set, remove-wins set) handle concurrent add/remove conflicts automatically

Failure cases to add:

  • Circular replication loops: A replicates to B, B to A, creating infinite replication. Show how origin tagging prevents this
  • Clock skew in LWW: Two leaders have drifted clocks. The "wrong" write wins because its timestamp is higher despite being causally earlier
  • Conflict rate explosion: Show how conflict probability grows quadratically with the number of leaders (N leaders = N*(N-1)/2 potential conflict pairs per key)

3.3 Leaderless

Current state: Write quorum, read quorum, read repair. Config: n, w, r.

New config options:

  • antiEntropyEnabled (toggle): Show background anti-entropy process (Merkle tree comparison) that catches inconsistencies read repair misses
  • hintedHandoff (toggle): When a target replica is down, write to a temporary sloppy quorum node. When the target recovers, hand off the write
  • conflictResolution (select): "Last-Write-Wins" vs "Vector Clocks" to show how concurrent writes to the same key are detected and resolved

New scenarios:

  • Sloppy quorum: Node goes down during a write. Show the write going to a non-preferred node (sloppy quorum). When the preferred node recovers, show the hinted handoff transferring the data back
  • Anti-entropy with Merkle trees: Two replicas compare their Merkle tree roots. If different, drill down to find the divergent subtree and synchronize only the changed keys. Show the logarithmic efficiency
  • Vector clock conflict detection: Two clients write to the same key on different replicas concurrently. Show how vector clocks detect that neither write causally follows the other (true conflict vs simple overwrite). Return both versions to the client for resolution
  • Quorum edge cases: Set W=1, R=1 (W+R=2 <= N=3). Show how a write to one replica and a read from a different replica misses the write entirely. No consistency guarantee
  • Write-write conflict: Two concurrent writes to the same key with different values. With LWW, show data loss. With vector clocks, show sibling creation
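The vector-clock comparison at the heart of the conflict-detection scenarios is a small function (clock representation here is a plain dict of per-node counters):

```python
# Vector clock comparison: a write causally follows another only if its
# clock dominates in every component; otherwise the two writes are
# concurrent and both versions must be kept as siblings.
def compare(a, b):
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a after b"
    if b_ge:
        return "b after a"
    return "concurrent"  # true conflict: neither clock dominates

print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # a after b
print(compare({"A": 2}, {"B": 1}))                  # concurrent
```

The "concurrent" branch is where LWW silently discards a write and vector clocks create siblings instead.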

Failure cases to add:

  • Permanent node failure: Show how the cluster redistributes data from the failed node to remaining replicas (re-replication)
  • Quorum unavailability: When fewer than W nodes are alive, writes fail. Show the availability tradeoff
  • Read repair race condition: Two concurrent reads trigger two concurrent repairs to the same stale replica, potentially conflicting with each other
  • Stale data served during network partition (CAP theorem)

3.4 New Subtopic: Chain Replication

Rationale: Chain replication (used in HDFS, Azure Storage) is an important alternative to quorum-based replication. Writes flow through a chain (head -> middle -> tail), reads go to the tail only. Strong consistency with simpler logic.

Config: chainLength (3-5), scenario (normal, head failure, tail failure, middle failure)

Scenarios:

  • Normal operation: write enters at head, propagates through chain, acknowledged from tail
  • Head failure: next node becomes head
  • Tail failure: predecessor becomes tail
  • Middle failure: predecessor connects to successor, replays buffered writes

3.5 New Subtopic: Conflict-free Replicated Data Types (CRDTs)

Rationale: CRDTs deserve their own visualization given how important they are for multi-leader and leaderless systems. Show how data structures can be designed to never conflict.

Visualizations:

  • G-Counter: Grow-only counter with per-replica counters. Merge = max per replica
  • PN-Counter: Positive-negative counter using two G-Counters
  • LWW-Register: Last-write-wins register with timestamps
  • OR-Set (Observed-Remove Set): Add and remove operations with unique tags. Show how concurrent add+remove resolves to "add wins"

4. Partitioning

4.1 Hash-Based

Current state: Modular vs consistent hashing, add/remove partition, redistribution. Config: numPartitions, hashingMode.

New config options:

  • virtualNodesPerPartition (slider, 1-10): Control the number of virtual nodes in consistent hashing. Currently hardcoded to 3. Show how more virtual nodes improve distribution uniformity
  • hashFunction (select): "Simple char sum" vs "FNV-1a" vs "Murmur3" to show distribution quality differences
  • replicationFactor (slider, 1-3): Store each key on N consecutive partitions on the ring (consistent hashing). Show how this provides fault tolerance

New scenarios:

  • Virtual node impact: Compare distribution with 1, 3, and 10 virtual nodes per partition. Show standard deviation of keys per partition as a uniformity metric
  • Replication on the ring: In consistent hashing, replicate each key to the next N nodes clockwise. Show which nodes hold which replicas. When a node fails, show that replicas on other nodes cover the gap
  • Jump consistent hashing: Show the alternative algorithm that achieves perfectly uniform distribution without virtual nodes. Compare redistribution cost with ring-based consistent hashing
  • Node weight: Assign different weights to partitions (e.g., a powerful machine gets 2x the virtual nodes). Show the proportional key distribution
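The virtual-node mechanics behind these scenarios can be sketched with a sorted list of ring points (md5 keeps the demo deterministic; the `"node#i"` naming for virtual nodes is illustrative):

```python
import bisect
import hashlib
from collections import Counter

def h(s):
    # Stable hash onto the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=3):
        # Each physical node owns `vnodes` points on the ring
        self.points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [p[0] for p in self.points]

    def owner(self, key):
        # First point clockwise from the key's hash (wrapping around)
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["n1", "n2", "n3"], vnodes=10)
counts = Counter(ring.owner(f"key{i}") for i in range(1000))
print(counts)  # more vnodes -> lower variance between nodes
```

Node weighting falls out for free: give a bigger machine more virtual nodes and it proportionally owns more of the ring.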

Failure cases to add:

  • Non-uniform distribution: With few virtual nodes, some partitions get 3x more keys than others. Show the variance
  • Cascading failure: A node dies, its keys move to the next node on the ring, overloading it. That node dies too. Show the domino effect
  • Hot key despite good hash function: Even with uniform distribution, a single popular key creates a hot spot on one partition

4.2 Key-Range

Current state: Uniform/skewed distribution, range queries. Config: numPartitions, keyDistribution.

New config options:

  • autoRebalance (toggle): When a partition exceeds a threshold, split it into two. Show dynamic range adjustment
  • splitThreshold (slider, 5-15): Number of keys that triggers a partition split
  • mergeEnabled (toggle): When a partition becomes too empty, merge with adjacent partition

New scenarios:

  • Dynamic splitting: Start with one partition. As keys are inserted, split partitions that become too full. Show how the ranges adjust automatically (this is how HBase/BigTable works)
  • Range query efficiency vs hash: Side-by-side comparison: range query [F-P] on key-range partitioning (scan 2 partitions) vs hash partitioning (must scan ALL partitions). Clear winner for range queries
  • Time-series data: Keys are timestamps. Show how all recent writes go to the last partition (hot partition) while older partitions are cold. Demonstrate the need for pre-splitting by time ranges
  • Composite keys: Show partitioning on (user_id, timestamp) where the first part determines the partition and the second part determines the sort order within the partition. Enables efficient range queries per user
  • Partition splits with data migration: Show the actual data movement when a range split occurs: keys in the upper half of the range move to the new partition

Failure cases to add:

  • Monotonic key insertion (auto-increment IDs): All inserts go to the last partition. Show extreme skew
  • Range query spanning all partitions: A very wide range query provides no partition pruning benefit
  • Split cascade: Inserting sorted data causes a chain of splits, all on the rightmost partition

4.3 Hot Partition

Current state: Uniform/skewed/celebrity scenarios, key splitting mitigation. Config: scenario, mitigationEnabled.

New config options:

  • loadBalancingStrategy (select): "None" vs "Key Splitting" vs "Random Suffix" vs "Request Routing" to show multiple mitigation strategies
  • monitoringEnabled (toggle): Show real-time metrics (p99 latency per partition, throughput per partition, queue depth) that would trigger alerts
  • requestPattern (select): Add "Time-based burst" (all requests in a short window) and "Gradual ramp" scenarios

New scenarios:

  • Random suffix strategy: Append a random suffix (0-9) to hot keys, distributing them across 10 partitions. Show the read-side scatter-gather cost (must read from all 10 and merge)
  • Request queuing: Show request queues per partition. When a partition is hot, its queue grows while others are idle. Visualize latency increase (queue depth * processing time)
  • Cascading hot partition: One hot partition overloads a node, causing it to slow down. Other partitions on the same node are affected (noisy neighbor). Show cross-partition interference
  • Auto-detection and mitigation: Monitor request rate per partition. When the rate exceeds a threshold, automatically apply key splitting. Show the detection delay and the transition period
  • Comparison table: Show all mitigation strategies side by side with their tradeoffs (read amplification, write amplification, complexity)
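The random-suffix strategy's write/read asymmetry is easy to show concretely (key names and shard count are illustrative):

```python
import random

# Random-suffix mitigation for a hot counter key: each write fans out
# to one of n shards, so reads must scatter-gather across all shards.
def write(store, key, amount=1, n=10):
    shard = f"{key}#{random.randrange(n)}"
    store[shard] = store.get(shard, 0) + amount

def read(store, key, n=10):
    # Read amplification: n lookups to answer one logical read
    return sum(store.get(f"{key}#{i}", 0) for i in range(n))

store = {}
for _ in range(100):
    write(store, "viral_post_likes")
print(read(store, "viral_post_likes"))  # 100, assembled from up to 10 shards
```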

Failure cases to add:

  • Mitigation making reads harder: Key splitting for a counter requires reading all splits and summing. Show the read amplification cost
  • Flash crowd: Sudden burst of traffic to a new key (viral content). Mitigation strategies need warm-up time
  • False positive detection: A partition appears hot due to a monitoring spike but it's actually a batch job that will end soon. Unnecessary mitigation adds complexity

4.4 New Subtopic: Consistent Hashing Ring (Interactive)

Rationale: An interactive ring visualization where students can drag nodes around, add/remove nodes, and see keys redistribute in real-time. More hands-on than the current hash-based visualization.

Features:

  • Draggable partition nodes on the ring
  • Click to add keys at specific positions
  • Remove a node and watch keys slide to the next node
  • Show token ranges per node
  • Display replication factor coverage

4.5 New Subtopic: Partition Rebalancing Strategies

Rationale: When partitions become unbalanced, how do you rebalance? This is a critical operational concern.

Visualizations:

  • Fixed number of partitions: Create many more partitions than nodes (e.g., 1000 partitions for 10 nodes). When a node is added, transfer whole partitions. Show minimal data movement
  • Dynamic partitioning: Partitions split and merge based on size. Show the split/merge triggers and data movement
  • Proportional partitioning: Assign partitions proportional to node capacity (heterogeneous hardware). Show weighted distribution

5. Transactions

5.1 Dirty Read

Current state: T1 writes, T2 reads uncommitted, T1 aborts. Config: isolationLevel (none vs readCommitted).

New config options:

  • numTransactions (slider, 2-3): Add a third transaction to show cascading dirty reads (T2 dirty-reads T1, T3 dirty-reads T2)
  • operationType (select): "Abort after dirty read" vs "Commit after dirty read" to show that dirty reads are problematic even when the writer commits (because the reader made decisions based on an intermediate state)

New scenarios:

  • Cascading abort: T1 writes, T2 reads dirty value and writes based on it, T3 reads T2's dirty value. T1 aborts. Now T2 and T3 are both based on invalid data. Show the cascade
  • Dirty read with decision: T2 reads a dirty balance of $1000, decides to approve a loan. T1 aborts, actual balance is $50. Show the business impact of the dirty read
  • Read committed implementation: Show how the database maintains the "last committed value" separately from uncommitted changes. Two versions of the row exist simultaneously
  • Multiple dirty reads from same transaction: T1 makes several writes before aborting. T2 reads each one. Show all the phantom values T2 observed

Failure cases to add:

  • Dirty read leading to constraint violation: T2 reads dirty value, makes a write that violates a business rule that would have been caught with the committed value
  • Performance cost of preventing dirty reads: Show the lock/MVCC overhead of read committed vs no isolation

5.2 Non-Repeatable Read

Current state: T1 reads twice, T2 modifies between reads. Config: isolationLevel (readCommitted vs repeatableRead).

New config options:

  • numRows (slider, 1-3): Show non-repeatable reads affecting multiple rows simultaneously
  • mvccVisualization (toggle): Show the MVCC version chain that enables repeatable read (snapshot at transaction start)

New scenarios:

  • MVCC under the hood: Show the version chain: each update creates a new version with a transaction ID. Repeatable read gives T1 a snapshot that always sees versions from before T1 started
  • Banking scenario: T1 reads all account balances to compute a total. Between reads, T2 transfers money between accounts. T1 sees inconsistent totals even though no money was created or destroyed
  • Aggregate inconsistency: T1 computes COUNT(*) then SUM(balance). Between the two queries, T2 deletes a row, so the SUM covers one fewer row than the COUNT reported. The two aggregates describe different states of the table
  • Cursor stability: Show the weaker guarantee where only the "current row" under a cursor is locked, allowing changes to other rows

Failure cases to add:

  • Snapshot too old: Long-running T1 holds a snapshot while T2, T3, T4 make many changes. Show growing version chain and cleanup challenges
  • Write-read conflict: T1 reads a row, T2 updates and commits it, then T1 tries to update the same row. Under snapshot isolation T1's update is based on stale data; first-committer-wins implementations abort T1

5.3 Phantom Read

Current state: T1 runs predicate query twice, T2 inserts matching row between. Config: isolationLevel (repeatableRead vs serializable).

New config options:

  • predicateType (select): "Range (age > 25)" vs "Equality (dept = 'Eng')" vs "Complex (age > 25 AND dept = 'Eng')" to show different predicate scenarios
  • operationType (select): "Insert" vs "Update" vs "Delete" to show that phantoms can appear from updates moving rows into the predicate range, or disappear from deletes

New scenarios:

  • Update-based phantom: T1 queries WHERE age > 25. T2 updates Carol's age from 20 to 30. Carol now matches the predicate but didn't before. Second query returns one more row
  • Delete-based phantom: T1 queries WHERE age > 25, gets 3 rows. T2 deletes Bob (age 32). Second query returns 2 rows. Phantom disappearance
  • Predicate locking: Show how serializable isolation uses predicate locks (or gap locks) to prevent inserts/updates/deletes in the predicate range. Visualize the locked range on a number line
  • Index-range locking: Show how databases approximate predicate locks with index-range locks (lock the index entry for "age > 25" which covers a range of the index)
  • Phantom affecting aggregates: T1 computes COUNT(*) WHERE dept='Eng'. T2 inserts a new engineer. T1's recount shows a different number. Show impact on reporting queries
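The predicate-locking scenario can be sketched as a tiny lock table: a reader registers its predicate, and any write from another transaction that matches a registered predicate is refused. Class and method names here are hypothetical:

```python
# Predicate locking, sketched: a phantom-creating write is one that matches
# a predicate some other in-flight transaction has already read.

class PredicateLockTable:
    def __init__(self):
        self.locks = []  # list of (txn_id, predicate) pairs

    def lock_predicate(self, txn_id, predicate):
        self.locks.append((txn_id, predicate))

    def may_write(self, txn_id, row):
        # A write conflicts if any OTHER transaction's predicate matches the row.
        return not any(pred(row) for owner, pred in self.locks if owner != txn_id)

locks = PredicateLockTable()
# T1 ran: SELECT ... WHERE age > 25 — lock that predicate, not individual rows.
locks.lock_predicate("T1", lambda row: row["age"] > 25)

assert locks.may_write("T2", {"age": 20}) is True    # outside the range: allowed
assert locks.may_write("T2", {"age": 30}) is False   # would create a phantom: blocked
```

Index-range locking approximates the same check by locking a range of index entries instead of evaluating arbitrary predicates.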

Failure cases to add:

  • Gap lock deadlocks: Two transactions trying to insert into the same gap. Show how gap locks can cause deadlocks
  • Serializable performance cost: Show the throughput difference between repeatable read and serializable due to predicate locking overhead

5.4 Lost Update

Current state: Two transactions read-modify-write a counter. Config: isolationLevel (none vs compareAndSet).

New config options:

  • concurrencyControl (select): "None" vs "Compare-and-Set" vs "SELECT FOR UPDATE (Pessimistic)" vs "Atomic Increment" to show multiple prevention strategies
  • numTransactions (slider, 2-4): Show lost update with 3-4 concurrent transactions to amplify the problem
  • operationType (select): "Counter increment" vs "Balance transfer" vs "Wiki page edit" for different real-world scenarios

New scenarios:

  • Pessimistic locking (SELECT FOR UPDATE): T1 acquires exclusive lock on the row. T2 blocks waiting for the lock. T1 completes, T2 proceeds with the updated value. No lost update, but reduced concurrency
  • Atomic increment: Database provides UPDATE counter SET value = value + 1. Single operation, no read-modify-write cycle. Show the simplicity
  • Multi-row lost update: Transfer $100 from account A to B. Two concurrent transfers can result in money appearing or disappearing if both read the same starting balances
  • Wiki page conflict: Two users edit the same page. Last save wins and overwrites the first user's changes. Show "optimistic locking" with version numbers
  • 3-transaction pile-up: Three transactions all increment a counter from 10. Expected result: 13. Without protection: 11. Show the compounding loss
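The compare-and-set strategy from the scenarios above boils down to a retry loop: re-read, attempt the conditional write, repeat until it succeeds. A minimal sketch (using a Python lock to stand in for the database's atomic CAS):

```python
import threading

# Lost-update prevention via compare-and-set: the increment only applies if
# the value is still what the transaction read, otherwise it retries.

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()   # stands in for the DB's atomic CAS

    def compare_and_set(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

def increment(counter):
    while True:                          # the optimistic retry loop
        seen = counter.value             # read
        if counter.compare_and_set(seen, seen + 1):  # conditional write
            return                       # success; a concurrent change forces a retry

counter = Counter()
threads = [threading.Thread(target=increment, args=(counter,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter.value == 50   # all 50 increments survive; none are lost
```

With plain read-modify-write (no CAS) the same 50 threads could finish with any value between 2 and 50, which is the anomaly the visualization demonstrates.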

Failure cases to add:

  • CAS retry storm: Under high contention, CAS fails repeatedly. Show exponential backoff and the throughput cliff
  • Pessimistic lock deadlock: T1 locks row A, T2 locks row B, T1 tries to lock B (blocked), T2 tries to lock A (deadlock). Show deadlock detection and resolution

5.5 Write Skew

Current state: Doctor on-call scenario with snapshot isolation vs serializable. Config: isolationLevel.

New config options:

  • scenario (select): "Doctor on-call" vs "Meeting room booking" vs "Username uniqueness" to show different write skew patterns
  • numActors (slider, 2-3): Two or three actors trying to violate the invariant simultaneously
  • detectionMethod (select): "Serializable" vs "Materializing conflicts" to show the workaround

New scenarios:

  • Meeting room booking: Two users both check that room X is free at 2pm. Both insert a booking. Double booking results. Invariant: at most one booking per room per timeslot
  • Username uniqueness: Two users both check that username "coolname" is available. Both create accounts, producing duplicate usernames. Show how a unique index prevents this vs how the check-then-insert pattern fails under snapshot isolation in some databases
  • Three-doctor write skew: Three doctors on call, invariant is "at least 1 must remain." All three check (see 3 on call), all three leave. Nobody on call
  • Materializing conflicts: Instead of checking a predicate, insert a "lock row" that represents the constraint. Now the write-write conflict is detected by the database. Show the workaround pattern
  • Serializable Snapshot Isolation (SSI): Show how SSI detects write skew by tracking read dependencies and detecting dangerous structures (rw-dependency cycles)
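The write-skew pattern itself can be demonstrated without any threading at all: both transactions check the invariant against the same snapshot, then write disjoint rows, so neither write-write conflicts with the other. A deterministic sketch of the doctor scenario:

```python
# Write skew under snapshot isolation: each transaction's check passes
# against its snapshot, the writes don't overlap, and the invariant
# "at least one doctor on call" is silently violated.

on_call = {"alice": True, "bob": True}

snapshot_t1 = dict(on_call)   # T1's snapshot at transaction start
snapshot_t2 = dict(on_call)   # T2's snapshot at transaction start

if sum(snapshot_t1.values()) >= 2:   # T1: "someone else is still on call"
    on_call["alice"] = False         # T1 writes only Alice's row

if sum(snapshot_t2.values()) >= 2:   # T2: same check against a stale snapshot
    on_call["bob"] = False           # T2 writes only Bob's row

assert sum(on_call.values()) == 0    # nobody is on call — invariant broken
```

Materializing the conflict means both transactions would instead update one shared "lock row", turning this into an ordinary write-write conflict the database can detect.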

Failure cases to add:

  • Write skew across multiple tables: T1 reads table A, writes table B. T2 reads table B, writes table A. Cross-table dependency makes detection harder
  • SSI false positive: SSI aborts a transaction that wasn't actually going to cause an anomaly (conservative detection)

5.6 Isolation Comparison

Current state: Static table showing anomalies vs isolation levels.

Improvements:

  • Make it interactive: click on a cell ("Phantom Read" x "Repeatable Read") to jump to that specific visualization with the right config preset
  • Add a "Performance Impact" row showing relative throughput at each isolation level
  • Add "Implementation" row: "Locks", "MVCC", "MVCC + SSI"
  • Add "Default in" row: "PostgreSQL: Read Committed", "MySQL InnoDB: Repeatable Read"
  • Add hover tooltips explaining each anomaly in one sentence

5.7 New Subtopic: Deadlock Detection

Rationale: Deadlocks are a critical failure mode in pessimistic concurrency control. Visualizing the wait-for graph makes the concept intuitive.

Config: numTransactions (2-4), detectionStrategy (timeout vs wait-for graph)

Scenarios:

  • Simple deadlock: T1 holds A, wants B. T2 holds B, wants A. Show the cycle in the wait-for graph
  • Complex deadlock: T1->T2->T3->T1 cycle
  • Deadlock resolution: Abort the youngest transaction (least work lost). Show the victim selection criteria
  • Timeout-based detection: Show how a timeout (less sophisticated) can also resolve deadlocks but may be too aggressive or too slow
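Wait-for-graph detection is a cycle search. A minimal sketch, where an edge T -> U means "T waits for a lock held by U" and any node found on a cycle is a candidate deadlock victim:

```python
# Deadlock detection via DFS over the wait-for graph: a transaction that is
# reachable from itself is part of a deadlock cycle.

def find_cycle(waits_for):
    """Return some transaction on a cycle, or None if there is no deadlock."""
    def visit(node, path):
        if node in path:
            return node                       # revisited a node on this path: cycle
        for nxt in waits_for.get(node, []):
            found = visit(nxt, path | {node})
            if found:
                return found
        return None

    for start in waits_for:
        found = visit(start, set())
        if found:
            return found
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: the classic cycle.
graph = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
assert find_cycle(graph) is not None

# Same edges minus one: a wait chain, but no cycle, so no deadlock.
assert find_cycle({"T1": ["T2"], "T2": ["T3"]}) is None
```

Victim selection (e.g. abort the youngest transaction) then removes one node from the cycle, which breaks the deadlock.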

5.8 New Subtopic: MVCC (Multi-Version Concurrency Control)

Rationale: MVCC is the foundation of snapshot isolation and repeatable read. A dedicated visualization showing the version chain, garbage collection, and snapshot visibility rules would be extremely valuable.

Scenarios:

  • Version chain: Each update creates a new version. Show the linked list of versions per row
  • Snapshot visibility: Transaction with snapshot at time T can only see versions committed before T
  • Garbage collection: Old versions that no active transaction can see are cleaned up
  • Long-running transactions preventing GC: A long-running read transaction pins old versions, causing version chain bloat
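The version chain and visibility rule can be sketched directly. This simplifies real MVCC considerably (a single commit timestamp per version, no in-progress or aborted transactions), but it captures what the visualization would animate:

```python
# MVCC sketch: each committed write appends a version tagged with its commit
# txid; a snapshot sees the newest version committed before the snapshot.

class VersionedRow:
    def __init__(self):
        self.versions = []  # list of (commit_txid, value), oldest first

    def write(self, commit_txid, value):
        self.versions.append((commit_txid, value))

    def read(self, snapshot_txid):
        # Scan backwards for the newest version older than the snapshot.
        for txid, value in reversed(self.versions):
            if txid < snapshot_txid:
                return value
        return None

    def vacuum(self, oldest_active_snapshot):
        # Keep the newest version the oldest snapshot can see; drop older ones.
        visible = [i for i, (txid, _) in enumerate(self.versions)
                   if txid < oldest_active_snapshot]
        if visible:
            self.versions = self.versions[visible[-1]:]

row = VersionedRow()
row.write(10, "v1")
row.write(20, "v2")
assert row.read(15) == "v1"          # snapshot taken between the two commits
assert row.read(25) == "v2"
row.vacuum(25)                       # only snapshots newer than txid 25 remain
assert row.versions == [(20, "v2")]  # "v1" garbage-collected
```

The long-running-transaction failure case falls out of `vacuum`: as long as a snapshot with `txid 15` is still active, the `(10, "v1")` version is pinned and the chain cannot shrink.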

6. Consensus

6.1 Raft

Current state: Election, replication, node failure, split vote. Config: clusterSize, scenario.

New config options:

  • heartbeatInterval (slider): Show the relationship between heartbeat frequency and failure detection speed
  • networkPartition (toggle): Partition the cluster into majority and minority groups
  • logConflict (toggle): New leader finds a follower with conflicting log entries from a previous term

New scenarios:

  • Network partition: Leader is partitioned from the majority. Majority elects a new leader and continues. Old leader tries to replicate but gets rejected (stale term). When partition heals, old leader discovers the new term, steps down, and catches up. Show the two parallel timelines
  • Log conflict resolution: Follower has entries from a previous leader that were never committed. New leader's log diverges. Show how the new leader overwrites the follower's conflicting entries (AppendEntries consistency check with prevLogIndex/prevLogTerm)
  • Cluster membership change (joint consensus): Add a new node to the cluster. Show the two-phase joint consensus approach: first enter joint configuration (old + new), then transition to new configuration. Prevent split-brain during the transition
  • Leader transfer: Current leader voluntarily transfers leadership to another node (e.g., before maintenance). Show the TimeoutNow RPC
  • Read linearizability: Show how a read-only request requires the leader to confirm it's still the leader (via a round of heartbeats) before responding. Without this, a partitioned old leader could serve stale reads

Failure cases to add:

  • Pre-vote: Node isolated from the cluster keeps incrementing its term. When it rejoins, its high term disrupts the cluster. Show how the pre-vote mechanism (Raft extension) prevents this
  • Commit of entries from previous terms: A leader can only commit entries from its own term by committing a new entry (the "Figure 8" problem from the Raft paper)
  • Follower that's very far behind: Show the catch-up mechanism (snapshot installation) when the log gap is too large
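The log-conflict scenario hinges on the AppendEntries consistency check. A minimal sketch of the follower side (0-based indices for simplicity; real Raft logs are 1-based):

```python
# AppendEntries consistency check, sketched: the follower accepts entries only
# if its log matches the leader's at prev_log_index, and truncates any
# conflicting suffix before appending the leader's entries.

def append_entries(log, prev_log_index, prev_log_term, entries):
    """log is a list of (term, command); returns (accepted, new_log)."""
    if prev_log_index >= 0:
        if prev_log_index >= len(log) or log[prev_log_index][0] != prev_log_term:
            return False, log            # reject: leader retries with an earlier index
    insert_at = prev_log_index + 1
    log = log[:insert_at] + entries      # overwrite conflicting suffix, then append
    return True, log

# Follower has two uncommitted entries from old term 2; new leader is in term 3.
follower = [(1, "x=1"), (2, "x=2"), (2, "x=3")]
ok, follower = append_entries(follower, 0, 1, [(3, "x=9")])
assert ok
assert follower == [(1, "x=1"), (3, "x=9")]   # conflicting entries overwritten
```

When the check fails, the leader decrements its `prevLogIndex` for that follower and retries, which is the backtracking the visualization would step through.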

6.2 Paxos

Current state: Single proposer (happy path), competing proposers. Config: numProposers, numAcceptors, scenario.

New config options:

  • numRounds (slider, 1-3): Show multiple Paxos rounds (Multi-Paxos) to demonstrate how a stable leader skips Phase 1
  • acceptorFailure (toggle): Kill an acceptor during the protocol and show how the majority quorum still reaches consensus
  • scenarioType additions: "Dueling Proposers (Livelock)" and "Multi-Paxos Optimization"

New scenarios:

  • Dueling proposers livelock: P1 sends Prepare(1), P2 sends Prepare(2), P1 gets Nack(2), P1 sends Prepare(3), P2 gets Nack(3), P2 sends Prepare(4)... Neither can complete Phase 2 because the other keeps preempting. Show the infinite loop and explain why a stable leader (Multi-Paxos) is needed
  • Multi-Paxos optimization: A stable leader skips Phase 1 (Prepare/Promise) for subsequent consensus rounds after the first. Show the throughput improvement: 2 message delays instead of 4
  • Acceptor failure mid-protocol: An acceptor dies after sending Promise but before receiving Accept. Show that the remaining majority can still reach consensus
  • Value already accepted: In Phase 1b, an acceptor returns a previously accepted value. The proposer MUST propose that value instead of its own. Show this critical safety rule
  • Paxos Made Simple walkthrough: Step through the "classic" Paxos example from Lamport's paper with concrete values
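The "value already accepted" safety rule is small enough to state as code. A sketch of the proposer's Phase 1b decision, where each Promise reply optionally carries a previously accepted (proposal number, value) pair:

```python
# Paxos Phase 1b rule: after a majority of Promises, the proposer must adopt
# the value from the highest-numbered accepted proposal reported by any
# acceptor; only if none was reported may it propose its own value.

def choose_value(promises, my_value):
    """promises: list of (accepted_n, accepted_value) or (None, None) per acceptor."""
    accepted = [(n, v) for n, v in promises if n is not None]
    if accepted:
        return max(accepted)[1]   # value of the highest accepted proposal number
    return my_value               # fresh quorum: free to propose our own value

# One acceptor already accepted "A" at proposal 5: the proposer must carry it.
assert choose_value([(None, None), (5, "A"), (None, None)], "B") == "A"
# Nothing accepted anywhere: the proposer's own value "B" is fine.
assert choose_value([(None, None), (None, None), (None, None)], "B") == "B"
```

This rule is what makes a chosen value stable: any later proposer that reaches a majority will intersect the accepting majority and be forced to re-propose the same value.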

Failure cases to add:

  • Minority failure: Up to F failures tolerated with 2F+1 acceptors. Show the exact threshold
  • Learner inconsistency: How do learners (separate from acceptors) learn the decided value? Show the message pattern
  • Distinguished proposer failure: The optimization of having a single proposer breaks if that proposer fails. Show recovery

6.3 Total Order Broadcast

Current state: Total order (uniform delivery) vs causal order (happens-before). Config: numNodes, broadcastMode.

New config options:

  • messageCount (slider, 3-6): More messages to show ordering complexity as message count increases
  • failureMode (select): "None" vs "Node crash" vs "Network delay" to show reliability guarantees
  • implementationMode (select): "Sequencer-based" vs "Consensus-based (Paxos)" to show different TOB implementations

New scenarios:

  • Sequencer-based TOB: A dedicated sequencer node assigns sequence numbers. Show the single point of failure and what happens when the sequencer crashes (need consensus to elect a new sequencer)
  • Consensus-based TOB: Each message delivery is decided by a Paxos/Raft round. Show how TOB and consensus are equivalent (each can implement the other)
  • State machine replication: Three replicas of a key-value store. Use TOB to deliver writes in the same order. Show that all replicas end up with identical state regardless of which node received the original write
  • FIFO + Causal + Total ordering comparison: Show three ordering guarantees side by side with the same set of messages. FIFO only preserves per-sender order. Causal preserves happens-before. Total ensures identical ordering everywhere
  • Network delay causing reordering: Two messages sent nearly simultaneously. Network delays cause them to arrive in different orders at different nodes. Show how TOB reorders them despite physical delivery order
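The sequencer-based approach and the reordering scenario fit in one sketch: the sequencer stamps every message with a global sequence number, and each node holds back out-of-order arrivals until the gap fills. Names here are hypothetical:

```python
import heapq

# Sequencer-based total order broadcast, sketched: deliver strictly in stamp
# order, regardless of the order messages physically arrive in.

class Sequencer:
    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return (seq, msg)

class Node:
    def __init__(self):
        self.pending = []        # min-heap of (seq, msg) not yet deliverable
        self.next_expected = 0
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self.pending, stamped)
        # Deliver everything contiguous from next_expected; hold back the rest.
        while self.pending and self.pending[0][0] == self.next_expected:
            _, msg = heapq.heappop(self.pending)
            self.delivered.append(msg)
            self.next_expected += 1

seq = Sequencer()
m1, m2 = seq.stamp("set x=1"), seq.stamp("set x=2")
a, b = Node(), Node()
a.receive(m1); a.receive(m2)     # delivered in network order
b.receive(m2); b.receive(m1)     # network reordered them; TOB restores the order
assert a.delivered == b.delivered == ["set x=1", "set x=2"]
```

The single point of failure is visible in the code: everything funnels through one `Sequencer`, so a crash there stalls delivery cluster-wide until a new sequencer is agreed upon.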

Failure cases to add:

  • Node crash during delivery: A node crashes after partially delivering a message. Other nodes must detect this and ensure consistency
  • Split-brain in sequencer: Network partition isolating the sequencer from some nodes. Show how nodes on the wrong side of the partition stop receiving ordered messages

6.4 Linearizability

Current state: Linearizable history, non-linearizable history, sequential consistency. Config: scenario.

New config options:

  • numClients (slider, 2-4): More clients create more complex operation overlaps
  • numOperations (slider, 3-6): More operations per scenario
  • registerType (select): "Single register" vs "Two registers" to show cross-register linearizability challenges

New scenarios:

  • Interactive linearization: Let the user drag linearization points along the time axis to find (or fail to find) a valid linearization. Immediate feedback on whether the chosen ordering is consistent
  • CAS operation: Add compare-and-swap operations to the register. Show how CAS creates stronger ordering constraints than simple reads/writes
  • Two-register scenario: Operations on registers X and Y. Show how per-register linearizability doesn't imply cross-register linearizability
  • Real-time ordering vs process ordering: Show a history that satisfies sequential consistency but not linearizability because it violates real-time ordering. Make the distinction crystal clear
  • Stale read detection: A practical scenario: client writes to the leader, reads from a follower, gets old value. Is this linearizable? Show why not
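A brute-force checker makes the definition concrete, and its exponential cost echoes the NP-completeness note below. A sketch for a single register: try every ordering of operations consistent with real time, and accept if some ordering makes every read return the latest preceding write:

```python
from itertools import permutations

# Brute-force linearizability check for one register. history is a list of
# (start, end, op, value) tuples, where op is "w" (write) or "r" (read).

def linearizable(history, initial=0):
    for order in permutations(history):
        # Real-time rule: an op that finished before another started must
        # appear first; reject orderings that violate this.
        if any(b[1] < a[0] for i, a in enumerate(order) for b in order[i + 1:]):
            continue
        value, legal = initial, True
        for _, _, op, v in order:
            if op == "w":
                value = v
            elif v != value:          # a read must return the latest write
                legal = False
                break
        if legal:
            return True               # found a valid linearization
    return False

# Write 1 during [0,3]; read 0 during [4,5]: a stale read after the write
# completed — no valid linearization exists.
assert not linearizable([(0, 3, "w", 1), (4, 5, "r", 0)])
# If the read overlaps the write, the read may linearize first: allowed.
assert linearizable([(0, 3, "w", 1), (2, 5, "r", 0)])
```

The interactive drag-the-linearization-point scenario is exactly a human-guided version of this search.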

Failure cases to add:

  • NP-completeness: Mention that checking linearizability is NP-complete in general, and show a complex example where it's not obvious whether the history is linearizable
  • Cost of linearizability: Show the latency overhead of linearizable reads (require leader confirmation) vs non-linearizable reads (can read from any replica)

6.5 New Subtopic: Two-Phase Commit (2PC)

Rationale: 2PC is the classic atomic commit protocol for distributed transactions. It's critical for understanding how distributed databases ensure all-or-nothing semantics across partitions.

Config: numParticipants (2-5), failureMode (none, coordinator crash, participant crash, network partition)

Scenarios:

  • Happy path: Coordinator sends Prepare, all participants vote Yes, coordinator sends Commit
  • Participant votes No: Coordinator sends Abort to all
  • Coordinator crash after Prepare, before decision: Participants are stuck (the "blocking" problem of 2PC). Show the uncertainty window
  • Participant crash after voting Yes: Must recover and still commit (promised to commit)
  • Three-phase commit (3PC): Show how adding a pre-commit phase avoids blocking on coordinator crash (at the cost of an extra message round, and only under the assumption of no network partitions)
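The 2PC decision rule itself is tiny, which is worth showing alongside the failure scenarios: all the difficulty lives in crashes between the phases, not in the happy path. A sketch with hypothetical class names:

```python
# Two-phase commit, happy-path logic: the coordinator commits only if every
# participant votes yes in phase 1; any "no" (or a timeout treated as "no")
# means abort for everyone in phase 2.

class Participant:
    def __init__(self, vote_yes):
        self.vote_yes = vote_yes
        self.state = "init"

    def prepare(self):                  # phase 1: vote, and promise to obey
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def finish(self, decision):         # phase 2: apply the global decision
        self.state = decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]       # phase 1: Prepare
    decision = "commit" if all(votes) else "abort"    # unanimity required
    for p in participants:                            # phase 2: broadcast decision
        p.finish(decision)
    return decision

ok1, ok2, naysayer = Participant(True), Participant(True), Participant(False)
assert two_phase_commit([ok1, ok2]) == "commit"
assert two_phase_commit([ok1, ok2, naysayer]) == "abort"
assert ok1.state == "abort"            # one "no" vote aborts everyone
```

The blocking problem is the gap between `prepare()` returning and `finish()` arriving: a participant in state `"prepared"` has promised to obey a decision it may never receive if the coordinator crashes.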

6.6 New Subtopic: Lamport Clocks / Vector Clocks

Rationale: Logical clocks are fundamental to understanding distributed ordering. They underpin many of the other visualizations (causal order, conflict detection, CRDT merging).

Config: numNodes (2-4), clockType (Lamport vs Vector), numEvents (5-10)

Scenarios:

  • Lamport clock: Show the "max(local, received) + 1" rule. Demonstrate that Lamport clocks capture happens-before but can't detect concurrency
  • Vector clocks: Show the per-node counter vector. Demonstrate concurrent event detection (neither vector dominates the other)
  • Causal ordering: Show how vector clocks enable exact causality tracking
  • Clock comparison: Same events with Lamport vs Vector clocks side by side, showing Vector clocks' additional power
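Both clock rules from the scenarios above fit in a short sketch: Lamport's `max(local, received) + 1` on receive, and the vector comparison that can additionally report "concurrent" when neither vector dominates:

```python
# Logical clocks, sketched. Lamport clocks give a total order consistent with
# happens-before but cannot detect concurrency; vector clocks can.

def lamport_receive(local, received):
    # On receiving a message, jump past the sender's timestamp.
    return max(local, received) + 1

def vc_compare(a, b):
    """a, b: dicts node -> counter. Returns 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"         # a happened-before b
    if b_le_a:
        return "after"
    return "concurrent"         # neither dominates: no causal order

assert lamport_receive(4, 7) == 8                    # clock jumps past the sender's
assert vc_compare({"A": 1}, {"A": 2}) == "before"    # causally ordered
assert vc_compare({"A": 1, "B": 0}, {"A": 0, "B": 1}) == "concurrent"
```

The last assertion is the "additional power" the side-by-side comparison should highlight: two Lamport timestamps always compare, so concurrency is invisible to them.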

6.7 New Subtopic: Byzantine Fault Tolerance (PBFT)

Rationale: Byzantine faults (nodes behaving arbitrarily, including maliciously) are important for blockchain and high-assurance systems. Show how PBFT requires 3f+1 nodes to tolerate f failures.

Config: numNodes (4-7), numFaulty (0-2), scenario (normal, Byzantine leader, Byzantine follower)

Scenarios:

  • Normal operation: Pre-prepare -> Prepare -> Commit phases
  • Byzantine leader sends conflicting messages to different nodes
  • Byzantine follower sends wrong responses. Honest nodes detect inconsistency via quorum
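The 3f+1 sizing is pure quorum arithmetic, and showing it as a calculation makes the requirement concrete: quorums of 2f+1 out of 3f+1 replicas always intersect in at least f+1 replicas, so every pair of quorums shares at least one honest node even with f Byzantine ones.

```python
# PBFT quorum arithmetic: n = 3f + 1 replicas, quorums of size 2f + 1.
# By pigeonhole, any two quorums overlap in at least 2*(2f+1) - n = f + 1
# replicas — more than the f that can be Byzantine.

def pbft_sizes(f):
    n = 3 * f + 1
    quorum = 2 * f + 1
    min_overlap = 2 * quorum - n
    return n, quorum, min_overlap

n, quorum, overlap = pbft_sizes(1)
assert (n, quorum) == (4, 3)   # tolerate 1 Byzantine node with 4 replicas
assert overlap == 2            # f + 1 = 2: at least one honest node in common
```

This is why the config above starts at numNodes=4: it is the smallest cluster that tolerates a single Byzantine fault.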

7. Cross-Cutting Improvements

7.1 Shared UX Enhancements

  • Annotation panel expansion: Add a collapsible "Learn More" panel below each annotation with deeper explanation, relevant paper references, and database-specific implementation notes
  • Comparison mode: Split the canvas vertically and run two configurations side by side (e.g., B-Tree order=3 vs order=5, or Sync vs Async replication)
  • Scenario presets: Add a "Presets" dropdown per visualization with named scenarios that set all config values to demonstrate specific concepts (e.g., "CAP theorem demo" for leaderless replication with N=3, W=1, R=1)
  • Statistics dashboard: Show running counters for each visualization: total messages sent, operations completed, conflicts detected, latency (frames), space used, etc.
  • Frame-level diff highlighting: When stepping between frames, highlight exactly what changed (new entries, state transitions, value changes) with a brief animation or color flash

7.2 Testing Coverage

  • Add engine tests for every visualization (currently only useAnimationController has tests)
  • Property-based tests: For engines with configurable parameters, test that all configs produce valid frame sequences (no undefined states, annotations always present, frame counts > 0)
  • Snapshot tests for key frames to catch unintended rendering changes

7.3 Accessibility

  • Color-blind friendly palette option
  • Keyboard-only navigation through interactive elements
  • Screen reader annotations for canvas content (off-screen text descriptions synchronized with frames)