IMPROVEMENTS.md

Detailed improvement plan for each visualization topic. Covers new config options, additional scenarios, failure cases, and new subtopics to add.


1. Indexing

1.1 Bitcask

Current state: Writes, reads, file rotation, compaction. Config: maxFileSize, initialKeys.

New config options:

  • compactionStrategy (select): "merge-all" vs "merge-oldest-two" to show different compaction approaches
  • tombstoneEnabled (toggle): Show delete operations as tombstone markers instead of just overwrites. Demonstrates how deletes work in append-only systems
  • crashRecovery (toggle): Enable a crash scenario mid-write, then show recovery by replaying the data files to rebuild the keydir

New scenarios:

  • Delete with tombstones: Write a key, then delete it. Show the tombstone entry appended to the active file and keydir removal. During compaction, show how the tombstone and the original entry are both discarded
  • Crash recovery: Simulate a crash after appending to the data file but before updating the keydir. On recovery, scan all data files sequentially to rebuild the in-memory keydir. Highlight how this is O(n) in the number of entries, which is Bitcask's main limitation
  • Hint files: After compaction, generate a "hint file" alongside the data file that contains only (key, offset, file_id) tuples, allowing faster restart without scanning all data
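The recovery scan in these scenarios can be sketched in a few lines. The entry format here (per-file lists of key/value pairs, with None as a tombstone) is a simplification of Bitcask's real record layout:

```python
# Sketch of Bitcask crash recovery: rebuild the in-memory keydir by
# scanning every data file oldest-first. Later entries win because the
# log is append-only; a tombstone (value None) drops the key.
def rebuild_keydir(data_files):
    keydir = {}  # key -> (file_id, entry_index)
    for file_id, entries in enumerate(data_files):
        for idx, (key, value) in enumerate(entries):
            if value is None:
                keydir.pop(key, None)       # tombstone: forget the key
            else:
                keydir[key] = (file_id, idx)  # newer entry shadows older
    return keydir

files = [
    [("a", "1"), ("b", "2")],   # oldest data file
    [("a", "3"), ("b", None)],  # newer file: overwrite a, delete b
]
print(rebuild_keydir(files))    # {'a': (1, 0)}
```

The O(n) cost is visible here: every entry ever written must be touched once, which is exactly what hint files exist to avoid.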

Failure cases to add:

  • Crash mid-write (partial entry in data file): show CRC validation detecting corruption
  • Disk full during compaction: old files must be retained until compaction completes successfully
  • Large keydir exceeding RAM: show what happens when the key count grows beyond available memory (the fundamental Bitcask limitation)

1.2 Hash Map

Current state: Insert 10 keys, collision chaining, resize/rehash, lookups. Config: numBuckets, loadFactorThreshold.

New config options:

  • collisionStrategy (select): "Separate Chaining" vs "Open Addressing (Linear Probing)" vs "Open Addressing (Robin Hood)". Currently only chaining is shown
  • hashFunction (select): "Simple Modular" vs "Murmur-like" to demonstrate how hash function quality affects distribution
  • deleteEnabled (toggle): Show deletion with chaining (remove from linked list) vs open addressing (tombstone marker)

New scenarios:

  • Linear probing: Show probing sequence when slots are occupied. Demonstrate clustering effect where occupied slots tend to cluster together, degrading performance
  • Robin Hood hashing: Show stealing from the rich (entries with short probe distances) to give to the poor (entries with long probe distances). Display probe distances per entry to illustrate the variance reduction
  • Worst-case lookup: Craft a scenario where many keys hash to the same bucket, creating a long chain. Show O(n) lookup degradation
  • Delete + lookup interaction: In open addressing, show why you can't just empty a slot (it would break probe chains) and need tombstones instead
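The tombstone rule in the last scenario can be staged with a minimal linear-probing table (hypothetical slot layout; small integer keys so collisions are easy to arrange):

```python
# Minimal linear-probing table with tombstones. EMPTY stops a probe;
# TOMBSTONE does not, so lookups can step past deleted slots instead
# of wrongly reporting "not found".
EMPTY, TOMBSTONE = object(), object()

class LinearProbing:
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            yield i
            i = (i + 1) % len(self.slots)

    def put(self, key, value):
        # Simplified: reuses the first free slot; a full implementation
        # must first check whether the key exists further down the chain.
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY or slot is TOMBSTONE or slot[0] == key:
                self.slots[i] = (key, value)
                return

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:        # a true hole ends the probe chain
                return None
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[i] = TOMBSTONE  # EMPTY here would break probes
                return

t = LinearProbing()
t.put(1, "a")
t.put(9, "b")    # 9 collides with 1 (both hash to slot 1 mod 8)
t.delete(1)
print(t.get(9))  # 'b' -- the probe steps over the tombstone
```

Replacing the tombstone with EMPTY in `delete` makes `get(9)` fail, which is precisely the bug the scenario should demonstrate.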

Failure cases to add:

  • Pathological hash function: All keys map to the same bucket, turning O(1) into O(n)
  • Resize under memory pressure: Show the temporary 2x memory spike during rehashing
  • Concurrent access hazard: Two threads inserting simultaneously cause a lost entry (conceptual, not animated in real-time, but show the interleaving)

1.3 LSM Tree

Current state: Writes to memtable, flush to L0, cascading compaction, reads across levels. Config: memtableSize, levelSizeMultiplier, numLevels.

New config options:

  • compactionStrategy (select): "Size-Tiered" vs "Leveled" to show the two major compaction strategies. Currently only leveled is shown
  • bloomFilterEnabled (toggle): Show Bloom filter checks before SSTable reads to skip levels that definitely don't contain a key
  • writeAheadLog (toggle): Show WAL writes before memtable writes for crash recovery

New scenarios:

  • Size-tiered compaction: Multiple SSTables per level, compaction triggers when count exceeds threshold. Contrast with leveled where each level has one sorted run
  • Bloom filter optimization: During reads, show the Bloom filter check per SSTable. Display false positive rate and how it reduces disk reads. Show a false positive case where the Bloom filter says "maybe" but the key isn't actually there
  • Range query: Instead of point lookups, show a range scan that must merge results across all levels using a merge iterator. Highlight the complexity of maintaining sorted order across levels
  • WAL recovery: Write to WAL, then memtable. Crash and lose memtable. Replay WAL to recover memtable contents
  • Space amplification: Show how multiple versions of the same key exist across levels until compaction cleans them up. Display the space amplification ratio
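The Bloom-filter read path could be modeled like this (a toy filter with three hash probes; the false-positive rate depends on the bit-array size, and the hash choice here is illustrative):

```python
# Sketch of the LSM read path with a per-SSTable Bloom filter.
# The filter answers "definitely absent" (skip the table) or "maybe
# present" (must read it); false positives cost one wasted read.
class Bloom:
    def __init__(self, bits=64):
        self.bits, self.n = 0, bits

    def _positions(self, key):
        return [hash((key, seed)) % self.n for seed in (1, 2, 3)]

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits & (1 << p) for p in self._positions(key))

def lsm_get(key, memtable, sstables):
    """sstables: newest-first list of (bloom, table_dict) pairs."""
    if key in memtable:
        return memtable[key], 0
    reads = 0
    for bloom, table in sstables:
        if not bloom.might_contain(key):
            continue                  # level skipped without a disk read
        reads += 1
        if key in table:
            return table[key], reads
    return None, reads

bloom = Bloom()
bloom.add("k1")
sstables = [(bloom, {"k1": "v1"})]
print(lsm_get("k1", {}, sstables))   # ('v1', 1)
```

The second element of the return value is the read-amplification counter the "Read amplification measurement" failure case would display.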

Failure cases to add:

  • Write stall: Memtable fills faster than compaction can drain L0, causing back-pressure
  • Read amplification measurement: Count the number of SSTable reads per lookup across different tree depths
  • Crash before flush: Memtable data lost without WAL

1.4 B-Tree

Current state: Insertions (prime numbers), node splits, search. Config: order, initialKeys.

New config options:

  • deleteEnabled (toggle): Show B-tree deletion with merge and redistribution
  • customKeys (text input or preset select): "Sequential", "Random", "Reverse" to show how insertion order affects tree shape during construction
  • treeVariant (select): "B-Tree" vs "B+ Tree" to show the difference (B+ tree stores values only in leaves, internal nodes are pure routing)

New scenarios:

  • Deletion: Remove a key from a leaf. If the node underflows (fewer than ceil(order/2)-1 keys), show redistribution from a sibling or merge with a sibling. Show cascading merges up to the root
  • B+ Tree variant: Internal nodes contain only keys for routing. Leaf nodes form a linked list for range scans. Show a range query traversing the leaf chain without going back to internal nodes
  • Bulk loading: Instead of inserting one key at a time, show the bottom-up bulk loading algorithm that sorts keys first and builds leaves, then constructs internal nodes. Much more efficient than repeated single inserts
  • Sequential vs random insert: Show how sequential inserts create a right-leaning tree with many splits on the rightmost path, while random inserts distribute splits more evenly
  • Disk page mapping: Show how each node corresponds to a disk page (typically 4KB). Visualize how higher order means fewer levels and fewer disk seeks
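The order-to-height relationship in the disk-page-mapping scenario can be checked with a back-of-envelope calculation (worst case, every node roughly half full):

```python
import math

# Approximate B-tree height: with n keys and order m, nodes are
# guaranteed at least ceil(m/2) children, so the height is about
# log base ceil(m/2) of the key count.
def levels(n_keys, order):
    fanout = math.ceil(order / 2)
    return max(1, math.ceil(math.log(n_keys, fanout)))

# A billion keys: order 4 needs ~30 levels, order 512 only ~4,
# i.e. 4 disk seeks instead of 30 per lookup.
print(levels(10**9, 4), levels(10**9, 512))  # 30 4
```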

Failure cases to add:

  • Node underflow during deletion triggering merge cascade
  • Root demotion: When the root has only one child after a merge, remove the root level (tree shrinks)
  • Write-ahead logging for B-tree pages: Show how a page split requires WAL to avoid corruption on crash

1.5 New Subtopic: Skip List

Rationale: Skip lists are used as the memtable data structure in many LSM-tree implementations (LevelDB, RocksDB). They provide O(log n) search, insert, and delete with simpler implementation than balanced trees.

Config options:

  • maxLevel (slider, 2-6): Maximum number of levels
  • promotionProbability (slider, 0.25-0.75): Probability of promoting a node to the next level
  • numKeys (slider, 5-15): Number of keys to insert

Scenarios:

  • Insert keys and show probabilistic level promotion
  • Search showing how higher levels skip large portions of the list
  • Delete and pointer updates
  • Compare with linked list search (O(n)) to highlight the speedup
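The promotion coin-flip is the entire source of the skip list's balance; a sketch of the level generator the visualization would animate:

```python
import random

# Probabilistic level promotion: keep flipping a coin (probability p)
# until it fails or maxLevel is reached. With p=0.5 about half the
# nodes live only on level 1, a quarter reach level 2, and so on.
def random_level(p=0.5, max_level=6):
    level = 1
    while random.random() < p and level < max_level:
        level += 1
    return level

random.seed(42)
counts = {}
for _ in range(10000):
    lvl = random_level()
    counts[lvl] = counts.get(lvl, 0) + 1
print(counts)  # roughly 5000 / 2500 / 1250 / ... per level
```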

1.6 New Subtopic: SSTable / Sorted String Table

Rationale: SSTables are the on-disk format used by LSM trees. A dedicated visualization would show the internal structure (data blocks, index block, Bloom filter block, metadata) in detail rather than treating them as opaque boxes.

Config options:

  • blockSize (slider): Number of entries per data block
  • numEntries (slider): Total entries in the SSTable

Scenarios:

  • Build an SSTable from sorted entries: show block boundaries, index creation
  • Point lookup: binary search on index block, then scan within data block
  • Block compression: show before/after sizes
  • Bloom filter integration
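The point-lookup scenario boils down to a binary search over a sparse index; a sketch (block and index layout simplified to Python lists):

```python
import bisect

# SSTable point lookup: binary-search a sparse index of each block's
# first key, then scan only within that one data block.
def build(entries, block_size):
    blocks = [entries[i:i + block_size]
              for i in range(0, len(entries), block_size)]
    index = [block[0][0] for block in blocks]  # first key of each block
    return index, blocks

def lookup(index, blocks, key):
    # Rightmost block whose first key is <= the search key
    b = bisect.bisect_right(index, key) - 1
    if b < 0:
        return None
    return dict(blocks[b]).get(key)

index, blocks = build([("a", 1), ("c", 2), ("f", 3), ("k", 4)], block_size=2)
print(lookup(index, blocks, "f"))  # 3 -- only the second block is scanned
```

Larger `blockSize` shrinks the index but lengthens the in-block scan, which is exactly the tradeoff the slider would expose.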

2. Encoding

2.1 JSON Detail

Current state: Token-by-token JSON serialization with overhead analysis. Config: dataSet (3 presets).

New config options:

  • formatting (select): "Pretty-printed" vs "Minified" to show the overhead difference of whitespace and indentation
  • encoding (select): "UTF-8" vs "UTF-16" to show how character encoding affects byte counts
  • nestedDepth (slider, 1-3): Control nesting depth to show how overhead scales with structure complexity
  • includeNulls (toggle): Show how JSON handles null values and the overhead of explicit null representation

New scenarios:

  • Minified vs pretty-printed comparison: Show the same object in both formats side by side. Display byte savings from removing whitespace
  • Deeply nested objects: Show how nesting multiplies overhead (more braces, more indentation). Compare overhead percentage at depth 1 vs depth 3
  • Array-heavy data: Show serialization of arrays with many small elements (e.g., coordinates, time series) where structural overhead dominates
  • Schema evolution: Show what happens when a new field is added (receivers with old parsers ignore it) vs when a field is removed (old data still has it, wasting space)
  • Unicode handling: Show how multi-byte UTF-8 characters (e.g., emoji, CJK characters) affect byte counts and how JSON escapes non-ASCII with \uXXXX
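The minified-vs-pretty comparison is easy to ground in real byte counts (sample object is illustrative):

```python
import json

# Byte-count comparison of pretty-printed vs minified JSON (UTF-8).
obj = {"user": {"id": 1, "name": "Alice", "tags": ["a", "b"]}}

pretty = json.dumps(obj, indent=2)
minified = json.dumps(obj, separators=(",", ":"))

# Whitespace alone accounts for the difference, and the savings grow
# with nesting depth because indentation repeats on every line.
print(len(pretty.encode("utf-8")), len(minified.encode("utf-8")))
```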

Failure cases to add:

  • Parsing ambiguity: numbers that look like strings, trailing commas (invalid JSON)
  • Precision loss: large integers (> 2^53) that JavaScript can't represent faithfully
  • Injection: show how unescaped strings can break JSON structure

2.2 Protobuf Detail

Current state: Tag computation, varint encoding, byte-level encoding. Config: dataSet.

New config options:

  • showUnknownFields (toggle): Demonstrate what happens when the decoder encounters a field number not in its schema (unknown fields are preserved)
  • fieldNumbering (select): "Sequential (1,2,3)" vs "Sparse (1,5,100)" to show how field numbers affect tag byte size (varint encoding means field 100 needs 2-byte tag)
  • oneofEnabled (toggle): Show protobuf oneof fields and how only one is encoded

New scenarios:

  • Large field numbers: Show that field_num > 15 requires 2-byte tags due to varint encoding, affecting overhead
  • Repeated fields (packed): Show how repeated numeric fields are packed into a single length-delimited chunk instead of one tag per element
  • Default values omitted: Protobuf 3 doesn't encode fields set to default values (0, "", false). Show the space savings and the ambiguity this creates (can't distinguish "not set" from "set to default")
  • Schema evolution: Add a field, remove a field, rename a field. Show backward and forward compatibility through field number stability
  • Nested messages: Show how embedded messages are encoded as length-delimited bytes, and the overhead of length prefixes at each nesting level
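The field-number/tag-size relationship in the first scenario follows directly from the wire format, and can be demonstrated with a few lines:

```python
# Protobuf tag = (field_number << 3) | wire_type, encoded as a varint.
# Field numbers 1-15 fit in one tag byte; 16 and above need two.
def varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def tag_bytes(field_number, wire_type=0):
    return varint((field_number << 3) | wire_type)

print(len(tag_bytes(1)), len(tag_bytes(15)),
      len(tag_bytes(16)), len(tag_bytes(100)))  # 1 1 2 2
```

Note the cutoff is 15, not 100: field 16 already pays the second tag byte, which is why hot fields should get the low numbers.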

Failure cases to add:

  • Field number reuse: reusing a deleted field number with a different type causes data corruption
  • Missing required fields (proto2): show what happens when a required field is absent
  • Wire type mismatch: show decoder error when wire type doesn't match expected field type

2.3 XML Detail

Current state: Token-by-token XML with tag duplication analysis. Config: dataSet.

New config options:

  • attributeMode (toggle): Show data as XML attributes (<user id="1" name="Alice"/>) instead of elements, and compare overhead
  • namespaceEnabled (toggle): Show XML namespace prefixes and how they add overhead (<ns:user xmlns:ns="...">)
  • cdataEnabled (toggle): Show CDATA sections for content that would otherwise need entity escaping

New scenarios:

  • Attributes vs elements: Same data encoded both ways, compare byte counts. Attributes eliminate closing tags but have their own quoting overhead
  • Mixed content: Show XML's unique ability to have text content interleaved with child elements (something JSON can't do naturally)
  • XML Schema validation: Show how an XSD defines types and constraints, and what invalid data looks like
  • Entity escaping: Show how special characters (&, <, >, ", ') require entity escaping (&amp;, &lt;, etc.) and the overhead this adds
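The escaping overhead can be measured directly with the standard library:

```python
from xml.sax.saxutils import escape

# The five XML special characters and their entity forms. escape()
# handles &, <, > by default; the quote characters need the extra map.
text = '5 < 6 & "quotes"'
escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
print(escaped)                   # 5 &lt; 6 &amp; &quot;quotes&quot;
print(len(escaped) - len(text))  # 17 extra characters
```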

Failure cases to add:

  • XML injection / billion laughs attack: show how entity expansion can be exploited
  • Encoding mismatch: declaring UTF-8 but including raw Latin-1 bytes
  • Malformed XML: mismatched tags, missing closing tags

2.4 Avro Detail

Current state: Schema-driven encoding, zigzag varint, schema evolution demo. Config: dataSet.

New config options:

  • schemaEvolutionMode (select): "Add Field" vs "Remove Field" vs "Rename Field (alias)" vs "Change Type (int->long)" to explore different evolution scenarios
  • unionType (toggle): Show Avro union encoding (e.g., ["null", "string"]) and how the union index byte adds overhead but enables optional fields
  • logicalTypes (toggle): Show logical types (date, timestamp, decimal) built on top of primitive types

New scenarios:

  • Backward compatibility: New reader schema has extra field with default. Old data can be read. Show exactly which bytes are read and which defaults are filled
  • Forward compatibility: Old reader schema doesn't have a field that new data has. Show how schema resolution handles it: the reader decodes with the writer's schema (which Avro always requires), then drops fields that aren't in its own schema
  • Writer and reader schema resolution: Show the full resolution algorithm: match by field name, check type compatibility, apply defaults. This is Avro's killer feature
  • Schema Registry interaction: Show how schema IDs are prepended to Avro-encoded messages (Confluent wire format: magic byte + 4-byte schema ID + Avro bytes)
  • Container file format: Show the .avro file structure: header (magic, schema, sync marker) + data blocks (count + size + compressed data + sync)
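The zigzag-varint step that underlies all of these scenarios can be sketched as follows (assuming 64-bit two's-complement semantics, as the Avro spec does):

```python
# Avro encodes longs as zigzag + varint: small magnitudes, positive or
# negative, map to small unsigned values and fit in one byte.
def zigzag(n):
    # Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    return (n << 1) ^ (n >> 63)

def encode_long(n):
    u = zigzag(n)
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

for v in (0, -1, 1, -64, 64):
    print(v, encode_long(v).hex())
# -64 still fits in one byte (zigzag 127); 64 needs two (zigzag 128)
```

Without zigzag, -1 would encode as ten 0xFF-style varint bytes, which is the contrast the visualization can highlight.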

Failure cases to add:

  • Incompatible schema change: changing a field type from string to int without a default
  • Missing writer schema: Avro data is unreadable without the writer's schema (Protobuf, by contrast, embeds field numbers and wire types in its wire format, so a decoder can at least skip unknown data)
  • Union ambiguity: large unions make the index byte larger and complicate schema evolution

2.5 Comparison

Current state: Side-by-side 4-format comparison with progressive field encoding. Config: dataSet.

New scenarios:

  • Scaling analysis: Show how overhead percentage changes as the number of fields grows from 1 to 20. Plot a line chart showing bytes vs field count for all 4 formats
  • Repeated data: Encode an array of 10 identical objects. Show how each format handles repetition (JSON repeats all field names, Protobuf repeats tags, Avro repeats nothing)
  • Numeric-heavy data: Compare encoding of data with many small integers (sensor readings, counters) where varint encoding shines
  • Schema evolution comparison: Apply the same evolution (add a field) to all 4 formats and compare the impact

2.6 New Subtopic: MessagePack

Rationale: MessagePack is a binary JSON-like format that's widely used (Redis, Fluentd). It bridges JSON's schema-less flexibility with binary compactness.

Scenarios:

  • Show how MessagePack maps JSON types to compact binary representation
  • Compare byte counts with JSON for the same data
  • Show how it handles maps, arrays, and binary data
  • Demonstrate interoperability with JSON (lossless round-trip for most types)

2.7 New Subtopic: Thrift (TBinaryProtocol vs TCompactProtocol)

Rationale: Thrift was Facebook's alternative to Protobuf and shows interesting design differences (multiple serialization protocols, service definitions). TCompactProtocol uses delta encoding for field IDs which is an interesting optimization to visualize.


3. Replication

3.1 Single-Leader

Current state: 3 coordinated writes, stale read, failover. Config: numFollowers, replicationLag, operationType.

New config options:

  • replicationMode (select): "Synchronous" vs "Semi-Synchronous" vs "Asynchronous" to show the durability/latency tradeoff
  • followerFailure (toggle): Kill a follower during replication and show how the leader handles it (continues for async, blocks for sync)
  • readFromFollower (toggle): Route reads to followers and show consistency implications

New scenarios:

  • Synchronous replication: Leader waits for ALL followers to acknowledge before confirming to client. Show the latency cost (slowest follower determines response time) and the durability guarantee
  • Semi-synchronous: Leader waits for ONE follower (synchronous standby) before confirming. If the synchronous follower dies, promote another follower to synchronous. This is how PostgreSQL's synchronous_standby_names replication works (opt-in; PostgreSQL replicates asynchronously by default)
  • Replication lag measurement: Show a real-time lag counter per follower. Demonstrate how a slow follower falls behind and how reads from that follower return progressively staler data
  • Read-your-writes consistency: Client writes to leader, then reads from a follower. Without special handling, the read may not see the write. Show "read-your-writes" guarantee by routing the client's reads to the leader after a write
  • Monotonic reads: Client reads from follower A (sees value X), then reads from follower B (sees older value). Show how session stickiness to a single follower prevents this
  • Follower catch-up after restart: A follower goes offline, misses several writes, then comes back. Show the replay of the leader's log from the follower's last known position
  • Cascading replication: Followers replicate from other followers instead of the leader, reducing leader load. Show the added lag
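The catch-up scenario reduces to replaying a log from a remembered offset; a sketch (class and method names are illustrative):

```python
# Sketch of follower catch-up: the leader keeps an ordered log, each
# follower remembers the index of the last entry it applied, and on
# reconnect it replays everything after that position.
class Leader:
    def __init__(self):
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.log.append((key, value))

    def entries_after(self, offset):
        return self.log[offset:]

class Follower:
    def __init__(self):
        self.applied = 0
        self.data = {}

    def catch_up(self, leader):
        for key, value in leader.entries_after(self.applied):
            self.data[key] = value
            self.applied += 1

leader, follower = Leader(), Follower()
leader.write("x", 1)
follower.catch_up(leader)
leader.write("x", 2)       # follower is offline for these two writes
leader.write("y", 3)
follower.catch_up(leader)  # replay from offset 1
print(follower.data)       # {'x': 2, 'y': 3}
```

Replication lag is just `len(leader.log) - follower.applied`, which is the per-follower counter the lag-measurement scenario would display.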

Failure cases to add:

  • Split-brain during failover: old leader comes back after new leader is elected. Both think they're the leader. Show data divergence and the need for fencing tokens
  • Data loss during async failover: leader crashes with unreplicated writes. New leader doesn't have them. Client thinks writes succeeded
  • Replication lag causing inconsistent joins: application reads user from leader (exists) but reads user's orders from follower (not yet replicated). Gets empty result

3.2 Multi-Leader

Current state: Non-conflicting writes, conflicting writes, LWW/merge resolution. Config: numLeaders, resolutionStrategy.

New config options:

  • resolutionStrategy additions: Add "Custom Function" and "CRDT (LWW-Register)" to the existing "LWW" and "Merge"
  • networkPartition (toggle): Partition leaders into two groups that can't communicate. Show writes continuing independently and the conflict explosion when the partition heals
  • topologyMode (select): "All-to-All" vs "Star" vs "Ring" to show how replication topology affects conflict propagation speed and failure resilience

New scenarios:

  • Network partition and heal: Two leaders in separate partitions continue accepting writes. When the partition heals, show the avalanche of conflicts that need resolution. Contrast this with the steady-state conflict rate
  • Three-way conflict: Three leaders all write to the same key concurrently. Show how LWW picks one winner and discards two updates vs how merge must combine three values
  • Topology failure: In ring topology, one node failure breaks the ring. Show how messages can't propagate past the dead node. Compare with all-to-all where any single failure is tolerated
  • CRDT-based resolution: Show a G-Counter (grow-only counter) where each leader maintains its own counter and the merge is a max-per-leader operation. No conflicts possible by design
  • Conflict-free data types: Show how CRDTs for sets (add-wins set, remove-wins set) handle concurrent add/remove conflicts automatically

Failure cases to add:

  • Circular replication loops: A replicates to B, B to A, creating infinite replication. Show how origin tagging prevents this
  • Clock skew in LWW: Two leaders have drifted clocks. The "wrong" write wins because its timestamp is higher despite being causally earlier
  • Conflict rate explosion: Show how conflict probability grows quadratically with the number of leaders (N leaders = N*(N-1)/2 potential conflict pairs per key)

3.3 Leaderless

Current state: Write quorum, read quorum, read repair. Config: n, w, r.

New config options:

  • antiEntropyEnabled (toggle): Show background anti-entropy process (Merkle tree comparison) that catches inconsistencies read repair misses
  • hintedHandoff (toggle): When a target replica is down, write to a temporary sloppy quorum node. When the target recovers, hand off the write
  • conflictResolution (select): "Last-Write-Wins" vs "Vector Clocks" to show how concurrent writes to the same key are detected and resolved

New scenarios:

  • Sloppy quorum: Node goes down during a write. Show the write going to a non-preferred node (sloppy quorum). When the preferred node recovers, show the hinted handoff transferring the data back
  • Anti-entropy with Merkle trees: Two replicas compare their Merkle tree roots. If different, drill down to find the divergent subtree and synchronize only the changed keys. Show the logarithmic efficiency
  • Vector clock conflict detection: Two clients write to the same key on different replicas concurrently. Show how vector clocks detect that neither write causally follows the other (true conflict vs simple overwrite). Return both versions to the client for resolution
  • Quorum edge cases: Set W=1, R=1 (W+R=2 <= N=3). Show how a write to one replica and a read from a different replica misses the write entirely. No consistency guarantee
  • Write-write conflict: Two concurrent writes to the same key with different values. With LWW, show data loss. With vector clocks, show sibling creation
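The vector-clock comparison at the heart of the conflict-detection scenarios is a small function (clock representation here is a plain dict of per-node counters):

```python
# Vector clock comparison: a write causally follows another only if its
# clock dominates in every component; otherwise the two writes are
# concurrent and both versions must be kept as siblings.
def compare(a, b):
    nodes = set(a) | set(b)
    a_ge = all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
    b_ge = all(b.get(n, 0) >= a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a after b"
    if b_ge:
        return "b after a"
    return "concurrent"  # true conflict: neither clock dominates

print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # a after b
print(compare({"A": 2}, {"B": 1}))                  # concurrent
```

The "concurrent" branch is where LWW silently discards a write and vector clocks create siblings instead.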

Failure cases to add:

  • Permanent node failure: Show how the cluster redistributes data from the failed node to remaining replicas (re-replication)
  • Quorum unavailability: When fewer than W nodes are alive, writes fail. Show the availability tradeoff
  • Read repair race condition: Two concurrent reads trigger two concurrent repairs to the same stale replica, potentially conflicting with each other
  • Stale data served during network partition (CAP theorem)

3.4 New Subtopic: Chain Replication

Rationale: Chain replication (used in HDFS, Azure Storage) is an important alternative to quorum-based replication. Writes flow through a chain (head -> middle -> tail), reads go to the tail only. Strong consistency with simpler logic.

Config: chainLength (3-5), scenario (normal, head failure, tail failure, middle failure)

Scenarios:

  • Normal operation: write enters at head, propagates through chain, acknowledged from tail
  • Head failure: next node becomes head
  • Tail failure: predecessor becomes tail
  • Middle failure: predecessor connects to successor, replays buffered writes

3.5 New Subtopic: Conflict-free Replicated Data Types (CRDTs)

Rationale: CRDTs deserve their own visualization given how important they are for multi-leader and leaderless systems. Show how data structures can be designed to never conflict.

Visualizations:

  • G-Counter: Grow-only counter with per-replica counters. Merge = max per replica
  • PN-Counter: Positive-negative counter using two G-Counters
  • LWW-Register: Last-write-wins register with timestamps
  • OR-Set (Observed-Remove Set): Add and remove operations with unique tags. Show how concurrent add+remove resolves to "add wins"

4. Partitioning

4.1 Hash-Based

Current state: Modular vs consistent hashing, add/remove partition, redistribution. Config: numPartitions, hashingMode.

New config options:

  • virtualNodesPerPartition (slider, 1-10): Control the number of virtual nodes in consistent hashing. Currently hardcoded to 3. Show how more virtual nodes improve distribution uniformity
  • hashFunction (select): "Simple char sum" vs "FNV-1a" vs "Murmur3" to show distribution quality differences
  • replicationFactor (slider, 1-3): Store each key on N consecutive partitions on the ring (consistent hashing). Show how this provides fault tolerance

New scenarios:

  • Virtual node impact: Compare distribution with 1, 3, and 10 virtual nodes per partition. Show standard deviation of keys per partition as a uniformity metric
  • Replication on the ring: In consistent hashing, replicate each key to the next N nodes clockwise. Show which nodes hold which replicas. When a node fails, show that replicas on other nodes cover the gap
  • Jump consistent hashing: Show the alternative algorithm that achieves perfectly uniform distribution without virtual nodes. Compare redistribution cost with ring-based consistent hashing
  • Node weight: Assign different weights to partitions (e.g., a powerful machine gets 2x the virtual nodes). Show the proportional key distribution
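The virtual-node mechanics behind these scenarios can be sketched with a sorted list of ring points (md5 keeps the demo deterministic; the `"node#i"` naming for virtual nodes is illustrative):

```python
import bisect
import hashlib
from collections import Counter

def h(s):
    # Stable hash onto the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=3):
        # Each physical node owns `vnodes` points on the ring
        self.points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [p[0] for p in self.points]

    def owner(self, key):
        # First point clockwise from the key's hash (wrapping around)
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["n1", "n2", "n3"], vnodes=10)
counts = Counter(ring.owner(f"key{i}") for i in range(1000))
print(counts)  # more vnodes -> lower variance between nodes
```

Node weighting falls out for free: give a bigger machine more virtual nodes and it proportionally owns more of the ring.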

Failure cases to add:

  • Non-uniform distribution: With few virtual nodes, some partitions get 3x more keys than others. Show the variance
  • Cascading failure: A node dies, its keys move to the next node on the ring, overloading it. That node dies too. Show the domino effect
  • Hot key despite good hash function: Even with uniform distribution, a single popular key creates a hot spot on one partition

4.2 Key-Range

Current state: Uniform/skewed distribution, range queries. Config: numPartitions, keyDistribution.

New config options:

  • autoRebalance (toggle): When a partition exceeds a threshold, split it into two. Show dynamic range adjustment
  • splitThreshold (slider, 5-15): Number of keys that triggers a partition split
  • mergeEnabled (toggle): When a partition becomes too empty, merge with adjacent partition

New scenarios:

  • Dynamic splitting: Start with one partition. As keys are inserted, split partitions that become too full. Show how the ranges adjust automatically (this is how HBase/BigTable works)
  • Range query efficiency vs hash: Side-by-side comparison: range query [F-P] on key-range partitioning (scan 2 partitions) vs hash partitioning (must scan ALL partitions). Clear winner for range queries
  • Time-series data: Keys are timestamps. Show how all recent writes go to the last partition (hot partition) while older partitions are cold. Demonstrate the need for pre-splitting by time ranges
  • Composite keys: Show partitioning on (user_id, timestamp) where the first part determines the partition and the second part determines the sort order within the partition. Enables efficient range queries per user
  • Partition splits with data migration: Show the actual data movement when a range split occurs: keys in the upper half of the range move to the new partition

Failure cases to add:

  • Monotonic key insertion (auto-increment IDs): All inserts go to the last partition. Show extreme skew
  • Range query spanning all partitions: A very wide range query provides no partition pruning benefit
  • Split cascade: Inserting sorted data causes a chain of splits, all on the rightmost partition

4.3 Hot Partition

Current state: Uniform/skewed/celebrity scenarios, key splitting mitigation. Config: scenario, mitigationEnabled.

New config options:

  • loadBalancingStrategy (select): "None" vs "Key Splitting" vs "Random Suffix" vs "Request Routing" to show multiple mitigation strategies
  • monitoringEnabled (toggle): Show real-time metrics (p99 latency per partition, throughput per partition, queue depth) that would trigger alerts
  • requestPattern (select): Add "Time-based burst" (all requests in a short window) and "Gradual ramp" scenarios

New scenarios:

  • Random suffix strategy: Append a random suffix (0-9) to hot keys, distributing them across 10 partitions. Show the read-side scatter-gather cost (must read from all 10 and merge)
  • Request queuing: Show request queues per partition. When a partition is hot, its queue grows while others are idle. Visualize latency increase (queue depth * processing time)
  • Cascading hot partition: One hot partition overloads a node, causing it to slow down. Other partitions on the same node are affected (noisy neighbor). Show cross-partition interference
  • Auto-detection and mitigation: Monitor request rate per partition. When the rate exceeds a threshold, automatically apply key splitting. Show the detection delay and the transition period
  • Comparison table: Show all mitigation strategies side by side with their tradeoffs (read amplification, write amplification, complexity)
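The random-suffix strategy's write/read asymmetry is easy to show concretely (key names and shard count are illustrative):

```python
import random

# Random-suffix mitigation for a hot counter key: each write fans out
# to one of n shards, so reads must scatter-gather across all shards.
def write(store, key, amount=1, n=10):
    shard = f"{key}#{random.randrange(n)}"
    store[shard] = store.get(shard, 0) + amount

def read(store, key, n=10):
    # Read amplification: n lookups to answer one logical read
    return sum(store.get(f"{key}#{i}", 0) for i in range(n))

store = {}
for _ in range(100):
    write(store, "viral_post_likes")
print(read(store, "viral_post_likes"))  # 100, assembled from up to 10 shards
```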

Failure cases to add:

  • Mitigation making reads harder: Key splitting for a counter requires reading all splits and summing. Show the read amplification cost
  • Flash crowd: Sudden burst of traffic to a new key (viral content). Mitigation strategies need warm-up time
  • False positive detection: A partition appears hot due to a monitoring spike but it's actually a batch job that will end soon. Unnecessary mitigation adds complexity

4.4 New Subtopic: Consistent Hashing Ring (Interactive)

Rationale: An interactive ring visualization where students can drag nodes around, add/remove nodes, and see keys redistribute in real-time. More hands-on than the current hash-based visualization.

Features:

  • Draggable partition nodes on the ring
  • Click to add keys at specific positions
  • Remove a node and watch keys slide to the next node
  • Show token ranges per node
  • Display replication factor coverage

4.5 New Subtopic: Partition Rebalancing Strategies

Rationale: When partitions become unbalanced, how do you rebalance? This is a critical operational concern.

Visualizations:

  • Fixed number of partitions: Create many more partitions than nodes (e.g., 1000 partitions for 10 nodes). When a node is added, transfer whole partitions. Show minimal data movement
  • Dynamic partitioning: Partitions split and merge based on size. Show the split/merge triggers and data movement
  • Proportional partitioning: Assign partitions proportional to node capacity (heterogeneous hardware). Show weighted distribution

5. Transactions

5.1 Dirty Read

Current state: T1 writes, T2 reads uncommitted, T1 aborts. Config: isolationLevel (none vs readCommitted).

New config options:

  • numTransactions (slider, 2-3): Add a third transaction to show cascading dirty reads (T2 dirty-reads T1, T3 dirty-reads T2)
  • operationType (select): "Abort after dirty read" vs "Commit after dirty read" to show that dirty reads are problematic even when the writer commits (because the reader made decisions based on an intermediate state)

New scenarios:

  • Cascading abort: T1 writes, T2 reads dirty value and writes based on it, T3 reads T2's dirty value. T1 aborts. Now T2 and T3 are both based on invalid data. Show the cascade
  • Dirty read with decision: T2 reads a dirty balance of $1000, decides to approve a loan. T1 aborts, actual balance is $50. Show the business impact of the dirty read
  • Read committed implementation: Show how the database maintains the "last committed value" separately from uncommitted changes. Two versions of the row exist simultaneously
  • Multiple dirty reads from same transaction: T1 makes several writes before aborting. T2 reads each one. Show all the phantom values T2 observed

Failure cases to add:

  • Dirty read leading to constraint violation: T2 reads dirty value, makes a write that violates a business rule that would have been caught with the committed value
  • Performance cost of preventing dirty reads: Show the lock/MVCC overhead of read committed vs no isolation

5.2 Non-Repeatable Read

Current state: T1 reads twice, T2 modifies between reads. Config: isolationLevel (readCommitted vs repeatableRead).

New config options:

  • numRows (slider, 1-3): Show non-repeatable reads affecting multiple rows simultaneously
  • mvccVisualization (toggle): Show the MVCC version chain that enables repeatable read (snapshot at transaction start)

New scenarios:

  • MVCC under the hood: Show the version chain: each update creates a new version with a transaction ID. Repeatable read gives T1 a snapshot that always sees versions from before T1 started
  • Banking scenario: T1 reads all account balances to compute a total. Between reads, T2 transfers money between accounts. T1 sees inconsistent totals even though no money was created or destroyed
  • Aggregate inconsistency: T1 computes COUNT(*) then SUM(balance). Between the two queries, T2 deletes a row, so the SUM covers one fewer row than the COUNT reported. The two aggregates describe different states of the table
  • Cursor stability: Show the weaker guarantee where only the "current row" under a cursor is locked, allowing changes to other rows

Failure cases to add:

  • Snapshot too old: Long-running T1 holds a snapshot while T2, T3, T4 make many changes. Show growing version chain and cleanup challenges
  • Write-read conflict: T1 reads a row, T2 updates and commits it, then T1 tries to update the same row. Under snapshot isolation T1's update is based on stale data; first-committer-wins implementations abort T1

5.3 Phantom Read

Current state: T1 runs predicate query twice, T2 inserts matching row between. Config: isolationLevel (repeatableRead vs serializable).

New config options:

  • predicateType (select): "Range (age > 25)" vs "Equality (dept = 'Eng')" vs "Complex (age > 25 AND dept = 'Eng')" to show different predicate scenarios
  • operationType (select): "Insert" vs "Update" vs "Delete" to show that phantoms can appear from updates moving rows into the predicate range, or disappear from deletes

New scenarios:

  • Update-based phantom: T1 queries WHERE age > 25. T2 updates Carol's age from 20 to 30. Carol now matches the predicate but didn't before. Second query returns one more row
  • Delete-based phantom: T1 queries WHERE age > 25, gets 3 rows. T2 deletes Bob (age 32). Second query returns 2 rows. Phantom disappearance
  • Predicate locking: Show how serializable isolation uses predicate locks (or gap locks) to prevent inserts/updates/deletes in the predicate range. Visualize the locked range on a number line
  • Index-range locking: Show how databases approximate predicate locks with index-range locks (lock the index entry for "age > 25" which covers a range of the index)
  • Phantom affecting aggregates: T1 computes COUNT(*) WHERE dept='Eng'. T2 inserts a new engineer. T1's recount shows a different number. Show impact on reporting queries
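The predicate-locking scenario can be sketched as a tiny lock table: a reader registers its predicate, and any write from another transaction that matches a registered predicate is refused. Class and method names here are hypothetical:

```python
# Predicate locking, sketched: a phantom-creating write is one that matches
# a predicate some other in-flight transaction has already read.

class PredicateLockTable:
    def __init__(self):
        self.locks = []  # list of (txn_id, predicate) pairs

    def lock_predicate(self, txn_id, predicate):
        self.locks.append((txn_id, predicate))

    def may_write(self, txn_id, row):
        # A write conflicts if any OTHER transaction's predicate matches the row.
        return not any(pred(row) for owner, pred in self.locks if owner != txn_id)

locks = PredicateLockTable()
# T1 ran: SELECT ... WHERE age > 25 — lock that predicate, not individual rows.
locks.lock_predicate("T1", lambda row: row["age"] > 25)

assert locks.may_write("T2", {"age": 20}) is True    # outside the range: allowed
assert locks.may_write("T2", {"age": 30}) is False   # would create a phantom: blocked
```

Index-range locking approximates the same check by locking a range of index entries instead of evaluating arbitrary predicates.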

Failure cases to add:

  • Gap lock deadlocks: Two transactions trying to insert into the same gap. Show how gap locks can cause deadlocks
  • Serializable performance cost: Show the throughput difference between repeatable read and serializable due to predicate locking overhead

5.4 Lost Update

Current state: Two transactions read-modify-write a counter. Config: isolationLevel (none vs compareAndSet).

New config options:

  • concurrencyControl (select): "None" vs "Compare-and-Set" vs "SELECT FOR UPDATE (Pessimistic)" vs "Atomic Increment" to show multiple prevention strategies
  • numTransactions (slider, 2-4): Show lost update with 3-4 concurrent transactions to amplify the problem
  • operationType (select): "Counter increment" vs "Balance transfer" vs "Wiki page edit" for different real-world scenarios

New scenarios:

  • Pessimistic locking (SELECT FOR UPDATE): T1 acquires exclusive lock on the row. T2 blocks waiting for the lock. T1 completes, T2 proceeds with the updated value. No lost update, but reduced concurrency
  • Atomic increment: Database provides UPDATE counter SET value = value + 1. Single operation, no read-modify-write cycle. Show the simplicity
  • Multi-row lost update: Transfer $100 from account A to B. Two concurrent transfers can result in money appearing or disappearing if both read the same starting balances
  • Wiki page conflict: Two users edit the same page. Last save wins and overwrites the first user's changes. Show "optimistic locking" with version numbers
  • 3-transaction pile-up: Three transactions all increment a counter from 10. Expected result: 13. Without protection: 11. Show the compounding loss
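The compare-and-set strategy from the scenarios above boils down to a retry loop: re-read, attempt the conditional write, repeat until it succeeds. A minimal sketch (using a Python lock to stand in for the database's atomic CAS):

```python
import threading

# Lost-update prevention via compare-and-set: the increment only applies if
# the value is still what the transaction read, otherwise it retries.

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()   # stands in for the DB's atomic CAS

    def compare_and_set(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

def increment(counter):
    while True:                          # the optimistic retry loop
        seen = counter.value             # read
        if counter.compare_and_set(seen, seen + 1):  # conditional write
            return                       # success; a concurrent change forces a retry

counter = Counter()
threads = [threading.Thread(target=increment, args=(counter,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter.value == 50   # all 50 increments survive; none are lost
```

With plain read-modify-write (no CAS) the same 50 threads could finish with any value between 2 and 50, which is the anomaly the visualization demonstrates.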

Failure cases to add:

  • CAS retry storm: Under high contention, CAS fails repeatedly. Show exponential backoff and the throughput cliff
  • Pessimistic lock deadlock: T1 locks row A, T2 locks row B, T1 tries to lock B (blocked), T2 tries to lock A (deadlock). Show deadlock detection and resolution

5.5 Write Skew

Current state: Doctor on-call scenario with snapshot isolation vs serializable. Config: isolationLevel.

New config options:

  • scenario (select): "Doctor on-call" vs "Meeting room booking" vs "Username uniqueness" to show different write skew patterns
  • numActors (slider, 2-3): Two or three actors trying to violate the invariant simultaneously
  • detectionMethod (select): "Serializable" vs "Materializing conflicts" to show the workaround

New scenarios:

  • Meeting room booking: Two users both check that room X is free at 2pm. Both insert a booking. Double booking results. Invariant: at most one booking per room per timeslot
  • Username uniqueness: Two users both check that username "coolname" is available. Both create accounts, producing duplicate usernames. Show how a unique index prevents this vs how the check-then-insert pattern fails under snapshot isolation in some databases
  • Three-doctor write skew: Three doctors on call, invariant is "at least 1 must remain." All three check (see 3 on call), all three leave. Nobody on call
  • Materializing conflicts: Instead of checking a predicate, insert a "lock row" that represents the constraint. Now the write-write conflict is detected by the database. Show the workaround pattern
  • Serializable Snapshot Isolation (SSI): Show how SSI detects write skew by tracking read dependencies and detecting dangerous structures (rw-dependency cycles)
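The write-skew pattern itself can be demonstrated without any threading at all: both transactions check the invariant against the same snapshot, then write disjoint rows, so neither write-write conflicts with the other. A deterministic sketch of the doctor scenario:

```python
# Write skew under snapshot isolation: each transaction's check passes
# against its snapshot, the writes don't overlap, and the invariant
# "at least one doctor on call" is silently violated.

on_call = {"alice": True, "bob": True}

snapshot_t1 = dict(on_call)   # T1's snapshot at transaction start
snapshot_t2 = dict(on_call)   # T2's snapshot at transaction start

if sum(snapshot_t1.values()) >= 2:   # T1: "someone else is still on call"
    on_call["alice"] = False         # T1 writes only Alice's row

if sum(snapshot_t2.values()) >= 2:   # T2: same check against a stale snapshot
    on_call["bob"] = False           # T2 writes only Bob's row

assert sum(on_call.values()) == 0    # nobody is on call — invariant broken
```

Materializing the conflict means both transactions would instead update one shared "lock row", turning this into an ordinary write-write conflict the database can detect.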

Failure cases to add:

  • Write skew across multiple tables: T1 reads table A, writes table B. T2 reads table B, writes table A. Cross-table dependency makes detection harder
  • SSI false positive: SSI aborts a transaction that wasn't actually going to cause an anomaly (conservative detection)

5.6 Isolation Comparison

Current state: Static table showing anomalies vs isolation levels.

Improvements:

  • Make it interactive: click on a cell ("Phantom Read" x "Repeatable Read") to jump to that specific visualization with the right config preset
  • Add a "Performance Impact" row showing relative throughput at each isolation level
  • Add "Implementation" row: "Locks", "MVCC", "MVCC + SSI"
  • Add "Default in" row: "PostgreSQL: Read Committed", "MySQL InnoDB: Repeatable Read"
  • Add hover tooltips explaining each anomaly in one sentence

5.7 New Subtopic: Deadlock Detection

Rationale: Deadlocks are a critical failure mode in pessimistic concurrency control. Visualizing the wait-for graph makes the concept intuitive.

Config: numTransactions (2-4), detectionStrategy (timeout vs wait-for graph)

Scenarios:

  • Simple deadlock: T1 holds A, wants B. T2 holds B, wants A. Show the cycle in the wait-for graph
  • Complex deadlock: T1->T2->T3->T1 cycle
  • Deadlock resolution: Abort the youngest transaction (least work lost). Show the victim selection criteria
  • Timeout-based detection: Show how a timeout (less sophisticated) can also resolve deadlocks but may be too aggressive or too slow
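Wait-for-graph detection is a cycle search. A minimal sketch, where an edge T -> U means "T waits for a lock held by U" and any node found on a cycle is a candidate deadlock victim:

```python
# Deadlock detection via DFS over the wait-for graph: a transaction that is
# reachable from itself is part of a deadlock cycle.

def find_cycle(waits_for):
    """Return some transaction on a cycle, or None if there is no deadlock."""
    def visit(node, path):
        if node in path:
            return node                       # revisited a node on this path: cycle
        for nxt in waits_for.get(node, []):
            found = visit(nxt, path | {node})
            if found:
                return found
        return None

    for start in waits_for:
        found = visit(start, set())
        if found:
            return found
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: the classic cycle.
graph = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}
assert find_cycle(graph) is not None

# Same edges minus one: a wait chain, but no cycle, so no deadlock.
assert find_cycle({"T1": ["T2"], "T2": ["T3"]}) is None
```

Victim selection (e.g. abort the youngest transaction) then removes one node from the cycle, which breaks the deadlock.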

5.8 New Subtopic: MVCC (Multi-Version Concurrency Control)

Rationale: MVCC is the foundation of snapshot isolation and repeatable read. A dedicated visualization showing the version chain, garbage collection, and snapshot visibility rules would be extremely valuable.

Scenarios:

  • Version chain: Each update creates a new version. Show the linked list of versions per row
  • Snapshot visibility: Transaction with snapshot at time T can only see versions committed before T
  • Garbage collection: Old versions that no active transaction can see are cleaned up
  • Long-running transactions preventing GC: A long-running read transaction pins old versions, causing version chain bloat
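The version chain and visibility rule can be sketched directly. This simplifies real MVCC considerably (a single commit timestamp per version, no in-progress or aborted transactions), but it captures what the visualization would animate:

```python
# MVCC sketch: each committed write appends a version tagged with its commit
# txid; a snapshot sees the newest version committed before the snapshot.

class VersionedRow:
    def __init__(self):
        self.versions = []  # list of (commit_txid, value), oldest first

    def write(self, commit_txid, value):
        self.versions.append((commit_txid, value))

    def read(self, snapshot_txid):
        # Scan backwards for the newest version older than the snapshot.
        for txid, value in reversed(self.versions):
            if txid < snapshot_txid:
                return value
        return None

    def vacuum(self, oldest_active_snapshot):
        # Keep the newest version the oldest snapshot can see; drop older ones.
        visible = [i for i, (txid, _) in enumerate(self.versions)
                   if txid < oldest_active_snapshot]
        if visible:
            self.versions = self.versions[visible[-1]:]

row = VersionedRow()
row.write(10, "v1")
row.write(20, "v2")
assert row.read(15) == "v1"          # snapshot taken between the two commits
assert row.read(25) == "v2"
row.vacuum(25)                       # only snapshots newer than txid 25 remain
assert row.versions == [(20, "v2")]  # "v1" garbage-collected
```

The long-running-transaction failure case falls out of `vacuum`: as long as a snapshot with `txid 15` is still active, the `(10, "v1")` version is pinned and the chain cannot shrink.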

6. Consensus

6.1 Raft

Current state: Election, replication, node failure, split vote. Config: clusterSize, scenario.

New config options:

  • heartbeatInterval (slider): Show the relationship between heartbeat frequency and failure detection speed
  • networkPartition (toggle): Partition the cluster into majority and minority groups
  • logConflict (toggle): New leader finds a follower with conflicting log entries from a previous term

New scenarios:

  • Network partition: Leader is partitioned from the majority. Majority elects a new leader and continues. Old leader tries to replicate but gets rejected (stale term). When partition heals, old leader discovers the new term, steps down, and catches up. Show the two parallel timelines
  • Log conflict resolution: Follower has entries from a previous leader that were never committed. New leader's log diverges. Show how the new leader overwrites the follower's conflicting entries (AppendEntries consistency check with prevLogIndex/prevLogTerm)
  • Cluster membership change (joint consensus): Add a new node to the cluster. Show the two-phase joint consensus approach: first enter joint configuration (old + new), then transition to new configuration. Prevent split-brain during the transition
  • Leader transfer: Current leader voluntarily transfers leadership to another node (e.g., before maintenance). Show the TimeoutNow RPC
  • Read linearizability: Show how a read-only request requires the leader to confirm it's still the leader (via a round of heartbeats) before responding. Without this, a partitioned old leader could serve stale reads

Failure cases to add:

  • Pre-vote: Node isolated from the cluster keeps incrementing its term. When it rejoins, its high term disrupts the cluster. Show how the pre-vote mechanism (Raft extension) prevents this
  • Commit of entries from previous terms: A leader can only commit entries from its own term by committing a new entry (the "Figure 8" problem from the Raft paper)
  • Follower that's very far behind: Show the catch-up mechanism (snapshot installation) when the log gap is too large
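The log-conflict scenario hinges on the AppendEntries consistency check. A minimal sketch of the follower side (0-based indices for simplicity; real Raft logs are 1-based):

```python
# AppendEntries consistency check, sketched: the follower accepts entries only
# if its log matches the leader's at prev_log_index, and truncates any
# conflicting suffix before appending the leader's entries.

def append_entries(log, prev_log_index, prev_log_term, entries):
    """log is a list of (term, command); returns (accepted, new_log)."""
    if prev_log_index >= 0:
        if prev_log_index >= len(log) or log[prev_log_index][0] != prev_log_term:
            return False, log            # reject: leader retries with an earlier index
    insert_at = prev_log_index + 1
    log = log[:insert_at] + entries      # overwrite conflicting suffix, then append
    return True, log

# Follower has two uncommitted entries from old term 2; new leader is in term 3.
follower = [(1, "x=1"), (2, "x=2"), (2, "x=3")]
ok, follower = append_entries(follower, 0, 1, [(3, "x=9")])
assert ok
assert follower == [(1, "x=1"), (3, "x=9")]   # conflicting entries overwritten
```

When the check fails, the leader decrements its `prevLogIndex` for that follower and retries, which is the backtracking the visualization would step through.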

6.2 Paxos

Current state: Single proposer (happy path), competing proposers. Config: numProposers, numAcceptors, scenario.

New config options:

  • numRounds (slider, 1-3): Show multiple Paxos rounds (Multi-Paxos) to demonstrate how a stable leader skips Phase 1
  • acceptorFailure (toggle): Kill an acceptor during the protocol and show how the majority quorum still reaches consensus
  • scenarioType additions: "Dueling Proposers (Livelock)" and "Multi-Paxos Optimization"

New scenarios:

  • Dueling proposers livelock: P1 sends Prepare(1), P2 sends Prepare(2), P1 gets Nack(2), P1 sends Prepare(3), P2 gets Nack(3), P2 sends Prepare(4)... Neither can complete Phase 2 because the other keeps preempting. Show the infinite loop and explain why a stable leader (Multi-Paxos) is needed
  • Multi-Paxos optimization: A stable leader skips Phase 1 (Prepare/Promise) for subsequent consensus rounds after the first. Show the throughput improvement: 2 message delays instead of 4
  • Acceptor failure mid-protocol: An acceptor dies after sending Promise but before receiving Accept. Show that the remaining majority can still reach consensus
  • Value already accepted: In Phase 1b, an acceptor returns a previously accepted value. The proposer MUST propose that value instead of its own. Show this critical safety rule
  • Paxos Made Simple walkthrough: Step through the "classic" Paxos example from Lamport's paper with concrete values
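The "value already accepted" safety rule is small enough to state as code. A sketch of the proposer's Phase 1b decision, where each Promise reply optionally carries a previously accepted (proposal number, value) pair:

```python
# Paxos Phase 1b rule: after a majority of Promises, the proposer must adopt
# the value from the highest-numbered accepted proposal reported by any
# acceptor; only if none was reported may it propose its own value.

def choose_value(promises, my_value):
    """promises: list of (accepted_n, accepted_value) or (None, None) per acceptor."""
    accepted = [(n, v) for n, v in promises if n is not None]
    if accepted:
        return max(accepted)[1]   # value of the highest accepted proposal number
    return my_value               # fresh quorum: free to propose our own value

# One acceptor already accepted "A" at proposal 5: the proposer must carry it.
assert choose_value([(None, None), (5, "A"), (None, None)], "B") == "A"
# Nothing accepted anywhere: the proposer's own value "B" is fine.
assert choose_value([(None, None), (None, None), (None, None)], "B") == "B"
```

This rule is what makes a chosen value stable: any later proposer that reaches a majority will intersect the accepting majority and be forced to re-propose the same value.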

Failure cases to add:

  • Minority failure: Up to F failures tolerated with 2F+1 acceptors. Show the exact threshold
  • Learner inconsistency: How do learners (separate from acceptors) learn the decided value? Show the message pattern
  • Distinguished proposer failure: The optimization of having a single proposer breaks if that proposer fails. Show recovery

6.3 Total Order Broadcast

Current state: Total order (uniform delivery) vs causal order (happens-before). Config: numNodes, broadcastMode.

New config options:

  • messageCount (slider, 3-6): More messages to show ordering complexity as message count increases
  • failureMode (select): "None" vs "Node crash" vs "Network delay" to show reliability guarantees
  • implementationMode (select): "Sequencer-based" vs "Consensus-based (Paxos)" to show different TOB implementations

New scenarios:

  • Sequencer-based TOB: A dedicated sequencer node assigns sequence numbers. Show the single point of failure and what happens when the sequencer crashes (need consensus to elect a new sequencer)
  • Consensus-based TOB: Each message delivery is decided by a Paxos/Raft round. Show how TOB and consensus are equivalent (each can implement the other)
  • State machine replication: Three replicas of a key-value store. Use TOB to deliver writes in the same order. Show that all replicas end up with identical state regardless of which node received the original write
  • FIFO + Causal + Total ordering comparison: Show three ordering guarantees side by side with the same set of messages. FIFO only preserves per-sender order. Causal preserves happens-before. Total ensures identical ordering everywhere
  • Network delay causing reordering: Two messages sent nearly simultaneously. Network delays cause them to arrive in different orders at different nodes. Show how TOB reorders them despite physical delivery order
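The sequencer-based approach and the reordering scenario fit in one sketch: the sequencer stamps every message with a global sequence number, and each node holds back out-of-order arrivals until the gap fills. Names here are hypothetical:

```python
import heapq

# Sequencer-based total order broadcast, sketched: deliver strictly in stamp
# order, regardless of the order messages physically arrive in.

class Sequencer:
    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return (seq, msg)

class Node:
    def __init__(self):
        self.pending = []        # min-heap of (seq, msg) not yet deliverable
        self.next_expected = 0
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self.pending, stamped)
        # Deliver everything contiguous from next_expected; hold back the rest.
        while self.pending and self.pending[0][0] == self.next_expected:
            _, msg = heapq.heappop(self.pending)
            self.delivered.append(msg)
            self.next_expected += 1

seq = Sequencer()
m1, m2 = seq.stamp("set x=1"), seq.stamp("set x=2")
a, b = Node(), Node()
a.receive(m1); a.receive(m2)     # delivered in network order
b.receive(m2); b.receive(m1)     # network reordered them; TOB restores the order
assert a.delivered == b.delivered == ["set x=1", "set x=2"]
```

The single point of failure is visible in the code: everything funnels through one `Sequencer`, so a crash there stalls delivery cluster-wide until a new sequencer is agreed upon.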

Failure cases to add:

  • Node crash during delivery: A node crashes after partially delivering a message. Other nodes must detect this and ensure consistency
  • Split-brain in sequencer: Network partition isolating the sequencer from some nodes. Show how nodes on the wrong side of the partition stop receiving ordered messages

6.4 Linearizability

Current state: Linearizable history, non-linearizable history, sequential consistency. Config: scenario.

New config options:

  • numClients (slider, 2-4): More clients create more complex operation overlaps
  • numOperations (slider, 3-6): More operations per scenario
  • registerType (select): "Single register" vs "Two registers" to show cross-register linearizability challenges

New scenarios:

  • Interactive linearization: Let the user drag linearization points along the time axis to find (or fail to find) a valid linearization. Immediate feedback on whether the chosen ordering is consistent
  • CAS operation: Add compare-and-swap operations to the register. Show how CAS creates stronger ordering constraints than simple reads/writes
  • Two-register scenario: Operations on registers X and Y. Show how per-register linearizability doesn't imply cross-register linearizability
  • Real-time ordering vs process ordering: Show a history that satisfies sequential consistency but not linearizability because it violates real-time ordering. Make the distinction crystal clear
  • Stale read detection: A practical scenario: client writes to the leader, reads from a follower, gets old value. Is this linearizable? Show why not
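A brute-force checker makes the definition concrete, and its exponential cost echoes the NP-completeness note below. A sketch for a single register: try every ordering of operations consistent with real time, and accept if some ordering makes every read return the latest preceding write:

```python
from itertools import permutations

# Brute-force linearizability check for one register. history is a list of
# (start, end, op, value) tuples, where op is "w" (write) or "r" (read).

def linearizable(history, initial=0):
    for order in permutations(history):
        # Real-time rule: an op that finished before another started must
        # appear first; reject orderings that violate this.
        if any(b[1] < a[0] for i, a in enumerate(order) for b in order[i + 1:]):
            continue
        value, legal = initial, True
        for _, _, op, v in order:
            if op == "w":
                value = v
            elif v != value:          # a read must return the latest write
                legal = False
                break
        if legal:
            return True               # found a valid linearization
    return False

# Write 1 during [0,3]; read 0 during [4,5]: a stale read after the write
# completed — no valid linearization exists.
assert not linearizable([(0, 3, "w", 1), (4, 5, "r", 0)])
# If the read overlaps the write, the read may linearize first: allowed.
assert linearizable([(0, 3, "w", 1), (2, 5, "r", 0)])
```

The interactive drag-the-linearization-point scenario is exactly a human-guided version of this search.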

Failure cases to add:

  • NP-completeness: Mention that checking linearizability is NP-complete in general, and show a complex example where it's not obvious whether the history is linearizable
  • Cost of linearizability: Show the latency overhead of linearizable reads (require leader confirmation) vs non-linearizable reads (can read from any replica)

6.5 New Subtopic: Two-Phase Commit (2PC)

Rationale: 2PC is the classic atomic commit protocol for distributed transactions. It's critical for understanding how distributed databases ensure all-or-nothing semantics across partitions.

Config: numParticipants (2-5), failureMode (none, coordinator crash, participant crash, network partition)

Scenarios:

  • Happy path: Coordinator sends Prepare, all participants vote Yes, coordinator sends Commit
  • Participant votes No: Coordinator sends Abort to all
  • Coordinator crash after Prepare, before decision: Participants are stuck (the "blocking" problem of 2PC). Show the uncertainty window
  • Participant crash after voting Yes: Must recover and still commit (promised to commit)
  • Three-phase commit (3PC): Show how adding a pre-commit phase avoids blocking on coordinator crash (at the cost of an extra message round, and only under the assumption of no network partitions)
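The 2PC decision rule itself is tiny, which is worth showing alongside the failure scenarios: all the difficulty lives in crashes between the phases, not in the happy path. A sketch with hypothetical class names:

```python
# Two-phase commit, happy-path logic: the coordinator commits only if every
# participant votes yes in phase 1; any "no" (or a timeout treated as "no")
# means abort for everyone in phase 2.

class Participant:
    def __init__(self, vote_yes):
        self.vote_yes = vote_yes
        self.state = "init"

    def prepare(self):                  # phase 1: vote, and promise to obey
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def finish(self, decision):         # phase 2: apply the global decision
        self.state = decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]       # phase 1: Prepare
    decision = "commit" if all(votes) else "abort"    # unanimity required
    for p in participants:                            # phase 2: broadcast decision
        p.finish(decision)
    return decision

ok1, ok2, naysayer = Participant(True), Participant(True), Participant(False)
assert two_phase_commit([ok1, ok2]) == "commit"
assert two_phase_commit([ok1, ok2, naysayer]) == "abort"
assert ok1.state == "abort"            # one "no" vote aborts everyone
```

The blocking problem is the gap between `prepare()` returning and `finish()` arriving: a participant in state `"prepared"` has promised to obey a decision it may never receive if the coordinator crashes.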

6.6 New Subtopic: Lamport Clocks / Vector Clocks

Rationale: Logical clocks are fundamental to understanding distributed ordering. They underpin many of the other visualizations (causal order, conflict detection, CRDT merging).

Config: numNodes (2-4), clockType (Lamport vs Vector), numEvents (5-10)

Scenarios:

  • Lamport clock: Show the "max(local, received) + 1" rule. Demonstrate that Lamport clocks capture happens-before but can't detect concurrency
  • Vector clocks: Show the per-node counter vector. Demonstrate concurrent event detection (neither vector dominates the other)
  • Causal ordering: Show how vector clocks enable exact causality tracking
  • Clock comparison: Same events with Lamport vs Vector clocks side by side, showing Vector clocks' additional power
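Both clock rules from the scenarios above fit in a short sketch: Lamport's `max(local, received) + 1` on receive, and the vector comparison that can additionally report "concurrent" when neither vector dominates:

```python
# Logical clocks, sketched. Lamport clocks give a total order consistent with
# happens-before but cannot detect concurrency; vector clocks can.

def lamport_receive(local, received):
    # On receiving a message, jump past the sender's timestamp.
    return max(local, received) + 1

def vc_compare(a, b):
    """a, b: dicts node -> counter. Returns 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"         # a happened-before b
    if b_le_a:
        return "after"
    return "concurrent"         # neither dominates: no causal order

assert lamport_receive(4, 7) == 8                    # clock jumps past the sender's
assert vc_compare({"A": 1}, {"A": 2}) == "before"    # causally ordered
assert vc_compare({"A": 1, "B": 0}, {"A": 0, "B": 1}) == "concurrent"
```

The last assertion is the "additional power" the side-by-side comparison should highlight: two Lamport timestamps always compare, so concurrency is invisible to them.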

6.7 New Subtopic: Byzantine Fault Tolerance (PBFT)

Rationale: Byzantine faults (nodes behaving arbitrarily, including maliciously) are important for blockchain and high-assurance systems. Show how PBFT requires 3f+1 nodes to tolerate f failures.

Config: numNodes (4-7), numFaulty (0-2), scenario (normal, Byzantine leader, Byzantine follower)

Scenarios:

  • Normal operation: Pre-prepare -> Prepare -> Commit phases
  • Byzantine leader sends conflicting messages to different nodes
  • Byzantine follower sends wrong responses. Honest nodes detect inconsistency via quorum
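The 3f+1 sizing is pure quorum arithmetic, and showing it as a calculation makes the requirement concrete: quorums of 2f+1 out of 3f+1 replicas always intersect in at least f+1 replicas, so every pair of quorums shares at least one honest node even with f Byzantine ones.

```python
# PBFT quorum arithmetic: n = 3f + 1 replicas, quorums of size 2f + 1.
# By pigeonhole, any two quorums overlap in at least 2*(2f+1) - n = f + 1
# replicas — more than the f that can be Byzantine.

def pbft_sizes(f):
    n = 3 * f + 1
    quorum = 2 * f + 1
    min_overlap = 2 * quorum - n
    return n, quorum, min_overlap

n, quorum, overlap = pbft_sizes(1)
assert (n, quorum) == (4, 3)   # tolerate 1 Byzantine node with 4 replicas
assert overlap == 2            # f + 1 = 2: at least one honest node in common
```

This is why the config above starts at numNodes=4: it is the smallest cluster that tolerates a single Byzantine fault.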

7. Cross-Cutting Improvements

7.1 Shared UX Enhancements

  • Annotation panel expansion: Add a collapsible "Learn More" panel below each annotation with deeper explanation, relevant paper references, and database-specific implementation notes
  • Comparison mode: Split the canvas vertically and run two configurations side by side (e.g., B-Tree order=3 vs order=5, or Sync vs Async replication)
  • Scenario presets: Add a "Presets" dropdown per visualization with named scenarios that set all config values to demonstrate specific concepts (e.g., "CAP theorem demo" for leaderless replication with N=3, W=1, R=1)
  • Statistics dashboard: Show running counters for each visualization: total messages sent, operations completed, conflicts detected, latency (frames), space used, etc.
  • Frame-level diff highlighting: When stepping between frames, highlight exactly what changed (new entries, state transitions, value changes) with a brief animation or color flash

7.2 Testing Coverage

  • Add engine tests for every visualization (currently only useAnimationController has tests)
  • Property-based tests: For engines with configurable parameters, test that all configs produce valid frame sequences (no undefined states, annotations always present, frame counts > 0)
  • Snapshot tests for key frames to catch unintended rendering changes

7.3 Accessibility

  • Color-blind friendly palette option
  • Keyboard-only navigation through interactive elements
  • Screen reader annotations for canvas content (off-screen text descriptions synchronized with frames)