Skip to content

siluk00/gocask

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gocask

CI

A log-structured key-value storage engine in Go, modeled after the Bitcask paper from Basho.

Built as the storage layer of a larger distributed systems portfolio — exposed via HTTP and gRPC servers, and used as the state machine backend for a Raft consensus implementation.

put  →  append to log  →  update in-memory index  →  O(1)
get  →  lookup index   →  single disk seek         →  O(1)
del  →  write tombstone →  remove from index        →  O(1)

Why this exists

Most KV store implementations skip the hard parts: crash recovery, compaction, CRC-validated binary encoding, hint files, and the tradeoffs that make each decision non-obvious. This one doesn’t.

Every design decision is documented in ARCHITECTURE.md with the reasoning and the alternatives explicitly rejected. The engine is built to be read, not just used.


Design

On-disk format

Every record written to the log has a fixed binary header:

┌──────────┬───────────┬──────────────┬──────────────┬─────────────────────┐
│  CRC32   │ Timestamp │   Key Size   │  Val Size†   │   Key   │   Value   │
│  4 bytes │  8 bytes  │   4 bytes    │   4 bytes    │   ...   │    ...    │
└──────────┴───────────┴──────────────┴──────────────┴─────────────────────┘

† Bit 31 of Val Size is reserved as the tombstone flag.

CRC32 is stored first so corruption is detected before allocating memory for the key or value. Timestamp enables compaction to resolve conflicts between duplicate keys across segments.

The keydir

The keydir is an in-memory hash map that stores the exact disk location of every live key:

type KeydirEntry struct {
    SegmentPath string  // which segment file
    Offset      int64   // byte offset of the record header
    ValueSize   uint32  // how many bytes to read
    Timestamp   int64   // used during compaction
}

A Get never scans the log. It does exactly one map lookup and one disk seek. This is the core performance property of the Bitcask model.

Tradeoff: the keydir must fit entirely in RAM. This is a deliberate design boundary — the Bitcask model trades bounded key-space for O(1) read latency. For datasets where the key space exceeds available RAM, an LSM-tree or B-tree structure is more appropriate. See ARCHITECTURE.md.

Deletion via tombstones

Delete does not remove data from disk. It appends a tombstone record — a header-only entry with bit 31 of ValSize set — and removes the key from the keydir. Disk space is reclaimed during compaction.

This design guarantees that a delete survives a crash: if the process dies after writing the tombstone, the key remains absent after recovery. Using bit 31 as the tombstone flag (rather than a zero-length value convention) preserves the ability to store []byte{} as a valid value.

Crash recovery

On Open, the engine scans all segment files in chronological order and reconstructs the keydir. For each segment it first attempts the fast path: loading from a hint file (which omits value bytes — reads ~20% of the data). If the hint file is missing or its CRC is invalid, it falls back to a full record-by-record scan with CRC validation. Corrupted or partial records cause the segment to be truncated to the last valid offset.

Segment rotation

When the active segment reaches MaxFileSize, it is sealed (reopened as O_RDONLY) and a new active segment is created. Segment names encode a unix timestamp, pid, and atomic counter — lexicographic sort equals chronological order, which recovery relies on.


Usage

import "github.com/siluk00/gocask/internal"

store, err := gocask.Open(gocask.Config{
    Dir:         "/tmp/mystore",
    MaxFileSize: 1 << 30,          // 1GB segments
    Policy:      0,                // no SyncOnWrite
})
if err != nil {
    log.Fatal(err)
}
defer store.Close()

// Write
if err := store.Put([]byte("user:1001"), []byte(`{"name":"felipe"}`)); err != nil {
    log.Fatal(err)
}

// Read
val, err := store.Get([]byte("user:1001"))
if err != nil {
    log.Fatal(err) // gocask.ErrKeyNotFound if absent
}

// Delete (writes tombstone, reclaimed at compaction)
if err := store.Delete([]byte("user:1001")); err != nil {
    log.Fatal(err)
}

The store is safe for concurrent use. Multiple goroutines may call Get simultaneously. Put and Delete are serialized internally.


Configuration

type Config struct {
    Dir         string      // store directory (required)
    MaxFileSize uint32      // rotate active segment at this size in bytes
    Policy      ConfigFlags // bitmask of behavioral flags
}

const (
    SyncOnWrite ConfigFlags = 1 << iota // fsync after every Put (~97x write latency cost)
)

SyncOnWrite off is the correct default for most workloads. It means the OS may buffer up to ~30s of writes before flushing to disk. A process crash loses at most that window. A machine crash (power loss) loses more without SyncOnWrite.


Project structure

gocask/
├── internal/
│   ├── gocask.go              # Store implementation + Bitcask interface
│   ├── storage/
│   │   ├── segment.go         # Append-only log segment
│   │   ├── keydir.go          # In-memory index
│   │   ├── record.go          # Binary encoding: records, hints, metadata
│   │   └── recovery.go        # Crash recovery + hint file loading
│   ├── bench_test.go          # Throughput benchmarks
│   ├── highload_test.go       # Concurrent correctness under rotation
│   ├── profiling_test.go      # CPU, memory, block profiling entry points
│   └── store_test.go          # Unit + integration tests
├── cmd/
│   ├── server-http/           # HTTP server [TODO]
│   └── server-grpc/           # gRPC server [TODO]
├── proto/                     # Protobuf definitions [TODO]
├── profiles/                  # Generated profile files (gitignored)
├── ARCHITECTURE.md
├── Makefile
└── go.mod

Running

# Tests with race detector
make test

# All benchmarks
make bench

# CPU profile — sequential Put (opens browser)
make profile-cpu-put

# Memory profile — Put allocations (opens browser)
make profile-mem-put

# Block profile — lock contention under parallel writes (opens browser)
make profile-block

# Recovery benchmark
make profile-recovery

# Lint
make lint

Requirements: Go 1.22+, golangci-lint, graphviz (for profiling)


Known limitations

These are intentional design boundaries, not missing features:

Keydir must fit in RAM. Each entry costs ~80 bytes. 1M keys ≈ 80MB.

No range queries. Hash index — exact-key lookup only. Range scans require a full log scan and are not a supported use case.

Single active writer. Concurrent writes are serialized via mutex. Concurrent reads scale without contention via sync.RWMutex.

Values are opaque bytes. Serialization and schema management are the caller’s responsibility.

No transactions. Per-key atomicity only.


Reading


License

MIT

About

gocask - a modern Bitcaask implementation in Go with HTTP and gRPC interfaces

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors