Add buffer-native serialization hooks and mmap-backed partition path; benchmark MessagePack + mmap #314

Closed
Copilot wants to merge 7 commits into main from copilot/check-magpack-serialization

Contributor

Copilot AI commented May 14, 2026

This change investigates serializer support for direct buffer-oriented I/O (starting with MessagePack) and wires that capability into partition writes/reads using mmap-backed buffering. It also extends benchmark scenarios to compare latest vs latest+msgpack+mmap on the same workloads.

  • Serializer/Storage API extensions

    • Added optional serializer hooks for buffer-native paths:
      • serializedSize(document)
      • serializeToBuffer(document, buffer, offset)
      • deserializeBuffer(buffer, offset, size)
    • Preserved compatibility with existing serialize / deserialize behavior when hooks are not provided.
    • Added storage fallback behavior for partial serializer implementations.
    • Updated mmap write behavior to avoid calling serializedSize() / serializeToBuffer() in the mmap path, preventing double serialization.
  • Partition I/O path updates

    • Added raw payload read API: readBufferFrom(position, size?, headerOut?).
    • Added direct serialized write API: writeSerialized(dataSize, serializeToBuffer, sequenceNumber?, callback?).
    • Updated write paths to handle both string and Buffer payloads.
    • Added optional mmapWriteBuffer mode.
    • Refined mmap mode so it directly maps the partition file and writes into that mapping (no separate manual write buffer in mmap mode), with remap growth in writeBufferSize chunks when append writes exceed mapped bounds.
  • mmap integration details

    • Mmap support is loaded optionally (@riaskov/mmap-io with fallback).
    • Mmap mode maps the partition file directly and relies on kernel-managed buffering.
    • File size growth/remap is performed only when needed for capacity; truncation back to logical size occurs on flush/close.
  • Benchmark coverage

    • Added MessagePack serializer module for bench usage (bench/msgpackSerializer.js).
    • Added latest+msgpack+mmap benchmark variants to:
      • bench-storage.js
      • bench-eventstore.js
      • bench-read-scenarios.js
  • Targeted correctness coverage

    • Added tests for:
      • raw buffer reads (including large-document branch),
      • serializer fallback path when hooks are incomplete,
      • serialized size mismatch assertions in both buffered and unbuffered write paths.
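// pack/unpack are the MessagePack encode/decode functions (encode returns a Buffer).
const storage = new Storage('s', {
  mmapWriteBuffer: true,
  serializer: {
    serializedSize: (doc) => pack(doc).length,
    // Buffer#copy returns the number of bytes copied into `buffer` at `offset`.
    serializeToBuffer: (doc, buffer, offset) => pack(doc).copy(buffer, offset),
    // subarray() creates a view on the same memory, so no copy is made before decoding.
    deserializeBuffer: (buffer, offset, size) => unpack(buffer.subarray(offset, offset + size)),
    serialize: JSON.stringify,   // compatibility fallback
    deserialize: JSON.parse
  }
});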
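
The compatibility behavior described above (falling back to serialize / deserialize when the buffer hooks are missing) could look roughly like the following. This is a minimal sketch, not the PR's code: the helper name toBufferSerializer is hypothetical, and it assumes that a partially implemented set of hooks is treated the same as no hooks at all.

```javascript
// Hypothetical helper, not taken from the PR: resolves a serializer config
// into the three buffer-native hooks. If any hook is missing, all three are
// derived from the string-based serialize/deserialize pair (an assumption).
function toBufferSerializer(serializer) {
  const hasAllHooks =
    typeof serializer.serializedSize === 'function' &&
    typeof serializer.serializeToBuffer === 'function' &&
    typeof serializer.deserializeBuffer === 'function';
  if (hasAllHooks) {
    return serializer;
  }
  return {
    // Size in bytes, not characters: multi-byte UTF-8 characters count fully.
    serializedSize: (doc) => Buffer.byteLength(serializer.serialize(doc), 'utf8'),
    // Buffer#write returns the number of bytes written at `offset`.
    serializeToBuffer: (doc, buffer, offset) =>
      buffer.write(serializer.serialize(doc), offset, 'utf8'),
    deserializeBuffer: (buffer, offset, size) =>
      serializer.deserialize(buffer.toString('utf8', offset, offset + size)),
  };
}

// Usage: a plain JSON config gains buffer hooks through the wrapper.
const json = toBufferSerializer({ serialize: JSON.stringify, deserialize: JSON.parse });
const doc = { id: 1, name: 'héllo' };
const size = json.serializedSize(doc);
const target = Buffer.alloc(64);
const written = json.serializeToBuffer(doc, target, 8);
const roundTrip = json.deserializeBuffer(target, 8, size);
```

Note that the wrapped path still serializes twice when both serializedSize and serializeToBuffer are called for the same document, which is the cost the mmap write path in this PR is described as avoiding.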

Owner

@albe albe left a comment

@copilot address the comments

Comment thread src/Partition/WritablePartition.js Outdated
  }

- this.writeBuffer = success && Buffer.allocUnsafeSlow(this.writeBufferSize);
+ this.writeBuffer = success && this.createWriteBuffer();
Owner

Instead of replacing the write buffer with an mmapped temporary file, the actual partition file is supposed to be mapped and written to directly!

Contributor Author

Implemented in b5ad377. mmapWriteBuffer now maps the actual partition file append region (page-aligned mapping on the partition FD) instead of using a temporary mapped file, and flush syncs/trims that mapped file region.
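
A page-aligned mapping of the append region means the mapping offset has to be rounded down to a page boundary, with the real write position expressed relative to it. The arithmetic can be sketched as below; the helper name alignedMapRegion and the default page size of 4096 bytes are assumptions for illustration and do not come from commit b5ad377.

```javascript
// Hypothetical helper: computes a page-aligned window that covers
// `length` bytes starting at the append `position`.
// pageSize = 4096 is an assumed, commonly used value.
function alignedMapRegion(position, length, pageSize = 4096) {
  // Round the start down to the nearest page boundary.
  const mapOffset = Math.floor(position / pageSize) * pageSize;
  // Offset of the append position inside the mapped window.
  const delta = position - mapOffset;
  // Round the window length up to whole pages.
  const mapLength = Math.ceil((delta + length) / pageSize) * pageSize;
  return { mapOffset, delta, mapLength };
}

const region = alignedMapRegion(10000, 16384);
// region.mapOffset is 8192, region.delta is 1808, region.mapLength is 20480
```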

Comment thread src/Storage/WritableStorage.js
Contributor Author

Copilot AI commented May 14, 2026

> @copilot address the comments

Done in b5ad377. I addressed the requested review comments: mmap now maps/writes the actual partition file region, and the mmap write path no longer calls serializedSize() in WritableStorage.write().

Copilot finished work on behalf of albe May 14, 2026 06:33
Copilot AI requested a review from albe May 14, 2026 06:33
Owner

@albe albe left a comment

@copilot see comments.

The concept for mmap usage should be this:

  • There is no manual write/read buffer anymore; the mmap is the kernel-managed buffer.
  • The whole file is mapped (the kernel page cache keeps this from becoming a memory issue), and the mapping grows by writeBufferSize (which needs to be a multiple of the page size) when a write would go beyond its bounds. Only remap when a write goes beyond the currently mapped bounds.
  • In the read-only case, the mmap also grows by that amount when the file has changed and grown beyond the currently mapped size. If the file was shrunk (truncated), only remap if the truncation made it more than two write buffer sizes smaller than before.
  • Avoid truncating the file on remap; the file only needs to be truncated on flush and when the writer closes it.
  • The repair logic needs to handle files that weren't truncated correctly, so it must make sure it finds the proper last document position (this should already work but needs to be confirmed).
  • Reads and writes at the storage level pass the serializeTo(From)Buffer function to the partition, which now takes responsibility for serialization rather than receiving the serialized document from the storage layer. The partition write then returns the actually written size/new file position, so the storage doesn't need to know the serialized document size beforehand.
  • The backward-compatibility solution for the old string serializer config (i.e. JSON) is to wrap it at the storage level and pass the wrapper as serializeTo(From)Buffer to the partition layer.
  • The goal is to reduce code complexity through mmap by removing manual buffer management, and to gain noticeable performance benefits when using a serializer that can deal with Buffers directly, as it will then interact directly with the mmap.
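
The remap rules in this list can be expressed as two small size calculations. The sketch below only computes the target mapping size and performs no mapping; the function names are hypothetical. Two points are assumptions: "smaller than before" is interpreted as smaller than the currently mapped size, and the size chosen after a large shrink (the smallest multiple of writeBufferSize that covers the file) is not specified in the comment.

```javascript
// Hypothetical helpers; sizes are in bytes and writeBufferSize is assumed
// to be a multiple of the page size.

// Writer: remap only when a write would end beyond the mapped bounds,
// growing in whole writeBufferSize chunks.
function writerMappedSize(mappedSize, writeEnd, writeBufferSize) {
  if (writeEnd <= mappedSize) {
    return mappedSize; // still within bounds, no remap
  }
  const missing = writeEnd - mappedSize;
  return mappedSize + Math.ceil(missing / writeBufferSize) * writeBufferSize;
}

// Reader: grow when the file grew beyond the mapping; after a truncation,
// remap only if the file is more than two write buffer sizes smaller.
function readerMappedSize(mappedSize, fileSize, writeBufferSize) {
  if (fileSize > mappedSize) {
    const missing = fileSize - mappedSize;
    return mappedSize + Math.ceil(missing / writeBufferSize) * writeBufferSize;
  }
  if (mappedSize - fileSize > 2 * writeBufferSize) {
    // Assumed shrink target: smallest chunk multiple covering the file.
    return Math.ceil(fileSize / writeBufferSize) * writeBufferSize;
  }
  return mappedSize; // small shrink: keep the existing mapping
}

const chunk = 16384;
const grown = writerMappedSize(chunk, chunk + 1, chunk);  // one extra chunk
const kept = readerMappedSize(4 * chunk, 40000, chunk);   // shrink within threshold
const shrunk = readerMappedSize(4 * chunk, 20000, chunk); // shrink beyond threshold
```

Returning the unchanged size in the common case keeps the remap decision to a single comparison at the call site (new size differs from old size).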

Comment thread src/Partition/WritablePartition.js
Comment thread src/Partition/WritablePartition.js
Comment thread test/Storage.spec.js Outdated
Contributor Author

Copilot AI commented May 14, 2026

> @copilot see comments.
>
> The concept for mmap usage should be this: ...

Implemented in ff12b45 and be7db61: mmap now writes directly to the partition-file mapping, remaps only when append writes exceed current mapped bounds (growing by writeBufferSize chunks), and truncation to logical size is handled on flush/close. I also removed the pending mmap tests as requested.
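
The interplay of chunked growth and truncation to the logical size can be illustrated without any mmap library. The following sketch uses plain Node.js fs calls as a stand-in for the mapping; the class ChunkGrownFile and its fields are hypothetical and are not the PR's WritablePartition implementation.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for an mmap-backed partition: the file on disk is
// grown in chunks (capacity) while the logical size tracks written bytes.
class ChunkGrownFile {
  constructor(fileName, chunkSize) {
    this.fd = fs.openSync(fileName, 'w+');
    this.chunkSize = chunkSize;
    this.capacity = 0; // physical file size
    this.size = 0;     // logical size (end of the last document)
  }

  append(data) {
    const end = this.size + data.length;
    if (end > this.capacity) {
      // Grow only when the write exceeds the current capacity.
      this.capacity = Math.ceil(end / this.chunkSize) * this.chunkSize;
      fs.ftruncateSync(this.fd, this.capacity); // extends the file with zero bytes
    }
    fs.writeSync(this.fd, data, 0, data.length, this.size);
    this.size = end;
    return this.size; // new logical file position
  }

  close() {
    // Trim the zero-filled tail so the file ends at the last document.
    fs.ftruncateSync(this.fd, this.size);
    fs.closeSync(this.fd);
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'partition-'));
const fileName = path.join(dir, 'demo.part');
const file = new ChunkGrownFile(fileName, 4096);
file.append(Buffer.from('first document\n'));
file.append(Buffer.from('second document\n'));
const sizeWhileOpen = fs.statSync(fileName).size;
file.close();
const sizeAfterClose = fs.statSync(fileName).size;
```

If the process exits before close() runs, the file keeps its zero-filled tail, which is the case the repair logic mentioned in the review has to handle.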

Copilot finished work on behalf of albe May 14, 2026 09:28
Copilot AI requested a review from albe May 14, 2026 09:28
Owner

albe commented May 15, 2026

Closing in favor of #315

@albe albe closed this May 15, 2026