Add buffer-native serialization hooks and mmap-backed partition path; benchmark MessagePack + mmap #314
Copilot wants to merge 7 commits into
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/2d14c40a-455b-4979-8d38-26c19a7dae0e Co-authored-by: albe <4259532+albe@users.noreply.github.com>
…k path Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/2d14c40a-455b-4979-8d38-26c19a7dae0e Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/2d14c40a-455b-4979-8d38-26c19a7dae0e Co-authored-by: albe <4259532+albe@users.noreply.github.com>
 }

-this.writeBuffer = success && Buffer.allocUnsafeSlow(this.writeBufferSize);
+this.writeBuffer = success && this.createWriteBuffer();
Instead of replacing the write buffer with an mmapped temporary file, the actual partition file is supposed to be mapped and written to directly!
Implemented in b5ad377. mmapWriteBuffer now maps the append region of the actual partition file (a page-aligned mapping on the partition FD) instead of a temporary mapped file, and flush syncs and trims that mapped region.
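Mapping only the append region means the mapping has to start on a page boundary, since mmap requires the file offset to be a multiple of the page size. A minimal sketch of that arithmetic, assuming a hypothetical helper alignedMappingRegion() that is not part of the PR; the actual mmap call and the mmap library's API are deliberately left out:

```javascript
'use strict';

// Hypothetical helper: compute a page-aligned mapping that covers
// [writePosition, writePosition + regionSize) of the partition file.
function alignedMappingRegion(writePosition, regionSize, pageSize) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  // Round the start down to the nearest page boundary.
  const mapOffset = Math.floor(writePosition / pageSize) * pageSize;
  // Distance from the page boundary to the logical append position;
  // writes go to mappedBuffer[delta + n].
  const delta = writePosition - mapOffset;
  // Round the length up so the mapping also ends on a page boundary.
  const mapLength = Math.ceil((delta + regionSize) / pageSize) * pageSize;
  return { mapOffset, delta, mapLength };
}
```

For example, appending at byte 10000 with an 8192-byte region and 4096-byte pages maps from offset 8192 with a length of 12288, and the first append lands 1808 bytes into the mapping.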
…e on mmap path Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/4b87c269-1fb4-4f6e-993d-f554b5b71830 Co-authored-by: albe <4259532+albe@users.noreply.github.com>
albe left a comment
@copilot see comments.
The concept for mmap usage should be this:
- there is no manual write/read buffer anymore; the mmap is the kernel-managed buffer
- the whole file is mapped (the kernel page cache will take care that this doesn't become a memory issue), and the mapping grows by writeBufferSize (which needs to be a multiple of the page size) when a write would go past its bounds. Only remap when going past the current mapped bounds
- in the read-only case, the mmap also grows by that amount when the file changed and grew beyond the current mapped size. If it shrank (was truncated), only remap the file if it is now more than two write buffer sizes smaller than before
- avoid truncating the file on remap; the file only needs to be truncated on flush and when the writer closes it
- the repair logic needs to handle files that weren't truncated correctly, so it must find the proper last document position (this should already work but needs to be confirmed)
- reads and writes on the storage level pass serializeTo(From)Buffer to the partition, which now takes responsibility for serialization rather than receiving the serialized document from the storage layer; the partition write then returns the actually written size/new file position, so the storage doesn't need to know the serialized document size beforehand
- the b/c solution for the old string serializer config (i.e. JSON) is to wrap it at the storage level and pass it as a serializeTo(From)Buffer to the partition layer
- the goal is to reduce code complexity through mmap by removing manual buffer management, and also to gain noticeable performance benefits when using a serializer that can deal with Buffers directly, as it will then interact directly with the mmap
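The proposed split of responsibilities (the partition runs the serializer and reports back the new position; legacy string serializers are wrapped at the storage level) can be sketched as below. Everything here is hypothetical: the callback signatures serializeTo(document, buffer, offset) returning bytes written and deserializeFrom(buffer, offset, size), the helper wrapStringSerializer, and the class BufferPartition, which uses a plain Buffer as a stand-in for the mapped file so the sketch runs without a native mmap module. Signalling "not enough room" with a RangeError is likewise an assumption.

```javascript
'use strict';

// Backwards-compatible wrapper (hypothetical): adapts a string serializer
// such as { serialize: JSON.stringify, deserialize: JSON.parse }.
function wrapStringSerializer({ serialize, deserialize }) {
  return {
    serializeTo(document, buffer, offset) {
      const str = serialize(document);
      const bytes = Buffer.byteLength(str, 'utf8');
      // Buffer#write silently truncates, so check the bound explicitly.
      if (offset + bytes > buffer.length) {
        throw new RangeError('not enough space left in buffer');
      }
      return buffer.write(str, offset, bytes, 'utf8');
    },
    deserializeFrom(buffer, offset, size) {
      return deserialize(buffer.toString('utf8', offset, offset + size));
    },
  };
}

// Stand-in partition (hypothetical): a Buffer plays the role of the mapping.
class BufferPartition {
  constructor(initialSize, growBy, maxSize = 1 << 30) {
    if (growBy <= 0) throw new RangeError('growBy must be positive');
    this.buffer = Buffer.alloc(initialSize);
    this.growBy = growBy;
    this.maxSize = maxSize;
    this.position = 0;
  }

  // The partition does not need the serialized size up front: if the
  // serializer runs out of room it grows and retries, then returns the
  // new write position to the storage layer.
  write(document, serializeTo) {
    for (;;) {
      try {
        this.position += serializeTo(document, this.buffer, this.position);
        return this.position;
      } catch (err) {
        if (!(err instanceof RangeError) || this.buffer.length + this.growBy > this.maxSize) {
          throw err;
        }
        this.grow();
      }
    }
  }

  read(position, size, deserializeFrom) {
    return deserializeFrom(this.buffer, position, size);
  }

  // With a real mapping this would be a remap that extends by growBy bytes.
  grow() {
    const bigger = Buffer.alloc(this.buffer.length + this.growBy);
    this.buffer.copy(bigger, 0, 0, this.position);
    this.buffer = bigger;
  }
}
```

Because the partition drives the callback, a serializer that writes into Buffers natively could serialize straight into the mapped region without an intermediate copy, which is the performance benefit the comment is after.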
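For the repair point: if the file grew through remapping but was never truncated, its tail is zero-filled, so repair has to stop at the end of the last complete document rather than at the end of the file. The sketch below assumes a simplified, hypothetical record layout (a 4-byte little-endian length prefix followed by the payload, with zero-length documents impossible); node-event-storage's actual document header carries more fields, so this only illustrates the scan, not the real format.

```javascript
'use strict';

// Hypothetical record layout: [uint32LE length][payload bytes].
const LENGTH_PREFIX = 4;

// Returns the byte position just past the last complete document; the file
// would be truncated to this size during repair.
function findEndOfLastDocument(buffer) {
  let position = 0;
  while (position + LENGTH_PREFIX <= buffer.length) {
    const size = buffer.readUInt32LE(position);
    if (size === 0) break; // start of the unused, zero-filled tail
    const next = position + LENGTH_PREFIX + size;
    if (next > buffer.length) break; // torn write: record runs past the end
    position = next;
  }
  return position;
}
```

A real implementation would also want some way (for example a checksum or header marker) to reject a torn record whose length field happens to be non-zero but plausible.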
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/ffc10813-8b7a-4b43-9a94-96b3224e557a Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/ffc10813-8b7a-4b43-9a94-96b3224e557a Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Agent-Logs-Url: https://github.com/albe/node-event-storage/sessions/ffc10813-8b7a-4b43-9a94-96b3224e557a Co-authored-by: albe <4259532+albe@users.noreply.github.com>
Implemented in ff12b45 and be7db61: mmap now writes directly to the partition-file mapping, remaps only when append writes exceed current mapped bounds (growing by
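The remap rules from the review reduce to two size calculations: the writer grows the whole-file mapping in writeBufferSize chunks only when an append crosses the mapped end, and a reader grows on file growth but shrinks only after a truncation of more than two write buffer sizes. This is a sketch with hypothetical helper names (assertPageMultiple, writerMappedSize, readerMappedSize); it interprets "smaller than before" as relative to the current mapped size and omits the actual map/unmap calls.

```javascript
'use strict';

// writeBufferSize must be a positive multiple of the page size.
function assertPageMultiple(writeBufferSize, pageSize) {
  if (writeBufferSize <= 0 || writeBufferSize % pageSize !== 0) {
    throw new RangeError(`writeBufferSize ${writeBufferSize} must be a positive multiple of ${pageSize}`);
  }
}

// Writer: returns the mapped size needed so that requiredEnd fits.
// An unchanged return value means no remap is necessary.
function writerMappedSize(mappedSize, requiredEnd, writeBufferSize) {
  if (requiredEnd <= mappedSize) return mappedSize;
  const chunks = Math.ceil((requiredEnd - mappedSize) / writeBufferSize);
  return mappedSize + chunks * writeBufferSize;
}

// Reader: grow when the file outgrew the mapping; shrink only when the
// file is more than two write buffer sizes smaller than the mapping.
function readerMappedSize(mappedSize, fileSize, writeBufferSize) {
  if (fileSize > mappedSize) {
    return writerMappedSize(mappedSize, fileSize, writeBufferSize);
  }
  if (mappedSize - fileSize > 2 * writeBufferSize) {
    // Keep at least one chunk mapped so the mapping never has zero length.
    return Math.max(writeBufferSize, Math.ceil(fileSize / writeBufferSize) * writeBufferSize);
  }
  return mappedSize;
}
```

With 4096-byte pages and an 8192-byte writeBufferSize, an append ending at byte 30000 against an 8192-byte mapping remaps to 32768, while a reader whose 32768-byte mapping sees the file truncated to 20000 bytes keeps its mapping unchanged.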
Closing in favor of #315
This change investigates serializer support for direct buffer-oriented I/O (starting with MessagePack) and wires that capability into partition writes/reads using mmap-backed buffering. It also extends benchmark scenarios to compare latest vs latest+msgpack+mmap on the same workloads.

Serializer/Storage API extensions
- serializedSize(document)
- serializeToBuffer(document, buffer, offset)
- deserializeBuffer(buffer, offset, size)
- serialize/deserialize behavior when hooks are not provided.
- serializedSize()/serializeToBuffer() in the mmap path, preventing double serialization.

Partition I/O path updates
- readBufferFrom(position, size?, headerOut?).
- writeSerialized(dataSize, serializeToBuffer, sequenceNumber?, callback?).
- Buffer payloads.
- mmapWriteBuffer mode.
- writeBufferSize chunks when append writes exceed mapped bounds.

mmap integration details
- @riaskov/mmap-io with fallback).

Benchmark coverage
- bench/msgpackSerializer.js).
- latest+msgpack+mmap benchmark variants to:
  - bench-storage.js
  - bench-eventstore.js
  - bench-read-scenarios.js

Targeted correctness coverage