Columnar memtable never flushes — unbounded memory growth on INSERT #38

@hollanf

Every INSERT into a columnar collection accumulates in the in-memory memtable forever; nothing ever drains it to disk. A steady write workload OOMs the process.

Summary

ColumnarMemtable exposes should_flush() (returns true at DEFAULT_FLUSH_THRESHOLD = 65_536 rows) and MutationEngine exposes on_memtable_flushed(segment_id) to finalize a flush — but no caller on the write path ever invokes either. The columnar write handler inserts rows in a loop and returns; rows stay resident in ColumnarMemtable::columns indefinitely.

Current code

nodedb/src/data/executor/handlers/columnar_write.rs:75-99 — the insert loop:

for row in &ndb_rows {
    let obj = match row {
        nodedb_types::Value::Object(m) => m,
        _ => continue,
    };
    let values: Vec<Value> = schema
        .columns
        .iter()
        .map(|col| ndb_field_to_value(obj.get(&col.name), &col.column_type))
        .collect();

    match engine.insert(&values) {
        Ok(_) => accepted += 1,
        Err(e) => {
            return self.response_error(task, ErrorCode::Internal {
                detail: format!("columnar insert failed: {e}"),
            });
        }
    }
}
// ↑ No should_flush() / drain_optimized() / on_memtable_flushed() call anywhere.

Memtable + engine APIs that exist but are unused from the write path:

  • nodedb-columnar/src/memtable/mod.rs:85: pub fn should_flush(&self) -> bool
  • nodedb-columnar/src/memtable/mod.rs:152: pub fn drain_optimized(&mut self)
  • nodedb-columnar/src/mutation.rs:173: pub fn on_memtable_flushed(&mut self, new_segment_id: u32)
  • nodedb-columnar/src/mutation.rs:279: pub fn should_flush(&self) -> bool

Repo-wide grep confirms the only callers of on_memtable_flushed are the columnar crate's own tests (mutation.rs:465, 519); should_flush is called from nodedb-fts and nodedb/src/engine/timeseries/memtable.rs but never for the columnar engine in nodedb/src/**.
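
For illustration, the missing wire-up could look roughly like the toy model below. This is only a sketch of the check-then-drain pattern, not the nodedb-columnar code: ToyMemtable, ToyEngine, drain() and flush_if_needed() are hypothetical stand-ins, and the real drain_optimized() / on_memtable_flushed() calls would replace the marked comment.

```rust
// Toy stand-ins for illustration only; these are NOT the nodedb-columnar types.
const FLUSH_THRESHOLD: usize = 65_536; // mirrors DEFAULT_FLUSH_THRESHOLD from this issue

struct ToyMemtable {
    rows: Vec<Vec<u8>>,
}

impl ToyMemtable {
    fn should_flush(&self) -> bool {
        self.rows.len() >= FLUSH_THRESHOLD
    }

    /// Hands the buffered rows to the caller and leaves the memtable empty.
    fn drain(&mut self) -> Vec<Vec<u8>> {
        std::mem::take(&mut self.rows)
    }
}

struct ToyEngine {
    memtable: ToyMemtable,
    next_segment_id: u32,
    flushed_segments: Vec<(u32, usize)>, // (segment id, row count)
}

impl ToyEngine {
    fn new() -> Self {
        ToyEngine {
            memtable: ToyMemtable { rows: Vec::new() },
            next_segment_id: 0,
            flushed_segments: Vec::new(),
        }
    }

    fn insert(&mut self, row: Vec<u8>) {
        self.memtable.rows.push(row);
    }

    /// Hypothetical hook the write handler could call after each insert or batch.
    fn flush_if_needed(&mut self) -> Option<u32> {
        if !self.memtable.should_flush() {
            return None;
        }
        let rows = self.memtable.drain();
        let id = self.next_segment_id;
        self.next_segment_id += 1;
        // A real implementation would write `rows` out as a segment here and
        // then finalize the flush (on_memtable_flushed in the real engine).
        self.flushed_segments.push((id, rows.len()));
        Some(id)
    }
}

fn main() {
    let mut engine = ToyEngine::new();
    for i in 0..150_000u32 {
        engine.insert(i.to_le_bytes().to_vec());
        engine.flush_if_needed();
    }
    println!("segments flushed: {}", engine.flushed_segments.len());
    println!("rows still resident: {}", engine.memtable.rows.len());
}
```

With the check in place, resident rows in this model never exceed the threshold; without it, all 150,000 rows would stay in memory.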

Why it's broken

  • Every inserted row stays in ColumnarMemtable::columns (Vec<ColumnData>) forever.
  • PkIndex::upsert also clones the PK bytes (Vec<u8>) into the in-memory index — a second permanent allocation per row.
  • No segment is ever written to disk, so MemtableFlushed records are never emitted. Without them the WAL is never checkpoint-truncated, so the WAL itself grows without bound alongside the memtable.
  • No backpressure: insert() returns Ok regardless of memtable size. A client ingesting at 50 k rows/s at ~100 B/row adds ~5 MB/s to RSS with no ceiling until OOM.
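
The growth figure above can be checked with a few lines of arithmetic. The sketch below uses the ingest rate and row size from this issue; the 8 GiB headroom is a hypothetical number chosen only to make the time-to-OOM concrete, and the estimate counts row payload only (the PK index copies come on top).

```rust
/// Bytes added to resident memory per second for a given ingest rate.
fn growth_bytes_per_sec(rows_per_sec: u64, bytes_per_row: u64) -> u64 {
    rows_per_sec * bytes_per_row
}

/// Seconds until `budget_bytes` is consumed at a constant growth rate.
fn seconds_to_exhaust(budget_bytes: u64, bytes_per_sec: u64) -> f64 {
    budget_bytes as f64 / bytes_per_sec as f64
}

fn main() {
    // 50 k rows/s at ~100 B/row, as quoted in this issue.
    let rate = growth_bytes_per_sec(50_000, 100);
    // Hypothetical headroom: 8 GiB of free memory before the process is killed.
    let budget: u64 = 8 * 1024 * 1024 * 1024;

    println!("growth: {} MB/s", rate as f64 / 1e6);
    println!(
        "time to exhaust 8 GiB: {:.0} s",
        seconds_to_exhaust(budget, rate)
    );
    // At this rate a 65_536-row threshold would be crossed roughly this often.
    println!("expected flush interval: {:.2} s", 65_536.0 / 50_000.0);
}
```

Under these assumptions the process lasts under half an hour, while a working flush would fire a little more often than once every two seconds.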

Reproduction

CREATE COLLECTION c (id INT PRIMARY KEY, v TEXT) USING COLUMNAR;
-- Loop INSERT ~100 k rows in a script.
-- Observe: RSS grows monotonically, no *.seg files appear under $DATA_DIR.
-- Process OOMs long before any row hits disk.
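
One way to script the insert loop is to generate the statements and pipe them into whatever SQL client is in use. The generator below is a minimal sketch: the INSERT syntax and the 'row-N' payload are assumptions, not confirmed against the server's SQL dialect.

```rust
use std::io::{self, BufWriter, Write};

/// Builds one INSERT for row `id`. The statement syntax is an assumption.
fn insert_stmt(id: u32) -> String {
    format!("INSERT INTO c (id, v) VALUES ({id}, 'row-{id}');")
}

fn main() -> io::Result<()> {
    // Buffer stdout so 100 k lines are not written one syscall at a time.
    let mut out = BufWriter::new(io::stdout().lock());
    for id in 0..100_000u32 {
        writeln!(out, "{}", insert_stmt(id))?;
    }
    out.flush()
}
```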

Notes

  • Found during a CPU/memory audit sweep of the columnar engine and its wire-up in nodedb/src/data/executor/handlers/.
