feat(core,py): bulk-insert primitives — 2.4.0 / 0.3.0#5

Merged
oldnordic merged 3 commits into main from feat/bulk-insert
May 15, 2026

@oldnordic
Owner

Summary

  • Adds SqliteGraph::insert_entities_bulk / insert_edges_bulk — single-transaction, prepare_cached, rollback on error. Empty input is a no-op. Returns rowids in input order.
  • Adds GraphBackend::insert_nodes_bulk / insert_edges_bulk trait methods with default implementations that loop the single-insert path, so every existing GraphBackend consumer keeps working at 2.3 → 2.4 with no source changes. The &B blanket forwarders are wired through.
  • SqliteGraphBackend overrides both, dispatching to the new SqliteGraph bulk paths. Publisher events fire per row after commit (preserves single-insert observer semantics).
  • Python: Graph.add_nodes_bulk(items: list[dict]) and Graph.add_edges_bulk(items: list[dict]) — same field set as kwargs-style add_node/add_edge, one FFI call per batch.
  • Versions bumped: sqlitegraph-core 2.3.0 → 2.4.0, sqlitegraph-py 0.2.0 → 0.3.0 (SemVer minor — additive, no breakage).

Closes the 8x build-time gap downstream consumers see when loading large graphs from grounded-index DBs; the per-call Python->Rust FFI overhead of add_node was the bottleneck. Benchmarks will be re-run in the grounded-graph repo after the wheels land on PyPI.
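
To illustrate the compatibility argument in the summary, here is a minimal Python sketch of the "default method loops the single-insert path, one backend overrides it" pattern. It is not the Rust trait itself: `GraphBackend` below is a hypothetical Python stand-in, and `LoopingBackend`, `BatchingBackend`, and the `bulk_calls` counter are names invented for the example.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical Python stand-in for the Rust GraphBackend trait."""

    @abstractmethod
    def insert_node(self, spec: dict) -> int:
        """Insert one node and return its id."""

    def insert_nodes_bulk(self, specs: list[dict]) -> list[int]:
        # Default implementation: loop the single-insert path, so backends
        # that only implement insert_node keep working unchanged.
        return [self.insert_node(spec) for spec in specs]


class LoopingBackend(GraphBackend):
    """Implements only the single insert; inherits the default bulk method."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def insert_node(self, spec: dict) -> int:
        self.rows.append(spec)
        return len(self.rows)  # 1-based id, like a fresh rowid sequence


class BatchingBackend(LoopingBackend):
    """Overrides the bulk method with a (simulated) native batched path."""

    def __init__(self) -> None:
        super().__init__()
        self.bulk_calls = 0

    def insert_nodes_bulk(self, specs: list[dict]) -> list[int]:
        self.bulk_calls += 1
        start = len(self.rows)
        self.rows.extend(specs)  # one batched write instead of N single writes
        return list(range(start + 1, start + len(specs) + 1))
```

Both backends return ids in input order for the same input; only the override does it in a single batched step, which is the property the parity tests in this PR check for the real implementation.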

Implementation notes

  • All-or-nothing semantics on the bulk paths: validation runs first (rejecting empty names, unknown endpoints, and similar), then BEGIN, then the insert loop, then COMMIT. On any error: ROLLBACK and return the original error. The default trait impl adds no batch-level atomicity of its own; it inherits whatever atomicity the single-insert path provides.
  • The statement obtained from prepare_cached is reused across rows, so the SQL is parsed once per batch (not once per row).
  • V3Backend inherits the default loop impl — a follow-up patch can route V3's bulk path through the existing WriteBatchGuard for native batched writes.
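
The validate, BEGIN, loop, COMMIT / ROLLBACK sequence described above can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions, not the Rust implementation: the `entities` table, its `kind`/`name` columns, and the UNIQUE constraint are hypothetical and exist only to make a mid-batch failure easy to trigger. Python's sqlite3 keeps a per-connection statement cache, which plays roughly the role prepare_cached plays in the Rust code.

```python
import sqlite3


def insert_entities_bulk(conn: sqlite3.Connection, items: list[dict]) -> list[int]:
    """Insert all items in one transaction; return rowids in input order."""
    if not items:
        return []  # empty input is a no-op: no transaction is opened

    # Validate everything before touching the database.
    for index, item in enumerate(items):
        if not item.get("name"):
            raise ValueError(f"item {index}: empty name")

    # Same SQL text for every row, so the cached statement is reused.
    sql = "INSERT INTO entities (kind, name) VALUES (?, ?)"
    rowids: list[int] = []
    conn.execute("BEGIN")
    try:
        for item in items:
            cur = conn.execute(sql, (item["kind"], item["name"]))
            rowids.append(cur.lastrowid)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the database untouched
        raise  # surface the original error
    return rowids


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE entities ("
    "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL UNIQUE)"
)
```

If the second row of a batch fails (for example by violating the hypothetical UNIQUE constraint), the first row's insert is rolled back with it, so a row count taken afterwards matches the count before the call.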

Test plan

  • cargo fmt --all --check — clean
  • RUSTFLAGS="-D warnings" cargo check --lib --bins — clean
  • cargo test -p sqlitegraph --lib -- --test-threads=1 — 1161 passed, 0 failed
  • cargo test -p sqlitegraph --test bulk_insert_tests — 8 passed (input-order IDs, empty input, rollback on validation error, edge bulk parity, observable-state parity)
  • maturin develop --release && pytest tests/ — 61 passed (10 new bulk-insert tests)

🤖 Generated with Claude Code

oldnordic and others added 3 commits May 16, 2026 01:40
Adds insert_*_bulk methods that batch multiple inserts inside a single
transaction with a reused prepare_cached statement. Closes the 8x build-
time gap downstream consumers see when loading large graphs from
grounded-index DBs (Python->Rust FFI per add_node was the bottleneck).

Core (sqlitegraph-core):
- SqliteGraph::insert_entities_bulk and insert_edges_bulk: BEGIN -
  prepare_cached(INSERT) - loop execute + last_insert_rowid - COMMIT.
  Empty input returns Ok(vec![]) without opening a transaction. On any
  error mid-batch: ROLLBACK and return the error; the database is left
  untouched. Returns rowids in input order.
- GraphBackend::insert_nodes_bulk and insert_edges_bulk: trait methods
  with default implementations that loop the single-insert path, so any
  existing GraphBackend consumer keeps working at 2.3 -> 2.4 with no
  source changes. The &B blanket forwarders are wired through.
- SqliteGraphBackend overrides both, dispatching to the new
  SqliteGraph bulk paths. Publisher events fire per row after commit to
  preserve single-insert observer semantics; no new batched event type.
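
A rough Python sketch of the "events fire per row after commit" ordering, using the standard sqlite3 module. The `nodes` table, the `publish` callback, and the function name are assumptions made for the example; the real publisher API is not shown in this PR description.

```python
import sqlite3
from typing import Callable, Optional


def insert_nodes_with_events(
    conn: sqlite3.Connection,
    names: list[Optional[str]],
    publish: Callable[[int], None],
) -> list[int]:
    """Insert names in one transaction, then publish one event per row."""
    rowids: list[int] = []
    conn.execute("BEGIN")
    try:
        for name in names:
            cur = conn.execute("INSERT INTO nodes (name) VALUES (?)", (name,))
            rowids.append(cur.lastrowid)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise  # nothing has been published yet, so observers see nothing
    # Only reached after a successful commit: one event per row, input order.
    for rowid in rowids:
        publish(rowid)
    return rowids


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
events: list[int] = []
insert_nodes_with_events(conn, ["a", "b"], events.append)
```

Deferring the publish loop until after COMMIT means an observer never hears about a row that was later rolled back, while still receiving the same one-event-per-row stream a per-item loop would produce.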

Python (sqlitegraph-py):
- Graph.add_nodes_bulk(items: list[dict]) and add_edges_bulk(items): each
  dict carries the same fields as the kwargs-style add_node/add_edge.
  Missing required fields raise; valid items go through in one FFI call.
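
As a sketch of the per-item validation described above: the whole batch is checked and flattened before the single call across the FFI boundary. Everything here is illustrative. The required field names (`kind`, `name`) are an assumption, `InvalidArgumentError` is a local stand-in for the package's exception (a later commit on this PR switches the validators to a sqlitegraph-typed exception of that name, whose base class is not stated), and `validate_node_items` is a hypothetical helper.

```python
class InvalidArgumentError(ValueError):
    """Local stand-in for the sqlitegraph-typed exception (base class assumed)."""


# Assumed required fields; data and file_path are treated as optional.
REQUIRED_NODE_FIELDS = ("kind", "name")


def validate_node_items(items: list[dict]) -> list[tuple]:
    """Check every dict up front, then flatten to rows for one batched call."""
    for index, item in enumerate(items):
        missing = [field for field in REQUIRED_NODE_FIELDS if field not in item]
        if missing:
            raise InvalidArgumentError(
                f"item {index}: missing required field(s): {', '.join(missing)}"
            )
    # All items are valid: build the payload that would cross the FFI once.
    return [
        (item["kind"], item["name"], item.get("file_path"), item.get("data"))
        for item in items
    ]
```

Validating the full list before building the payload keeps the failure mode simple for callers: either the batch is rejected with the index of the offending item, or every item goes through together.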

Tests:
- 8 Rust integration cases in tests/bulk_insert_tests.rs: input-order
  IDs, empty input, validation rollback, edge bulk parity, observable
  state matches a per-item loop.
- 10 Python cases in tests/test_bulk_insert.py: both bulk paths,
  missing-field validation, data/file_path round-trip, parity with the
  per-item loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- sqlitegraph-core: 2.3.0 -> 2.4.0 (new GraphBackend::insert_*_bulk
  trait methods with default impls; SqliteGraph::insert_*_bulk
  transactional bulk paths; SqliteGraphBackend overrides). SemVer minor.
- sqlitegraph-py:   0.2.0 -> 0.3.0 (Graph.add_nodes_bulk and
  add_edges_bulk Python methods). SemVer minor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Self-heals the python CI step on PR #5:
- Replace bare PyException::new_err with InvalidArgumentError::new_err
  for the missing-field validators on add_nodes_bulk/add_edges_bulk so
  callers see a sqlitegraph-typed exception instead of a generic one.
- Update test_bulk_insert.py to assert InvalidArgumentError specifically
  (silences ruff B017) and pass strict=True to zip (silences ruff B905).
- Apply ruff format to the new test file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@oldnordic oldnordic merged commit 2f9b9d1 into main May 15, 2026
10 checks passed
@oldnordic oldnordic deleted the feat/bulk-insert branch May 15, 2026 23:55