feat(core,py): bulk-insert primitives - 2.4.0 / 0.3.0 (#5)
Merged
Adds insert_*_bulk methods that batch multiple inserts inside a single transaction with a reused prepare_cached statement. Closes the 8x build-time gap downstream consumers see when loading large graphs from grounded-index DBs (Python->Rust FFI per add_node was the bottleneck).

Core (sqlitegraph-core):
- SqliteGraph::insert_entities_bulk and insert_edges_bulk: BEGIN -> prepare_cached(INSERT) -> loop execute + last_insert_rowid -> COMMIT. Empty input returns Ok(vec![]) without opening a transaction. On any error mid-batch: ROLLBACK and return the error; the database is left untouched. Returns rowids in input order.
- GraphBackend::insert_nodes_bulk and insert_edges_bulk: trait methods with default implementations that loop the single-insert path, so any existing GraphBackend consumer keeps working at 2.3 -> 2.4 with no source changes. The &B blanket forwarders are wired through.
- SqliteGraphBackend overrides both, dispatching to the new SqliteGraph bulk paths. Publisher events fire per row after commit to preserve single-insert observer semantics; no new batched event type.

Python (sqlitegraph-py):
- Graph.add_nodes_bulk(items: list[dict]) and add_edges_bulk(items): each dict carries the same fields as the kwargs-style add_node/add_edge. Missing required fields raise; valid items go through in one FFI call.

Tests:
- 8 Rust integration cases in tests/bulk_insert_tests.rs: input-order IDs, empty input, validation rollback, edge bulk parity, observable state matches a per-item loop.
- 10 Python cases in tests/test_bulk_insert.py: both bulk paths, missing-field validation, data/file_path round-trip, parity with the per-item loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
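The transactional pattern described in this commit can be sketched with Python's standard-library sqlite3 module. This is an illustrative analogue, not the crate's Rust code: the entities table, its kind and name columns, and the function body are assumptions made for the example. sqlite3 keeps its own per-connection statement cache, which stands in here for prepare_cached.

```python
import sqlite3


def insert_entities_bulk(conn, rows):
    """Insert (kind, name) rows in one transaction; return rowids in input order."""
    if not rows:
        # Empty input: return early without opening a transaction.
        return []
    ids = []
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        for kind, name in rows:
            # The SQL text is identical on every iteration, so the
            # connection's statement cache reuses the compiled statement.
            cur.execute(
                "INSERT INTO entities (kind, name) VALUES (?, ?)",
                (kind, name),
            )
            ids.append(cur.lastrowid)
        cur.execute("COMMIT")
    except Exception:
        # Any failure mid-batch discards every row inserted so far.
        cur.execute("ROLLBACK")
        raise
    return ids


# isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE entities ("
    "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL)"
)
```

Calling the function with a batch whose last row violates a NOT NULL constraint raises sqlite3.IntegrityError and leaves the table as it was before the call, which mirrors the rollback behaviour the commit describes.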
- sqlitegraph-core: 2.3.0 -> 2.4.0 (new GraphBackend::insert_*_bulk trait methods with default impls; SqliteGraph::insert_*_bulk transactional bulk paths; SqliteGraphBackend overrides). SemVer minor.
- sqlitegraph-py: 0.2.0 -> 0.3.0 (Graph.add_nodes_bulk and add_edges_bulk Python methods). SemVer minor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-heals the python CI step on PR #5:
- Replace bare PyException::new_err with InvalidArgumentError::new_err for the missing-field validators on add_nodes_bulk/add_edges_bulk so callers see a sqlitegraph-typed exception instead of a generic one.
- Update test_bulk_insert.py to assert InvalidArgumentError specifically (silences ruff B017) and pass strict=True to zip (silences ruff B905).
- Apply ruff format to the new test file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- SqliteGraph::insert_entities_bulk / insert_edges_bulk: single-transaction, prepare_cached, rollback on error. Empty input is a no-op. Returns rowids in input order.
- GraphBackend::insert_nodes_bulk / insert_edges_bulk: trait methods with default implementations that loop the single-insert path, so every existing GraphBackend consumer keeps working at 2.3 → 2.4 with no source changes. The &B blanket forwarders are wired through.
- SqliteGraphBackend overrides both, dispatching to the new SqliteGraph bulk paths. Publisher events fire per row after commit (preserves single-insert observer semantics).
- Graph.add_nodes_bulk(items: list[dict]) and Graph.add_edges_bulk(items: list[dict]): same field set as kwargs-style add_node / add_edge, one FFI call per batch.
- sqlitegraph-core 2.3.0 → 2.4.0, sqlitegraph-py 0.2.0 → 0.3.0 (SemVer minor: additive, no breakage).

Closes the build-time gap downstream consumers see when loading large graphs from grounded-index DBs. Benchmarks will be re-run in the grounded-graph repo after the wheels land on PyPI.
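The compatibility argument (a default bulk method that loops the single-insert path, plus a backend-specific override that still emits one event per row) can be modelled in a few lines of Python. This is a toy model under stated assumptions: the class names LegacyBackend and BatchingBackend, the in-memory rows list, and the "node_inserted" event tuples are all hypothetical and do not come from the crate.

```python
class GraphBackend:
    """Toy counterpart of the trait: one required method, one defaulted."""

    def insert_node(self, node):
        raise NotImplementedError

    def insert_nodes_bulk(self, nodes):
        # Default implementation: loop the single-insert path.
        return [self.insert_node(node) for node in nodes]


class LegacyBackend(GraphBackend):
    """Written before the bulk method existed; implements only insert_node."""

    def __init__(self):
        self.rows = []
        self.events = []

    def insert_node(self, node):
        self.rows.append(node)
        node_id = len(self.rows)
        self.events.append(("node_inserted", node_id))
        return node_id


class BatchingBackend(LegacyBackend):
    """Overrides the bulk path but keeps per-row events."""

    def insert_nodes_bulk(self, nodes):
        start = len(self.rows)
        # Stands in for the single-transaction write.
        self.rows.extend(nodes)
        ids = list(range(start + 1, start + 1 + len(nodes)))
        # Events are emitted only after the whole batch is stored, one per row.
        for node_id in ids:
            self.events.append(("node_inserted", node_id))
        return ids
```

In this model an observer cannot tell the two backends apart from the event stream alone, which is the property the per-row-after-commit choice is meant to preserve.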
Implementation notes
- BEGIN, then loop, then COMMIT. On any error: ROLLBACK and return the original error. Default trait impl inherits whatever atomicity single-insert provides.
- prepare_cached is reused across rows, so the SQL statement is parsed once per batch (not once per row).
- WriteBatchGuard for native batched writes.

Test plan
- cargo fmt --all --check: clean
- RUSTFLAGS="-D warnings" cargo check --lib --bins: clean
- cargo test -p sqlitegraph --lib -- --test-threads=1: 1161 passed, 0 failed
- cargo test -p sqlitegraph --test bulk_insert_tests: 8 passed (input-order IDs, empty input, rollback on validation error, edge bulk parity, observable-state parity)
- maturin develop --release && pytest tests/: 61 passed (10 new bulk-insert tests)

🤖 Generated with Claude Code