bench(cross-system): WIP — graphlite + auksys/gqlite integration atte…#33
Draft
Felipe705x wants to merge 1 commit into
…mpts
Both systems were integrated under the cross-system harness for IC2.
Both failed at the data-loading stage on LDBC SF0.1 (the smallest
LDBC scale factor). Documented in per-system DIVERGENCES.md with
upstream source citations.
This commit is for the documentation paper trail; the integrations
are not bench-runnable end-to-end.
graphlite/ (GraphLite-AI/GraphLite, ISO GQL, Sled-backed):
- Standalone Cargo bin (graphlite-setup + graphlite-run) using
graphlite-rust-sdk 0.0.1.
- Setup loads persons (1.5K) cleanly in ~30s, then hangs on the
comments phase (151K rows) — never emits another batch line, RSS
drops to ~700 KB, no further disk writes.
- Root cause (post-mortem from upstream source review):
graphlite/src/exec/write_engine/operations/match_insert.rs:506
resolves variable bindings via graph.get_all_nodes() + .filter()
in Rust. Each inserted edge triggers a linear filter over the
in-memory node HashMap, so load cost grows with nodes × edges.
- Upstream comment in storage/indexes/traits.rs:61 says
"ROADMAP v0.4.0 - Batch index operations for bulk data loading":
bulk load is unbuilt in v0.0.1.
- Other quirks documented in DIVERGENCES.md: lexer panics on
non-ASCII (UTF-8) chars, `''` apostrophe escape rejected (only
`\'` works), 1000-iteration lexer cap limits INSERT batch size to
~40 nodes, `USE SCHEMA` rejected by parser.
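The cost pattern named in the root-cause bullet above can be modelled in a few lines. This is a minimal Python sketch, not GraphLite's Rust code: `resolve_by_scan` and `resolve_by_index` are hypothetical names, and the node layout is an assumption made for illustration. It counts node inspections when every edge endpoint is resolved by filtering all nodes, versus looking it up in an id map built once.

```python
from typing import Dict, List, Tuple

Node = Dict[str, int]
Edge = Tuple[int, int]


def resolve_by_scan(nodes: List[Node], edges: List[Edge]) -> int:
    """Resolve each edge endpoint by filtering the full node list.

    Returns the number of node inspections performed.
    """
    inspections = 0
    for src_id, dst_id in edges:
        for wanted in (src_id, dst_id):
            # Full scan per endpoint, mirroring a get-all-then-filter pattern.
            matches = [n for n in nodes if n["id"] == wanted]
            inspections += len(nodes)
            assert len(matches) == 1
    return inspections


def resolve_by_index(nodes: List[Node], edges: List[Edge]) -> int:
    """Resolve endpoints through an id -> node map built once up front."""
    index = {n["id"]: n for n in nodes}  # one pass over the nodes
    inspections = len(nodes)
    for src_id, dst_id in edges:
        _src, _dst = index[src_id], index[dst_id]  # two O(1) lookups
        inspections += 2
    return inspections


# Hypothetical toy sizes, far below the LDBC SF0.1 counts.
nodes = [{"id": i} for i in range(200)]
edges = [(i % 200, (i * 7 + 1) % 200) for i in range(1000)]

scan_cost = resolve_by_scan(nodes, edges)
index_cost = resolve_by_index(nodes, edges)
print(f"scan: {scan_cost} inspections, index: {index_cost} inspections")
```

In this model the scan cost grows with the product of node and edge counts, while the indexed cost grows with their sum.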
auksys_gqlite/ (auksys/gqlite via gqlite.org, Python via gqlitedb 1.5.1):
- Python harness following the pattern of graphqlite/.
- Tried four loading approaches (documented in DIVERGENCES.md):
UNWIND+per-row-MATCH (O(N²) edges), CREATE INDEX (parser rejects
all syntaxes), id_map dance, and finally the canonical single-CREATE
statement idiom from their own pokec_*_import.cypher benchmarks.
- Final approach loaded 288K nodes successfully in seconds but slowed
to a crawl on the 315K edges at ~440 KB/sec; finishing would take
30-60+ more minutes.
- Their own bench suite tests at ~17K patterns max (PokecTiny);
LDBC SF0.1 is ~604K patterns, ~35× their tested scale.
- Root architectural blockers documented in DIVERGENCES.md: properties
stored as JSON in SQLite TEXT column with no per-property index, no
PRAGMA tuning exposed, interpreter creates nodes one-Vec-at-a-time.
Open GitLab issues #169 (custom indexes), #196 (streaming), #200
(logical planner) acknowledge missing pieces.
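The "JSON in a TEXT column with no per-property index" blocker can be illustrated with plain SQLite. This is a sketch under stated assumptions: the `nodes` table, `props` column, and `idx_nodes_id` index are hypothetical stand-ins and not gqlite's actual schema, and it needs an SQLite build with the JSON functions enabled. It compares the query plan for a property lookup before and after adding an expression index.

```python
import json
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail


conn = sqlite3.connect(":memory:")
# Hypothetical layout: all properties serialized into one TEXT column.
conn.execute("CREATE TABLE nodes (node_key INTEGER PRIMARY KEY, props TEXT)")
conn.executemany(
    "INSERT INTO nodes (props) VALUES (?)",
    [(json.dumps({"id": i, "name": f"p{i}"}),) for i in range(1000)],
)

LOOKUP = "SELECT node_key FROM nodes WHERE json_extract(props, '$.id') = ?"

# Without an index, matching on a JSON property has to visit every row.
plan_without_index = plan_for(conn, LOOKUP, (42,))

# An expression index on the same json_extract call makes the lookup a search.
conn.execute(
    "CREATE INDEX idx_nodes_id ON nodes (json_extract(props, '$.id'))"
)
plan_with_index = plan_for(conn, LOOKUP, (42,))

print("before:", plan_without_index)
print("after: ", plan_with_index)
```

Whether gqlite could adopt something like this is a question for the upstream issues listed above; the sketch only shows why a per-row MATCH on a property is expensive when no such index exists.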
Shared-file additions (over phase0):
- run_all.sh: register auksys_gqlite in ALL_SYSTEMS; dispatch python
runners under graphqlite|auksys_gqlite.
- README.md: table updated to reflect both systems' scaffolded-but-
blocked status with links to their DIVERGENCES.md; comparison.txt
docs gain an "Errored param rows" section description.
- compare_results.py: sentinel `result_count = -1` rows are tallied
as errors per system, surfaced before the latency tables.
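The sentinel tally can be sketched as follows. This is not the code in compare_results.py: the column names (`system`, `param_id`, `result_count`, `latency_ms`), the sample rows, and the `tally_errors` helper are assumptions made for illustration. Only the `result_count = -1` sentinel comes from this PR.

```python
import csv
import io
from collections import Counter

ERROR_SENTINEL = -1  # result_count value that marks an errored param row


def tally_errors(rows) -> Counter:
    """Count sentinel rows per system."""
    errors = Counter()
    for row in rows:
        if int(row["result_count"]) == ERROR_SENTINEL:
            errors[row["system"]] += 1
    return errors


# Hypothetical per-iter CSV content; real column names may differ.
SAMPLE_CSV = """system,param_id,result_count,latency_ms
alpha,1,20,3.1
alpha,2,-1,0.0
beta,1,-1,0.0
beta,2,-1,0.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
errors = tally_errors(rows)

# Surface errors first, before any latency tables are printed.
for system, count in sorted(errors.items()):
    print(f"{system}: {count} errored param row(s)")
```

Reporting the tally ahead of the latency tables keeps errored rows from being read as fast iterations.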
Why this is a draft PR rather than a merge candidate: the graphlite
and auksys_gqlite runners can't produce per-iter CSV output on LDBC
because their setups don't complete. The work is kept on record for
the paper's qualitative comparison; merging would add non-functional
system entries to run_all.sh that would always be reported as [FAIL]
in the orchestrator's skipped.log.