bench(cross-system): WIP — graphlite + auksys/gqlite integration atte… by Felipe705x · Pull Request #33 · pleiad/frogql

Felipe705x · 2026-05-05T03:44:49Z

…mpts

Both systems were integrated under the cross-system harness for IC2. Both failed at the data-loading stage on LDBC SF0.1 (the smallest LDBC scale factor). Documented in per-system DIVERGENCES.md with upstream source citations.

This commit is for the documentation paper trail; the integrations are not bench-runnable end-to-end.

graphlite/ (GraphLite-AI/GraphLite, ISO GQL, Sled-backed):

Standalone Cargo bin (graphlite-setup + graphlite-run) using graphlite-rust-sdk 0.0.1.
Setup loads persons (1.5K) cleanly in ~30s, then hangs on the comments phase (151K rows): it emits no further batch lines, RSS drops to ~700 KB, and disk writes stop.
Root cause (post-mortem from upstream source review): graphlite/src/exec/write_engine/operations/match_insert.rs:506 resolves variable bindings via graph.get_all_nodes() + .filter() in Rust. That is a linear filter over the in-memory node HashMap for every edge, so edge loading costs O(edges × nodes).
Upstream comment in storage/indexes/traits.rs:61 says "ROADMAP v0.4.0 - Batch index operations for bulk data loading": bulk load is unbuilt in v0.0.1.
Other quirks documented in DIVERGENCES.md: UTF-8 lexer panic on non-ASCII chars, '' apostrophe escape rejected (only \' works), a 1000-iteration lexer cap that limits INSERT batch size to ~40 nodes, USE SCHEMA rejected by parser.
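
To illustrate why a per-edge linear filter stalls at this scale, here is a minimal Python sketch (not GraphLite's code). The node layout, `link_by_scan`, and `link_by_index` are hypothetical names made up for the illustration; the first function mimics the scan-and-filter binding resolution, the second resolves the same bindings through a one-time hash index.

```python
# Hypothetical in-memory node store: internal handle -> node record.
nodes = {f"n{i}": {"label": "Person", "id": 1000 + i} for i in range(1500)}
# Hypothetical edge list expressed in external ids, as in a bulk-load file.
edges = [(1000 + i, 1000 + (i * 7) % 1500) for i in range(300)]


def link_by_scan(nodes, edges):
    """Resolve each endpoint by filtering all nodes: O(edges * nodes)."""
    linked = []
    for src_id, dst_id in edges:
        src = [k for k, n in nodes.items() if n["id"] == src_id]
        dst = [k for k, n in nodes.items() if n["id"] == dst_id]
        linked.append((src[0], dst[0]))
    return linked


def link_by_index(nodes, edges):
    """Build an id -> handle map once, then resolve each endpoint in O(1)."""
    by_id = {n["id"]: k for k, n in nodes.items()}
    return [(by_id[s], by_id[d]) for s, d in edges]


scan_pairs = link_by_scan(nodes, edges)
index_pairs = link_by_index(nodes, edges)
print(scan_pairs == index_pairs)
```

Both functions produce the same bindings; only the cost differs. With a scan, the work per edge grows with the number of nodes already loaded.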

auksys_gqlite/ (auksys/gqlite via gqlite.org, Python via gqlitedb 1.5.1):

Python harness following the pattern of graphqlite/.
Tried four loading approaches (documented in DIVERGENCES.md): UNWIND+per-row-MATCH (O(N²) edges), CREATE INDEX (parser rejects all syntaxes), id_map dance, and finally the canonical single-CREATE statement idiom from their own pokec_*_import.cypher benchmarks.
Final approach loaded 288K nodes successfully in seconds but slowed to a crawl on the 315K edges, writing at ~440 KB/sec; finishing would have taken an estimated 30-60+ more minutes.
Their own bench suite tests at ~17K patterns max (PokecTiny); LDBC SF0.1 is ~604K patterns, ~35× their tested scale.
Root architectural blockers documented in DIVERGENCES.md: properties stored as JSON in SQLite TEXT column with no per-property index, no PRAGMA tuning exposed, interpreter creates nodes one-Vec-at-a-time. Open GitLab issues #169 (custom indexes), #196 (streaming), #200 (logical planner) acknowledge missing pieces.
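
The cost of keeping properties as JSON text without a per-property index can be sketched with plain SQLite from Python. This is an assumption-laden illustration, not gqlite's actual schema: the `nodes` table, the `ldbc_id` property, and the index name are hypothetical, and the expression index shown is a stock SQLite feature, not something the source says gqlitedb exposes. It also assumes the local SQLite build includes the JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per node, all properties in a JSON TEXT column.
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, props TEXT)")
conn.executemany(
    "INSERT INTO nodes (props) VALUES (?)",
    [(json.dumps({"ldbc_id": i, "name": f"p{i}"}),) for i in range(1000)],
)

QUERY = "SELECT id FROM nodes WHERE json_extract(props, '$.ldbc_id') = ?"


def plan(sql):
    """Return the query plan detail text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


plan_before = plan(QUERY)  # no usable index: full table scan
conn.execute(
    "CREATE INDEX idx_nodes_ldbc_id ON nodes (json_extract(props, '$.ldbc_id'))"
)
plan_after = plan(QUERY)   # same query, now served by the expression index
match = conn.execute(QUERY, (42,)).fetchall()

print(plan_before)
print(plan_after)
```

Without the index, every property lookup has to parse the JSON of every row, which is the access pattern an edge load with per-endpoint matching would trigger repeatedly.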

Shared-file additions (over phase0):

run_all.sh: register auksys_gqlite in ALL_SYSTEMS; dispatch python runners under graphqlite|auksys_gqlite.
README.md: table updated to reflect both systems' scaffolded-but-blocked status with links to their DIVERGENCES.md; comparison.txt docs gain an "Errored param rows" section description.
compare_results.py: sentinel result_count = -1 rows are tallied as errors per system, surfaced before the latency tables.
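
The sentinel handling described for compare_results.py can be sketched as follows. This is a hedged illustration, not the script's actual code: the column names (`system`, `param_id`, `latency_ms`), the sample rows, and `tally_errors` are assumptions; only the `result_count = -1` sentinel comes from the description above.

```python
import csv
import io
from collections import Counter

SENTINEL = -1  # result_count value that marks an errored parameter row


def tally_errors(rows):
    """Count sentinel rows per system."""
    errors = Counter()
    for row in rows:
        if int(row["result_count"]) == SENTINEL:
            errors[row["system"]] += 1
    return errors


# Hypothetical per-iteration CSV; real column names may differ.
sample = io.StringIO(
    "system,param_id,result_count,latency_ms\n"
    "system_a,1,20,3.1\n"
    "system_a,2,-1,0.0\n"
    "system_b,1,-1,0.0\n"
    "system_b,2,-1,0.0\n"
    "system_b,3,5,9.4\n"
)
rows = list(csv.DictReader(sample))
errors = tally_errors(rows)
# Only non-sentinel rows should feed the latency tables.
ok_rows = [r for r in rows if int(r["result_count"]) != SENTINEL]

# Surface the error tally first, before any latency statistics.
for system, count in sorted(errors.items()):
    print(f"{system}: {count} errored param rows")
```

Separating the tally from the latency rows keeps failed parameters from being averaged in as zero-latency successes.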

Why this is a draft PR rather than a merge candidate: the graphlite and auksys_gqlite runners can't produce per-iter CSV output on LDBC because their setups don't complete. The work is kept on record for the paper's qualitative comparison; merging would add non-functional system entries to run_all.sh that would always be reported as [FAIL] in the orchestrator's skipped.log.


Felipe705x wants to merge 1 commit into bench/cross-system-phase0 from bench/cross-system-failed-attempts
