Bench/cross system webbery by Felipe705x · Pull Request #35 · pleiad/frogql

Felipe705x · 2026-05-05T15:05:54Z

No description provided.

The cross-system bench was hardcoded to IC2 throughout: shell vars, file paths, output labels, comparison-table headers. Adding any other IC (when its toml flips to `status="implemented"`) would have meant editing every per-system runner plus the orchestrator.

This refactor pushes the IC number through as an explicit `--ic <n>` parameter (default 2), so adding new ICs is a matter of dropping in `bench/cross-system/<system>/ic<n>.{cypher,gql,...}` per system and bumping the toml status. No runner code changes needed.

Plumbing:

- run_all.sh: `--ic` flag, passes through to every per-system runner, records ic in run_info.txt.
- gqlite/run.sh: `--ic` plumbed to ldbc_bench's `--ic`.
- graphqlite/setup.py: SUPPORTED_ICS check; per-IC `ic<n>.db` path.
- graphqlite/run.py: derives toml/cypher/db paths from `--ic`; reads params_file and expected_shape from the toml.
- compare_results.py: extracts query label from CSV, includes `[IC<n>]` in section headers.
- README: `--ic <n>` usage documented; CSV schema fixed (was referencing a result_shape column that doesn't exist).
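The path derivation described above can be sketched in a few lines. This is a minimal illustration, not the code in graphqlite/run.py: the helper names `parse_args`, `query_path`, and `db_path` are hypothetical, and only the `--ic` flag, its default of 2, and the `bench/cross-system/<system>/ic<n>.*` layout come from the description.

```python
import argparse
from pathlib import Path

# Directory layout taken from the description; everything else is illustrative.
BENCH_ROOT = Path("bench/cross-system")


def parse_args(argv=None):
    """Parse the --ic flag (default 2), as the per-system runners are said to do."""
    parser = argparse.ArgumentParser(description="per-system runner (sketch)")
    parser.add_argument("--ic", type=int, default=2, help="LDBC IC query number")
    return parser.parse_args(argv)


def query_path(system: str, ic: int, ext: str) -> Path:
    """Hypothetical helper: locate ic<n>.<ext> inside a system's subdirectory."""
    return BENCH_ROOT / system / f"ic{ic}.{ext}"


def db_path(system: str, ic: int) -> Path:
    """Hypothetical helper: one database file per IC, e.g. ic2.db."""
    return BENCH_ROOT / system / f"ic{ic}.db"
```

With this shape, supporting a new IC only requires that the matching `ic<n>.*` files exist; the runner code itself never names a specific IC.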

…stem

Kuzu (gitlab.com/kuzudb/kuzu, MIT, CIDR 2023 paper from the Waterloo DSG, vectorized columnar engine) integrated as a fourth system in the cross-system bench, joining gqlite + graphqlite.

End-to-end IC2 run with all three at LDBC SF0.1, 5 iters + 2 warmup, all 15 params:

- gqlite (lazy): median 0.20-0.41 ms
- kuzu-cypher: median 5.77-7.24 ms (~25× slower than gqlite)
- graphqlite: median 28.45-32.70 ms (~5× slower than kuzu)

All three systems agree on row count (20) and result shape (i,s,s,i,n/s,i) for every param row.

## On the archival caveat

Kùzu Inc. archived the GitHub repo on 2025-10-10 with v0.11.3 as the final release. We pin to that version in `requirements.txt`; the PyPI wheel is frozen and reproducible indefinitely. For a research-paper-tier comparison "active maintenance" isn't a strict criterion; "engineered, working, reproducible" is, and Kuzu satisfies all three. The CIDR 2023 paper is cite-able. Full framing in kuzu/DIVERGENCES.md.

## Integration choices

- Schema-first DDL (CREATE NODE/REL TABLE) with PRIMARY KEY on each node table. This gives auto-indexed point lookups for the IC2-start `MATCH (p:Person {id: $personId})` without any extra DDL or the index-DDL acrobatics that auksys/gqlite forced us into.
- Native bulk loader: `COPY <Table> FROM '<csv>' (DELIM='|', HEADER=true)`. Loaded all 288K nodes + 315K edges in 0.9s.
- Multi-typed REL TABLE for `hasCreator(FROM Comment TO Person, FROM Post TO Person)`. COPY FROM into multi-typed REL TABLEs requires (FROM='X', TO='Y') hints to disambiguate the sub-table.
- IC2 query keeps the canonical "one MATCH + relationship-type-implicit label disjunction" shape: `MATCH (...)<-[:hasCreator]-(c)` with `c` unlabeled. Kuzu's multi-typed REL TABLE constrains `c` to Comment-or-Post automatically. Initial UNION ALL approach was rejected because LIMIT-after-UNION ALL only applies to the second branch in Kuzu's parser; details in kuzu/DIVERGENCES.md.

## Reproducibility hardening (the related cleanup)

- Added `bench/cross-system/install_python_deps.sh`: one-liner that runs `pip install -r requirements.txt` in every per-system subdir. Replaces the previous "discover each subdir's requirements.txt yourself" experience.
- Added `bench/cross-system/kuzu/README.md` per-system README with prereqs and standalone-run instructions. The other systems' per-system READMEs follow as the existing systems' loaders gain doc updates.
- Top-level `bench/cross-system/README.md`:
  - new "Per-system prerequisites" subsection in the Setup section listing each system's install command in one table
  - Kuzu row added to the systems table
  - new explicit "Per-system data load" subsection documenting the one-time setup-py invocations per system

The bench now installs deps and loads in one command each:

    bash bench/cross-system/install_python_deps.sh
    bash bench/cross-system/run_all.sh --ic 2
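For the bulk-load step listed under "Integration choices", the COPY statements can be generated rather than hand-written. The sketch below only builds statement strings and does not import or call Kuzu. The function name `copy_statement` and the file names in the usage lines are hypothetical, and placing the FROM/TO hints in the same parenthesised option list as DELIM and HEADER is an assumption; the loader in this PR may format them differently.

```python
def copy_statement(table, csv_path, from_label=None, to_label=None):
    """Build a COPY ... FROM statement string for one table.

    from_label/to_label are only needed for multi-typed REL TABLEs, where
    the description says hints are required to pick the sub-table.
    """
    if (from_label is None) != (to_label is None):
        raise ValueError("from_label and to_label must be given together")
    options = ["DELIM='|'", "HEADER=true"]
    if from_label is not None:
        # Assumption: hints share the option list with DELIM and HEADER.
        options += [f"FROM='{from_label}'", f"TO='{to_label}'"]
    return f"COPY {table} FROM '{csv_path}' ({', '.join(options)})"


# Hypothetical file names, for illustration only.
node_stmt = copy_statement("Person", "person.csv")
rel_stmt = copy_statement("hasCreator", "comment_hasCreator_person.csv",
                          from_label="Comment", to_label="Person")
```

Generating one statement per (FROM, TO) pair keeps the multi-typed `hasCreator` load explicit about which sub-table each CSV feeds.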

After thorough review of the upstream README, the bison grammar (`src/gql.y`), the lexer (`src/gql.l`), the C API (`include/gqlite.h`), and the example tests, we rejected webbery/gqlite as unfit for the cross-system IC2 bench.

This is not a vibes-based rejection. The original bench plan flagged this project as "dead since April 2023" and time-boxed any attempt at three days. After the deep dive, the rejection is grounded in concrete blockers:

1. **DSL cannot express `LIMIT`.** `src/gql.y:115` declares a `limit` token; `src/gql.l:147` lexes it. But NO grammar production rule references it. The complete query grammar (gql.y:362-379) is `{query_kind [, graph_expr [, where_expr]]}`: no LIMIT, no ORDER BY, no projection beyond `query_kind`. The keyword is half-implemented and the bench's `LIMIT 20` semantics can't be expressed.
2. **No label-disjunction syntax.** WHERE clauses use MongoDB-style operators (`$lt`, `$gt`, `$and`, `$or`) on property values; there's no syntax for "entity belongs to label A or label B". IC2's `(c:Comment|Post)` would require two separate queries combined externally, breaking the cross-system "same logical shape" rule.
3. **Per-row load idiom only.** `test/movielens.cpp` (their reference loader for ~10K movies + ~5K tags) does one `gqlite_exec` per CSV row with a `char[512]` buffer. At LDBC SF0.1 scale (~600K entities including MVAs), this is the same architectural shape that made auksys/gqlite take indefinite hours.
4. **C-only API + no bench-scale tests.** No Python/Ruby/etc. bindings; would require a custom C++ harness. The README states the project's purpose is "for testing abilities in ending device" (IoT/edge form factors), not LDBC-scale workloads.
5. **Last commit 2023-04-08**: over 2 years dead. README labels the DSL as "Unstable"; CHANGELOG.md is a single line.

The full diagnosis with grammar-line citations is in `bench/cross-system/webbery_gqlite/SKIPPED.md`. The top-level README's "What gets compared" table is updated to mark this row as discarded with a link to the breakdown.

This is documentation-only: no setup.* or run.* is provided. The orchestrator's `run_all.sh` already detects missing per-system runners and logs `[SKIP]` to `skipped.log`.

## On the survey-of-the-landscape this gives us

Out of five external systems evaluated in this branch family (graphqlite, GraphLite-AI, auksys/gqlite, Kuzu, webbery/gqlite), two are bench-runnable end-to-end:

- graphqlite (boring SQLite extension, works)
- Kuzu (vectorized columnar, archived but pinned)

The remaining three (GraphLite-AI, auksys/gqlite, webbery/gqlite) all fail at the data-load stage with structurally similar issues: custom DSL or aspirational ISO-GQL/Cypher, no real bulk-load API, missing core SQL/Cypher features at the parser level, designed for small-data niche workloads.

This is itself a finding worth mentioning in the paper's "Systems considered but rejected" or "Threats to validity" subsection: the candidate landscape for "small embedded graph DB" is sparse, and most projects in it don't survive contact with LDBC SF0.1 (the smallest LDBC scale factor).
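The skip behaviour mentioned for `run_all.sh` (detect a missing per-system runner, log `[SKIP]`) can be illustrated as follows. This is a Python rendering under assumptions, not a transcription of the shell script: the function names, the `run.*` glob, and the exact log line format are hypothetical. Only the `[SKIP]` marker and the `skipped.log` file name come from the description.

```python
from pathlib import Path


def find_runner(system_dir: Path):
    """Return the first run.* file in a system directory, or None if absent."""
    for candidate in sorted(system_dir.glob("run.*")):
        if candidate.is_file():
            return candidate
    return None


def partition_systems(root: Path, systems, skip_log: Path):
    """Return the systems that have a runner; log the rest as [SKIP]."""
    runnable = []
    with skip_log.open("a", encoding="utf-8") as log:
        for name in systems:
            if find_runner(root / name) is None:
                # Log line format is an assumption; only the [SKIP] tag is sourced.
                log.write(f"[SKIP] {name}: no run.* found\n")
            else:
                runnable.append(name)
    return runnable
```

Under this scheme a documentation-only directory such as `webbery_gqlite/` (containing just `SKIPPED.md`) is skipped without any special-casing in the orchestrator.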

Verified by re-reading:

- src/gql.y (the bison grammar): every production rule
- src/gql.l (the flex lexer): including all start-state blocks
- include/gqlite.h (the public C API)
- the .gql test files under test/vertex/ and test/query/ (the canonical query-shape examples)
- example/main.c (the README's reference example, recopied)

Two findings worth tightening in the writeup:

1. The `limit` token IS used in test/vertex/grammar.gql:35 inside a `$near` operator clause: `{feature_name: {limit: 3, $near: ...}}`. I'd missed this. But tracing it through the grammar (condition_property → right_value → condition_object → OP_NEAR ':' '{' geometry_condition '}' → OP_GEOMETRY ':' a_vector ',' range_comparable) shows no production accepts the `limit` token even there. It's either an aspirational test or relies on a path I couldn't find. SKIPPED.md now documents this nuance and notes that even if it did parse, vector-search top-k semantics are not the result-set LIMIT IC2 requires.
2. The `query_kind_expr` rule (gql.y:436) is precisely `KW_EDGE | LITERAL_STRING | a_graph_properties`. The single-string label restriction is now cited verbatim. No production accepts an array of labels.

The verdict doesn't change: webbery/gqlite cannot express IC2 faithfully, the project is dead 2+ years, and the other blockers (no bulk load, C-only API, IoT-targeted scale) all stand. But the documentation is now precise enough to survive a careful upstream reader pointing at the test/vertex/grammar.gql edge case.
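A reviewer who wants a quick mechanical cross-check of the "declared but never referenced" claim could run something like the sketch below against `src/gql.y`. It is a coarse text search, not a bison parser: it does not strip comments or C action blocks, so a hit still needs manual inspection, while a miss is strong evidence that no rule uses the token. The toy grammar and its token names are invented for illustration and are not copied from webbery/gqlite.

```python
import re


def rules_section(grammar_text: str) -> str:
    """Return the text between the first and second '%%' separator lines."""
    sections = re.split(r"^%%[ \t]*$", grammar_text, flags=re.MULTILINE)
    if len(sections) < 2:
        raise ValueError("no %% separator found; not a bison grammar file?")
    return sections[1]


def token_used_in_rules(grammar_text: str, token: str) -> bool:
    """True if `token` appears as a whole word anywhere in the rules section."""
    pattern = rf"\b{re.escape(token)}\b"
    return re.search(pattern, rules_section(grammar_text)) is not None


# Invented toy grammar: LIMIT is declared but no rule refers to it.
TOY_GRAMMAR = """\
%token LIMIT
%token KW_EDGE
%%
query: '{' query_kind '}' ;
query_kind: KW_EDGE ;
%%
"""

limit_used = token_used_in_rules(TOY_GRAMMAR, "LIMIT")
edge_used = token_used_in_rules(TOY_GRAMMAR, "KW_EDGE")
```

In the toy grammar the declared `LIMIT` token is reported as unused and `KW_EDGE` as used, which is the same pattern the writeup describes for the real grammar.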

Per request to actually verify rather than just analyze: tried to build webbery/gqlite on Windows + MSVC 2022 + bundled CMake (3.31.6-msvc6), May 2026. Hit FOUR sequential build failures, each requiring manual intervention:

1. **git-describe submodule version**: shallow clone breaks libmdbx's CMake tag-matcher. Worked around by --unshallow + synthetic tag.
2. **flex/bison not on PATH for MSBuild**: README claims the bundled `tool/{flex,bison}.exe` "needs no dependency" but MSBuild doesn't honor `WORKING_DIRECTORY` for cwd-based binary lookup. Worked around by pre-generating parser/lexer manually AND prefixing `tool/` to PATH for the build invocation.
3. **libmdbx version-mismatch** in CMake-templated `version.c`: `MDBX_VERSION_MAJOR/MINOR` mismatch between mdbx.h (0,11) and the generated file (0,0) fires a preprocessor `#error`. Worked around by patching the generated version.c.
4. **MASM assembly file not built/linked**: project contains x86_64 coroutine assembly stubs but CMake doesn't `enable_language(ASM_MASM)`, so MSBuild compiles the .cpp consumers but never assembles the .asm files into .objs. Linker fails with `LNK1181: cannot open input file 'jump_x86_64_ms_pe_masm.obj'`.

We stopped at #4. The point isn't "Windows is hard": every other system in this bench (kuzu, graphqlite, gqlite) builds clean from a fresh `git clone` on the same Windows + MSVC 2022 setup. webbery's build issues are specific to its pre-2024 CMake config and unmaintained submodules.

**Combined with the grammar-level blockers (no LIMIT, no label disjunction), the build issues are the second-order signal: even if all four were patched, the DSL still can't express IC2 faithfully.**

The verdict is unchanged: discard. But the SKIPPED.md is now empirically grounded rather than just grammar-reading. A reviewer who clones the repo will hit the same wall.

Felipe705x added 3 commits May 4, 2026 23:37

Felipe705x changed the base branch from main to bench/cross-system-kuzu May 5, 2026 15:06

Felipe705x added 2 commits May 5, 2026 11:14

Felipe705x force-pushed the bench/cross-system-kuzu branch from 700f47f to c25b453 Compare May 5, 2026 21:50

Felipe705x wants to merge 5 commits into bench/cross-system-kuzu from bench/cross-system-webbery
