
Switch to self-contained jq4j #25

Open

andreaTP wants to merge 3 commits into Hyperfoil:main from andreaTP:switch-to-jq4j

Conversation

@andreaTP

Switches calculateJqValues() from forking /usr/bin/jq via ProcessBuilder to using jq4j, which runs jq as a WASM module inside the JVM.

  • No external jq binary required
  • No temp files
  • Removes "Install jq" step from CI workflows

Existing NodeServiceTest.calculateJqValues* tests pass unchanged.
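The pattern being replaced can be sketched as below: fork an external process, feed it JSON on stdin, and read its stdout. This is a minimal illustration, not the actual calculateJqValues() code; `cat` stands in for `/usr/bin/jq <filter>` so the sketch runs even where jq is not installed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the ProcessBuilder approach the PR removes: fork a child
// process, write JSON to its stdin, and collect stdout. "cat" is a
// stand-in for "/usr/bin/jq <filter>" so this runs without jq installed.
public class ForkSketch {
    static String pipeThrough(String input, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        } // closing stdin signals EOF to the child
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // With real jq this would be: pipeThrough(json, "/usr/bin/jq", "-c", ".a")
        System.out.println(pipeThrough("{\"a\":1}", "cat"));
    }
}
```

With jq4j the same filter runs in-process, which is what removes the external binary, the temp files, and the "Install jq" CI step.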

@willr3
Collaborator

willr3 commented Feb 25, 2026

This is cool! I didn't realize the Value.data could be combined into a single JSONL for jq processing. There is a performance test in perf_test that can create a jq-based version of a test from the production Horreum instance and upload 100 JSON documents to calculate approximately 4200 values. I rebased your branch onto the latest main and pushed to my fork for testing:
https://github.com/willr3/h5m/tree/switch-to-jq4j

We measure the time it takes to load all 100 json documents and generate the 4200+ values.

┌───────┬─────────┬──────────────┐
│   #   │   main  │switch-to-jq4j│
│threads│ 9dda369 │   ef005c6    │
├───────┼─────────┼──────────────┤
│      1│1m22.993s│     2m39.571s│
├───────┼─────────┼──────────────┤
│     10│0m24.416s│     0m35.119s│
└───────┴─────────┴──────────────┘

I'd like to investigate the cause of the difference to try to reach parity.
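The batching mentioned above (combining Value.data into a single JSONL) can be sketched roughly like this: each JSON document becomes one input line, so a single jq run processes the whole batch and emits one result line per document. The joining/splitting below is an illustration under that assumption, not the PR's exact code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration of JSONL batching: N compact JSON documents are joined
// with newlines so one jq invocation (jq treats each line as a separate
// input) replaces N separate invocations. Not the PR's exact code.
public class JsonlBatchSketch {
    static String toJsonl(List<String> docs) {
        // One compact JSON document per line.
        return docs.stream().collect(Collectors.joining("\n"));
    }

    static List<String> splitResults(String jqOutput) {
        // jq emits one result line per input line, in order.
        return List.of(jqOutput.split("\n"));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"v\":1}", "{\"v\":2}", "{\"v\":3}");
        String jsonl = toJsonl(docs);
        System.out.println("lines=" + jsonl.lines().count());
        // Pretend jq ran the filter ".v" over the batch:
        System.out.println(splitResults("1\n2\n3"));
    }
}
```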

@stalep
Member

stalep commented Feb 25, 2026

Performance Analysis: jq4j (PR #25) vs main

Test Setup

  • Rebased PR #25 (Switch to self-contained jq4j, 48bb47d) on top of current main (3e5ab50) to include the DAG performance improvements
  • Workload: 100 uploads, 23 jq nodes, producing 4526 values
  • Profiled with async-profiler (CPU event sampling)

Benchmark Results

| Branch                  | Database   | upload_real | upload_user | upload_sys |
|-------------------------|------------|-------------|-------------|------------|
| main (3e5ab50)          | SQLite     | 0m43.4s     | 0m46.7s     | 0m8.3s     |
| jq4j-rebased (b713095)  | SQLite     | 1m44.9s     | 1m52.4s     | 0m5.7s     |
| main (3e5ab50)          | PostgreSQL | 0m48.9s     | 0m33.1s     | 0m7.1s     |
| jq4j-rebased (b713095)  | PostgreSQL | 1m46.3s     | 1m45.7s     | 0m2.2s     |

jq4j is ~2.2-2.4x slower than ProcessBuilder jq on both databases. Both produce identical results (4526 values, 100 uploads).

Note: upload_sys is lower with jq4j (no process forking), but upload_user is significantly higher (WASM interpreter CPU time).

Async-Profiler CPU Breakdown (11,940 samples, SQLite)

| Component                        | Samples | %     | Description                                                   |
|----------------------------------|---------|-------|---------------------------------------------------------------|
| NodeService.calculateJqValues    | 6,102   | 51.1% | Total inclusive time in jq calculation                        |
| Instance$Builder.build (Chicory) | 6,049   | 50.7% | WASM module instantiation — called on every Jq.builder().run() |
| func_392 (WASM interpreter)      | 4,651   | 39.0% | Self time — actual jq logic executing in WASM interpreter     |
| func_648 (WASM interpreter)      | 344     | 2.9%  | Self time                                                     |
| func_325 (WASM interpreter)      | 283     | 2.4%  | Self time                                                     |
| Work.dependsOn                   | 481     | 4.0%  | Self time (DAG dependency checking)                           |
| SQLite native                    | 476     | 4.0%  | Self time                                                     |

Root Cause

The dominant cost is that every Jq.builder().run() call creates a new Chicory WASM Instance via Instance$Builder.build() → Instance.initialize(). With ~2,300 jq invocations (23 jq nodes × 100 uploads), this means 2,300 WASM module instantiations — each allocating fresh WASM memory, importing host functions, and initializing function tables.

The call path is:

NodeService.calculateJqValues
  → Jq$Builder.run
    → Instance$Builder.build        ← 50.7% of all CPU
      → Instance.initialize
        → JqModuleMachineFuncGroup_0.func_392  ← 39.0% self time (jq WASM execution)

Potential Optimizations

  1. Cache/pool the WASM Instance — if jq4j supports reusing instances across calls with different stdin/args, this would eliminate the ~50.7% instantiation overhead. This is the biggest win.
  2. Cache the parsed WasmModule — JqModule.load() appears in the profile, suggesting the WASM binary may be re-parsed on each call.
  3. The inherent WASM interpreter overhead (~39% self time) is expected with Chicory's Java-based WASM runtime and cannot be eliminated without a different execution strategy (e.g., AOT compilation of the WASM module).
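Optimization 1 could be realized along these lines, sketched with a plain BlockingQueue and a hypothetical JqRunner interface standing in for a reusable jq4j/Chicory instance (the real reuse API depends on what jq4j exposes): instances are built once up front and borrowed/returned around each call, so Instance$Builder.build() drops out of the per-invocation path.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of an instance pool. JqRunner is a hypothetical stand-in for a
// reusable jq4j/Chicory WASM instance; only the pooling pattern is shown.
public class InstancePoolSketch {
    interface JqRunner { String run(String filter, String inputJsonl); }

    static final class Pool {
        private final BlockingQueue<JqRunner> idle;

        Pool(int size, Supplier<JqRunner> factory) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) idle.add(factory.get()); // build once, up front
        }

        String run(String filter, String inputJsonl) throws InterruptedException {
            JqRunner r = idle.take();          // borrow (blocks if all are busy)
            try {
                return r.run(filter, inputJsonl);
            } finally {
                idle.put(r);                   // return for reuse
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger built = new AtomicInteger();
        Pool pool = new Pool(2, () -> {
            built.incrementAndGet();           // counts expensive instantiations
            return (filter, in) -> "ran " + filter;
        });
        for (int i = 0; i < 100; i++) pool.run(".v", "{}");
        System.out.println("instances built: " + built.get());
    }
}
```

100 calls reuse the same 2 pooled instances, which is the effect the profile suggests would remove the ~50.7% instantiation overhead.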

@andreaTP
Author

andreaTP commented Feb 25, 2026

Thanks for all the performance considerations!

I implemented a "reactor mode" and an "instance pool" to reuse the runtime instances as much as possible.
A quick correction on "WASM interpreter overhead": we are already using jq compiled ahead of time at build time with Chicory, so there is no interpreter overhead; it is the jq execution itself that is heavy.

Update coming soon!

@andreaTP
Author

Everything is updated now; ready to look at the next perf results.
