
Switch to self-contained jq4j #25

Open

andreaTP wants to merge 3 commits into Hyperfoil:main from andreaTP:switch-to-jq4j

Conversation

@andreaTP

Switches calculateJqValues() from forking /usr/bin/jq via ProcessBuilder to using jq4j, which runs jq as a WASM module inside the JVM.

  • No external jq binary required
  • No temp files
  • Removes "Install jq" step from CI workflows

Existing NodeServiceTest.calculateJqValues* tests pass unchanged.
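The pattern being replaced can be sketched as below: fork an external process, feed it JSON on stdin, and read its stdout. This is a minimal illustration, not the actual calculateJqValues() code; `cat` stands in for `/usr/bin/jq <filter>` so the sketch runs even where jq is not installed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the ProcessBuilder approach the PR removes: fork a child
// process, write JSON to its stdin, and collect stdout. "cat" is a
// stand-in for "/usr/bin/jq <filter>" so this runs without jq installed.
public class ForkSketch {
    static String pipeThrough(String input, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(input.getBytes(StandardCharsets.UTF_8));
        } // closing stdin signals EOF to the child
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // With real jq this would be: pipeThrough(json, "/usr/bin/jq", "-c", ".a")
        System.out.println(pipeThrough("{\"a\":1}", "cat"));
    }
}
```

With jq4j the same filter runs in-process, which is what removes the external binary, the temp files, and the "Install jq" CI step.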

@willr3
Collaborator

willr3 commented Feb 25, 2026

This is cool! I didn't realize the Value.data could be combined into a single JSONL for jq processing. There is a performance test in perf_test that can create a jq-based version of a test from the production Horreum instance and upload 100 JSON documents to calculate approximately 4200 values. I rebased your branch onto the latest main and pushed to my fork for testing:
https://github.com/willr3/h5m/tree/switch-to-jq4j

We measure the time it takes to load all 100 json documents and generate the 4200+ values.

┌───────┬─────────┬──────────────┐
│   #   │   main  │switch-to-jq4j│
│threads│ 9dda369 │   ef005c6    │
├───────┼─────────┼──────────────┤
│      1│1m22.993s│     2m39.571s│
├───────┼─────────┼──────────────┤
│     10│0m24.416s│     0m35.119s│
└───────┴─────────┴──────────────┘

I'd like to investigate the cause of the difference to try to reach parity.
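The batching mentioned above (combining Value.data into a single JSONL) can be sketched roughly like this: each JSON document becomes one input line, so a single jq run processes the whole batch and emits one result line per document. The joining/splitting below is an illustration under that assumption, not the PR's exact code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration of JSONL batching: N compact JSON documents are joined
// with newlines so one jq invocation (jq treats each line as a separate
// input) replaces N separate invocations. Not the PR's exact code.
public class JsonlBatchSketch {
    static String toJsonl(List<String> docs) {
        // One compact JSON document per line.
        return docs.stream().collect(Collectors.joining("\n"));
    }

    static List<String> splitResults(String jqOutput) {
        // jq emits one result line per input line, in order.
        return List.of(jqOutput.split("\n"));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("{\"v\":1}", "{\"v\":2}", "{\"v\":3}");
        String jsonl = toJsonl(docs);
        System.out.println("lines=" + jsonl.lines().count());
        // Pretend jq ran the filter ".v" over the batch:
        System.out.println(splitResults("1\n2\n3"));
    }
}
```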

@stalep
Member

stalep commented Feb 25, 2026

Performance Analysis: jq4j (PR #25) vs main

Test Setup

  • Rebased PR #25 (Switch to self-contained jq4j, 48bb47d) on top of current main (3e5ab50) to include the DAG performance improvements
  • Workload: 100 uploads, 23 jq nodes, producing 4526 values
  • Profiled with async-profiler (CPU event sampling)

Benchmark Results

| Branch                  | Database   | upload_real | upload_user | upload_sys |
|-------------------------|------------|-------------|-------------|------------|
| main (3e5ab50)          | SQLite     | 0m43.4s     | 0m46.7s     | 0m8.3s     |
| jq4j-rebased (b713095)  | SQLite     | 1m44.9s     | 1m52.4s     | 0m5.7s     |
| main (3e5ab50)          | PostgreSQL | 0m48.9s     | 0m33.1s     | 0m7.1s     |
| jq4j-rebased (b713095)  | PostgreSQL | 1m46.3s     | 1m45.7s     | 0m2.2s     |

jq4j is ~2.2-2.4x slower than ProcessBuilder jq on both databases. Both produce identical results (4526 values, 100 uploads).

Note: upload_sys is lower with jq4j (no process forking), but upload_user is significantly higher (WASM interpreter CPU time).

Async-Profiler CPU Breakdown (11,940 samples, SQLite)

| Component                        | Samples | %     | Description                                                   |
|----------------------------------|---------|-------|---------------------------------------------------------------|
| NodeService.calculateJqValues    | 6,102   | 51.1% | Total inclusive time in jq calculation                        |
| Instance$Builder.build (Chicory) | 6,049   | 50.7% | WASM module instantiation — called on every Jq.builder().run() |
| func_392 (WASM interpreter)      | 4,651   | 39.0% | Self time — actual jq logic executing in WASM interpreter     |
| func_648 (WASM interpreter)      | 344     | 2.9%  | Self time                                                     |
| func_325 (WASM interpreter)      | 283     | 2.4%  | Self time                                                     |
| Work.dependsOn                   | 481     | 4.0%  | Self time (DAG dependency checking)                           |
| SQLite native                    | 476     | 4.0%  | Self time                                                     |

Root Cause

The dominant cost is that every Jq.builder().run() call creates a new Chicory WASM Instance via Instance$Builder.build() → Instance.initialize(). With ~2,300 jq invocations (23 jq nodes × 100 uploads), this means 2,300 WASM module instantiations — each allocating fresh WASM memory, importing host functions, and initializing function tables.

The call path is:

NodeService.calculateJqValues
  → Jq$Builder.run
    → Instance$Builder.build        ← 50.7% of all CPU
      → Instance.initialize
        → JqModuleMachineFuncGroup_0.func_392  ← 39.0% self time (jq WASM execution)

Potential Optimizations

  1. Cache/pool the WASM Instance — if jq4j supports reusing instances across calls with different stdin/args, this would eliminate the ~50.7% instantiation overhead. This is the biggest win.
  2. Cache the parsed WasmModule — JqModule.load() appears in the profile, suggesting the WASM binary may be re-parsed on each call.
  3. The inherent WASM interpreter overhead (~39% self time) is expected with Chicory's Java-based WASM runtime and cannot be eliminated without a different execution strategy (e.g., AOT compilation of the WASM module).
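Optimization 1 could be realized along these lines, sketched with a plain BlockingQueue and a hypothetical JqRunner interface standing in for a reusable jq4j/Chicory instance (the real reuse API depends on what jq4j exposes): instances are built once up front and borrowed/returned around each call, so Instance$Builder.build() drops out of the per-invocation path.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of an instance pool. JqRunner is a hypothetical stand-in for a
// reusable jq4j/Chicory WASM instance; only the pooling pattern is shown.
public class InstancePoolSketch {
    interface JqRunner { String run(String filter, String inputJsonl); }

    static final class Pool {
        private final BlockingQueue<JqRunner> idle;

        Pool(int size, Supplier<JqRunner> factory) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) idle.add(factory.get()); // build once, up front
        }

        String run(String filter, String inputJsonl) throws InterruptedException {
            JqRunner r = idle.take();          // borrow (blocks if all are busy)
            try {
                return r.run(filter, inputJsonl);
            } finally {
                idle.put(r);                   // return for reuse
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger built = new AtomicInteger();
        Pool pool = new Pool(2, () -> {
            built.incrementAndGet();           // counts expensive instantiations
            return (filter, in) -> "ran " + filter;
        });
        for (int i = 0; i < 100; i++) pool.run(".v", "{}");
        System.out.println("instances built: " + built.get());
    }
}
```

100 calls reuse the same 2 pooled instances, which is the effect the profile suggests would remove the ~50.7% instantiation overhead.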

@andreaTP
Author

andreaTP commented Feb 25, 2026

Thanks for all the performance considerations!

I implemented a "reactor mode" and an "instance pool" to reuse the runtime instances as much as possible.
A quick correction on "WASM interpreter overhead": we are already using jq compiled ahead of time at build time with Chicory, so there is no interpreter overhead; it is the jq execution itself that is heavy.

Update coming soon!

@andreaTP
Author

Everything is updated now; ready to look at the next perf results.
