Commit e1c4dff

update bench and optimize
1 parent 69a62d2 commit e1c4dff

4 files changed

Lines changed: 201 additions & 4 deletions

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
   "title": "How-to Guides",
-  "pages": ["migrate-from-json-extract", "real-world-queries", "editor-setup"]
+  "pages": ["migrate-from-json-extract", "real-world-queries", "optimizing-queries", "editor-setup"]
 }
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
---
title: Optimizing Queries
description: How to write jsonata_query expressions that stream in constant memory, and what happens when they don't
---

# Optimizing Queries

`jsonata_query` includes a query planner that decomposes expressions into streaming accumulators at compile time. When it can decompose an expression, the query runs in **constant memory** with a single table scan, matching native SQL performance. When it can't, it falls back to accumulating all rows in memory.

The difference is significant: **83ms vs 439ms** on 100K rows for the same 5-aggregate report. On larger datasets the gap widens further, since streaming uses O(1) memory while accumulation uses O(n).

## What streams

These patterns are recognized at compile time and never buffer rows:

| Pattern | Example | Memory |
|---|---|---|
| Simple aggregates | `$sum(amount)`, `$count($)`, `$max(price)` | O(1) |
| Filtered aggregates | `$sum($filter($, function($v){$v.status = "completed"}).amount)` | O(1) |
| Count distinct | `$count($distinct(region))` | O(unique) |
| Object/array constructors | `{ "a": $sum(x), "b": $max(y) }` | O(1) |
| Post-aggregate arithmetic | `$sum(x) - $count($)` | O(1) |
| Finalizer functions | `$round($average(x), 2)` | O(1) |
| Constants | `"Q1 Report"`, `42` | O(1) |
| Constant folding | `$sum(amount * 1.1)` → `$sum(amount) * 1.1` | O(1) |

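The O(unique) entry for count distinct can be pictured with a small sketch. This is an illustration only, not the extension's implementation: the class name `DistinctCounter` and its `step`/`finalize` methods are hypothetical. The point is that memory grows with the number of distinct values seen, not with the number of rows.

```typescript
// Hypothetical sketch; not the jsonata_query implementation.
type Scalar = string | number | boolean | null;

class DistinctCounter {
  // One entry per distinct value: O(unique) memory.
  private readonly seen = new Set<string>();

  // Called once per row; the row itself is never retained.
  step(value: Scalar): void {
    // Serialize so that the number 1 and the string "1" stay distinct.
    this.seen.add(JSON.stringify(value));
  }

  // Called once after the last row.
  finalize(): number {
    return this.seen.size;
  }
}
```

A million rows spread over five regions would leave five entries in the set.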
## What falls back to O(n)

These patterns require all rows in memory. They work correctly, but memory and time scale linearly with row count:

| Pattern | Why it can't stream |
|---|---|
| `$sort($, function($a,$b){...})` | Needs all rows to determine order |
| `$reduce($, function($a,$v){...}, init)` | Each step depends on all previous rows |
| `$map($, function($v){...})` | Output is one element per row, so it is O(n) by definition |
| Variable bindings + nested lambdas | Two-pass dependency, e.g. `($x := $sum(amount); $map($, function($v){$v.amount / $x}))` |

### What O(n) costs in practice

On 100K rows with a 5-aggregate report:

| Mode | Time | Memory |
|---|---|---|
| Streaming | 83ms | O(1) |
| Accumulating | 439ms | O(100K rows) |

At 1M rows, accumulation means holding every row in memory before evaluation begins. Streaming processes each row once and discards it.

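To make the two modes concrete, here is a minimal sketch of the same report computed both ways. It is not the planner's code: `StreamingStats`, `AccumulatingStats`, `Row`, and `Summary` are hypothetical names, and the step/finalize shape is simply an assumption modeled on how SQL aggregate functions are usually structured.

```typescript
// Hypothetical illustration; not the jsonata_query planner.
interface Row { amount: number }
interface Summary { sum: number; count: number; average: number | null; max: number | null }

// Streaming: a fixed-size accumulator, updated per row. O(1) memory.
class StreamingStats {
  private sum = 0;
  private count = 0;
  private max = -Infinity;

  step(row: Row): void {
    this.sum += row.amount;
    this.count += 1;
    if (row.amount > this.max) this.max = row.amount;
  }

  finalize(): Summary {
    const empty = this.count === 0;
    return {
      sum: this.sum,
      count: this.count,
      average: empty ? null : this.sum / this.count,
      max: empty ? null : this.max,
    };
  }
}

// Accumulating: buffer every row, evaluate only at the end. O(n) memory.
class AccumulatingStats {
  private readonly rows: Row[] = [];

  step(row: Row): void {
    this.rows.push(row);
  }

  finalize(): Summary {
    const amounts = this.rows.map((r) => r.amount);
    const count = amounts.length;
    const sum = amounts.reduce((acc, a) => acc + a, 0);
    return {
      sum,
      count,
      average: count === 0 ? null : sum / count,
      max: count === 0 ? null : amounts.reduce((m, a) => (a > m ? a : m), -Infinity),
    };
  }
}
```

Both classes return the same summary; only the amount of state held between `step` calls differs.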
## Mixed expressions: partial fallback

When streaming and opaque patterns coexist, **only the opaque keys pay the O(n) cost**:

```sql
jsonata_query('{
  "total": $sum(amount),       /* streams: O(1) */
  "avg": $average(amount),     /* streams: O(1) */
  "top_5": $sort($, fn)[0..4]  /* accumulates: O(n) */
}', data)
```

`total` and `avg` run in constant memory regardless. The planner doesn't give up on the entire expression because one key is expensive.

## Keeping expressions on the fast path

### Use identical predicate text for shared filters

Predicates are deduplicated by **string equality**. Identical text shares one evaluation per row; rephrased predicates evaluate separately:

```sql
-- Shared: one predicate evaluation per row
$sum($filter($, function($v){$v.status = "completed"}).amount)
$average($filter($, function($v){$v.status = "completed"}).amount)

-- NOT shared: different parameter name → two evaluations per row
$sum($filter($, function($v){$v.status = "completed"}).amount)
$average($filter($, function($row){$row.status = "completed"}).amount)
```

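Deduplication by string equality can be sketched as a map keyed by the predicate's source text. This is an assumption about the general shape, not the planner's actual data structure; `PredicateCache`, `register`, and `evaluateRow` are hypothetical names.

```typescript
// Hypothetical sketch of text-keyed predicate sharing.
type Row = Record<string, unknown>;
type Predicate = (row: Row) => boolean;

class PredicateCache {
  private readonly slots = new Map<string, number>();
  private readonly predicates: Predicate[] = [];
  evaluations = 0; // counts predicate calls, for demonstration only

  // Returns a slot index. Identical source text reuses the existing slot;
  // any textual difference (even a renamed parameter) creates a new one.
  register(sourceText: string, predicate: Predicate): number {
    const existing = this.slots.get(sourceText);
    if (existing !== undefined) return existing;
    const slot = this.predicates.length;
    this.slots.set(sourceText, slot);
    this.predicates.push(predicate);
    return slot;
  }

  // Evaluates each unique predicate exactly once for the given row.
  evaluateRow(row: Row): boolean[] {
    return this.predicates.map((p) => {
      this.evaluations += 1;
      return p(row);
    });
  }
}
```

Accumulators that registered the same text read the same slot of the result array, so the predicate cost is paid once per row no matter how many aggregates share it.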
### Push sorting and filtering into SQL

If you need the top N results, sort and limit in SQL before the expression touches rows:

```sql
-- Instead of jsonata_query('$sort($, fn)[0..4]', data) over 100K rows:
SELECT jsonata('...', data) FROM orders
ORDER BY json_extract(data, '$.amount') DESC LIMIT 5;
```

### Use json_each for simple array expansion

`jsonata_each` evaluates a full JSONata expression per row. For simple array expansion, `json_each` is ~6x faster:

```sql
-- Simple expand: prefer json_each
SELECT j.value FROM events, json_each(data, '$.items') j;

-- Filter + transform: jsonata_each earns its cost
SELECT * FROM events, jsonata_each('items[price > 100].{
  "name": product, "total": price * qty
}', data);
```

### Use json_set for simple mutations

`jsonata_set` re-parses the entire document. For simple path updates, `json_set` is 5-7x faster:

```sql
-- Simple: prefer json_set
SELECT json_set(data, '$.status', 'done') FROM events;

-- Nested creation: jsonata_set earns its cost (creates intermediate objects)
SELECT jsonata_set(data, 'meta.source.type', '"import"') FROM events;
```

### Watch for format functions

`$base64`, `$urlencode`, `$htmlescape`, and other format functions bypass the GJSON fast path and require full JSONata evaluation (~8-18 µs/row vs ~0.25 µs/row for simple paths). In mixed expressions, only the key using the format function pays this cost.

## Quick reference

| Expression | Streams? | Notes |
|---|---|---|
| `$sum(amount)` | yes | Simple path accumulator |
| `$sum(amount * 1.1)` | yes | Constant folded |
| `$sum($filter($, fn).amount)` | yes | Predicate + conditional accumulator |
| `$count($distinct(region))` | yes | O(unique) memory |
| `{ "a": $sum(x), "b": $max(y) }` | yes | Parallel accumulators, batch extraction |
| `$round($average(x), 2)` | yes | Finalizer on streaming average |
| `$sum(x) - $count($)` | yes | Post-aggregate arithmetic |
| `$sort(...)` | **no** | O(n): needs all data |
| `$reduce($, fn, init)` | **no** | O(n): cross-row state |
| `$map($, fn)` | **no** | O(n): output is one element per row |

See the [query planner](/docs/explanation/query-planner) for the full decomposition model and internal optimization details.

website/src/components/benchmark-table.tsx

Lines changed: 10 additions & 3 deletions
@@ -8,6 +8,7 @@ import {
   AccordionContent,
 } from '@/components/ui/accordion';
 import { createHighlighter, type Highlighter } from 'shiki';
+import { Sparkline } from './sparkline';
 import { tokyoNightDark } from '@/lib/tokyo-night-dark';
 import { tokyoNightLight } from '@/lib/tokyo-night-light';
 
@@ -277,7 +278,7 @@ function SuiteSection({
   const paired = suite.tests.filter((t) => t.ratio !== null && isFinite(t.ratio!));
   const avgRatio =
     paired.length > 0
-      ? paired.reduce((sum, t) => sum + t.ratio!, 0) / paired.length
+      ? Math.exp(paired.reduce((sum, t) => sum + Math.log(t.ratio!), 0) / paired.length)
       : null;
 
   return (
@@ -298,10 +299,16 @@ function SuiteSection({
         </span>
         {avgRatio !== null && (
           <span
-            className="font-mono text-[13px] font-medium"
+            className="flex items-center gap-2 font-mono text-[13px] font-medium"
             style={{ color: ratioColor(avgRatio, isDark) }}
           >
-            avg {avgRatio.toFixed(2)}x
+            <Sparkline
+              values={paired.map((t) => t.ratio!)}
+              width={paired.length * 3 + (paired.length - 1)}
+              height={12}
+              color={(v) => ratioColor(v, isDark)}
+            />
+            {avgRatio.toFixed(2)}x
           </span>
         )}
       </button>
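The `avgRatio` change in this file replaces an arithmetic mean of ratios with a geometric mean (`Math.exp` of the mean of `Math.log(ratio)`). A geometric mean is the usual choice for averaging speedup ratios, because a 2x result and a 0.5x result then average to 1x instead of 1.25x. The standalone sketch below shows the difference; the function names are illustrative and the `r > 0` guard is an added assumption, since the commit itself only filters on `isFinite`.

```typescript
// Standalone sketch; function names are illustrative, not from the commit.
function arithmeticMean(ratios: number[]): number | null {
  if (ratios.length === 0) return null;
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

function geometricMean(ratios: number[]): number | null {
  // Assumption: drop non-positive or non-finite ratios, since Math.log
  // yields -Infinity for 0 and NaN for negative input.
  const valid = ratios.filter((r) => Number.isFinite(r) && r > 0);
  if (valid.length === 0) return null;
  const logSum = valid.reduce((sum, r) => sum + Math.log(r), 0);
  return Math.exp(logSum / valid.length);
}
```

With ratios `[2, 0.5]`, the arithmetic mean reports 1.25 while the geometric mean reports approximately 1, which matches the intuition that the two results cancel out.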
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
'use client';

interface SparklineProps {
  values: number[];
  width?: number;
  height?: number;
  color?: string | ((value: number, index: number) => string);
  className?: string;
}

export function Sparkline({
  values,
  width = 36,
  height = 16,
  color = '#7aa2f7',
  className,
}: SparklineProps) {
  if (values.length === 0) return null;

  const max = Math.max(...values);
  if (max === 0) return null;

  const gap = 1;
  const barWidth = Math.max(1, (width - gap * (values.length - 1)) / values.length);
  const minBarHeight = 2;

  const getColor = typeof color === 'function' ? color : () => color;

  return (
    <svg
      width={width}
      height={height}
      viewBox={`0 0 ${width} ${height}`}
      className={className}
      aria-hidden
    >
      {values.map((v, i) => {
        const barHeight = Math.max(minBarHeight, (v / max) * height);
        const x = i * (barWidth + gap);
        const y = height - barHeight;

        return (
          <rect
            key={i}
            x={x}
            y={y}
            width={barWidth}
            height={barHeight}
            rx={barWidth > 2 ? 0.5 : 0}
            fill={getColor(v, i)}
          />
        );
      })}
    </svg>
  );
}
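The bar geometry in `Sparkline` can be checked in isolation. The sketch below extracts the same arithmetic into a pure function; `layoutBars` and `Bar` are hypothetical names that do not appear in the commit. It also shows why the caller in `benchmark-table.tsx` passes `width={paired.length * 3 + (paired.length - 1)}`: with a 1px gap, that width works out to exactly 3px per bar.

```typescript
// Hypothetical helper mirroring the arithmetic in Sparkline; not part of the commit.
interface Bar { x: number; y: number; width: number; height: number }

function layoutBars(
  values: number[],
  width: number,
  height: number,
  gap = 1,
  minBarHeight = 2,
): Bar[] {
  if (values.length === 0) return [];
  const max = Math.max(...values);
  if (max === 0) return [];

  // Split the width left over after the gaps evenly across the bars.
  const barWidth = Math.max(1, (width - gap * (values.length - 1)) / values.length);

  return values.map((v, i) => {
    // Scale to the tallest bar, but keep small values visible.
    const barHeight = Math.max(minBarHeight, (v / max) * height);
    return {
      x: i * (barWidth + gap),
      y: height - barHeight, // SVG y grows downward, so bars hang from the bottom edge
      width: barWidth,
      height: barHeight,
    };
  });
}
```

For three values at height 12, the caller's formula gives a width of 11, so each bar is 3px wide and the bars start at x = 0, 4, and 8.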