Commit e79da48

Merge remote-tracking branch 'origin/main' into fix/parser-cast-expression
# Conflicts:
#	tree-sitter-ggsql/test/corpus/basic.txt
2 parents 228aea0 + fcf8c82 commit e79da48

220 files changed

Lines changed: 35586 additions & 6666 deletions

CLAUDE.md

Lines changed: 106 additions & 79 deletions
Original file line number | Diff line number | Diff line change
@@ -10,7 +10,7 @@
1010
SELECT date, revenue, region FROM sales WHERE year = 2024
1111
VISUALISE date AS x, revenue AS y, region AS color
1212
DRAW line
13-
SCALE x SETTING type => 'date'
13+
SCALE x VIA date
1414
COORD cartesian SETTING ylim => [0, 100000]
1515
LABEL title => 'Sales by Region', x => 'Date', y => 'Revenue'
1616
THEME minimal
@@ -22,7 +22,7 @@ THEME minimal
2222
- 507-line Tree-sitter grammar (simplified, no external scanner)
2323
- Full bindings: Rust, C, Python, Node.js with tree-sitter integration
2424
- Syntax highlighting support via Tree-sitter queries
25-
- 166 total tests (comprehensive parser, builder, and integration tests)
25+
- 916 total tests (174 parser tests, comprehensive builder and integration tests)
2626
- End-to-end working pipeline: SQL → Data → Visualization
2727
- Coordinate transformations: Cartesian (xlim/ylim), Flip, Polar
2828
- VISUALISE FROM shorthand syntax with automatic SELECT injection
@@ -100,8 +100,10 @@ DRAW line MAPPING month AS x, total AS y
100100
101101
102102
┌───────────────────────────────┐
103-
│ Query Splitter │
104-
│ (Regex-based, tree-sitter) │
103+
│ SourceTree │
104+
│ (Parse once, reuse CST) │
105+
│ • extract_sql() │
106+
│ • extract_visualise() │
105107
└───────────┬───────────────────┘
106108
107109
┌───────────┴───────────┐
@@ -226,24 +228,36 @@ For detailed API documentation, see [`src/doc/API.md`](src/doc/API.md).
226228

227229
**Responsibility**: Split queries and parse visualization specifications into typed AST.
228230

229-
#### Query Splitter (`splitter.rs`)
231+
#### SourceTree (`source_tree.rs`)
230232

231-
- Uses tree-sitter to parse the full query and find VISUALISE statements
232-
- Splits query at byte offset of first VISUALISE statement
233-
- Handles VISUALISE FROM by injecting `SELECT * FROM <source>`
234-
- Robust to parse errors in SQL portion (complex SQL we don't fully parse)
235-
- Properly handles semicolons between SQL statements
233+
**Parse-once architecture** that eliminates duplicate parsing throughout the pipeline.
236234

237-
**Key Features:**
235+
**Core Design**:
238236

239-
1. **Byte offset splitting**: Uses character positions instead of parse tree node boundaries
240-
2. **SELECT injection**: Automatically adds `SELECT * FROM <source>` when VISUALISE FROM is used
237+
- Wraps tree-sitter `Tree` + source text + `Language`
238+
- Parses query once, reuses CST for all operations
239+
- Declarative tree-sitter query API instead of manual tree walking
240+
- Lazy extraction methods for SQL and VISUALISE portions
241+
242+
**High-Level Query API**:
243+
244+
- `find_node(query)` - Find first matching node via tree-sitter query
245+
- `find_nodes(query)` - Find all matching nodes
246+
- `find_text(query)` - Extract text of first match
247+
- `find_texts(query)` - Extract text of all matches
248+
249+
**Lazy Extraction Methods**:
250+
251+
- `extract_sql()` - Lazily extract SQL portion (before VISUALISE)
252+
- `extract_visualise()` - Lazily extract VISUALISE portion
253+
- Both methods use declarative tree-sitter queries
254+
- Handles VISUALISE FROM by automatically injecting `SELECT * FROM <source>`
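
The parse-once idea described above can be sketched without tree-sitter. This is a minimal illustration, not the project's `SourceTree`: it assumes a hypothetical `SplitQuery` type that finds the first `VISUALISE`/`VISUALIZE` keyword with a plain case-insensitive string search (the real implementation uses declarative tree-sitter queries on the CST), stores that offset once, and serves both extraction methods from it. The `SELECT * FROM <source>` injection is modeled only for the case where no SQL precedes `VISUALISE FROM`, which is an assumption about behavior based on the bullet above.

```rust
/// Hypothetical stand-in for `SourceTree`: locate the split point once, reuse it.
/// Assumption: a keyword search replaces the tree-sitter CST used by the real code,
/// so a keyword inside a string literal or identifier would be misdetected here.
pub struct SplitQuery<'a> {
    source: &'a str,
    /// Byte offset of the first VISUALISE/VISUALIZE keyword, if any.
    vis_start: Option<usize>,
}

impl<'a> SplitQuery<'a> {
    /// Scan the query a single time and remember where the VISUALISE part begins.
    pub fn new(source: &'a str) -> Self {
        // ASCII uppercasing keeps byte offsets identical to `source`.
        let upper = source.to_ascii_uppercase();
        let vis_start = ["VISUALISE", "VISUALIZE"]
            .iter()
            .filter_map(|kw| upper.find(*kw))
            .min();
        SplitQuery { source, vis_start }
    }

    /// Text from the VISUALISE keyword onward, if the keyword is present.
    pub fn extract_visualise(&self) -> Option<&'a str> {
        self.vis_start.map(|i| self.source[i..].trim())
    }

    /// SQL portion before VISUALISE. When no SQL precedes `VISUALISE FROM <source>`,
    /// a `SELECT * FROM <source>` statement is injected instead.
    pub fn extract_sql(&self) -> Option<String> {
        let Some(i) = self.vis_start else {
            // No VISUALISE clause: the whole input is treated as SQL.
            let sql = self.source.trim();
            return if sql.is_empty() { None } else { Some(sql.to_string()) };
        };
        // Drop a trailing statement separator before the VISUALISE keyword.
        let sql = self.source[..i].trim().trim_end_matches(';').trim_end();
        if !sql.is_empty() {
            return Some(sql.to_string());
        }
        // Skip the keyword itself, then look for `FROM <source>`.
        let mut words = self.source[i..].split_whitespace().skip(1);
        match (words.next(), words.next()) {
            (Some(kw), Some(table)) if kw.eq_ignore_ascii_case("FROM") => {
                Some(format!("SELECT * FROM {table}"))
            }
            _ => None,
        }
    }
}
```

Both extraction methods read the offset computed in `new`, so the query text is scanned once no matter how many times they are called.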
241255

242256
#### Tree-sitter Integration (`mod.rs`)
243257

244258
- Uses `tree-sitter-ggsql` grammar (507 lines, simplified approach)
245259
- Parses **full query** (SQL + VISUALISE) into concrete syntax tree (CST)
246-
- Grammar supports: PLOT/TABLE/MAP types, DRAW/SCALE/FACET/COORD/LABEL/GUIDE/THEME clauses
260+
- Grammar supports: PLOT/TABLE/MAP types, DRAW/SCALE/FACET/COORD/LABEL/THEME clauses
247261
- British and American spellings: `VISUALISE` / `VISUALIZE`
248262
- **SQL portion parsing**: Basic SQL structure (SELECT, WITH, CREATE, INSERT, subqueries)
249263
- **Recursive subquery support**: Fully recursive grammar for complex SQL
@@ -266,11 +280,14 @@ Key grammar rules:
266280

267281
```rust
268282
pub fn parse_query(query: &str) -> Result<Vec<Plot>> {
269-
// Parse full query (SQL + VISUALISE) with tree-sitter
270-
let tree = parse_full_query(query)?;
283+
// Parse once with SourceTree
284+
let source_tree = SourceTree::new(query)?;
285+
286+
// Validate query structure
287+
source_tree.validate()?;
271288

272289
// Build AST from parse tree
273-
let specs = builder::build_ast(&tree, query)?;
290+
let specs = builder::build_ast(&source_tree)?;
274291
Ok(specs)
275292
}
276293
```
@@ -288,7 +305,6 @@ pub struct Plot {
288305
pub facet: Option<Facet>, // FACET clause
289306
pub coord: Option<Coord>, // COORD clause
290307
pub labels: Option<Labels>, // LABEL clause
291-
pub guides: Vec<Guide>, // GUIDE clauses
292308
pub theme: Option<Theme>, // THEME clause
293309
}
294310

@@ -323,19 +339,14 @@ pub enum Geom {
323339

324340
pub enum AestheticValue {
325341
Column(String), // Unquoted column reference: revenue AS x
326-
Literal(LiteralValue), // Quoted literal: 'value' AS fill
327-
}
328-
329-
pub enum LiteralValue {
330-
String(String),
331-
Number(f64),
332-
Boolean(bool),
342+
Literal(ParameterValue), // Quoted literal: 'value' AS fill
333343
}
334344

335345
pub enum ParameterValue {
336346
String(String),
337347
Number(f64),
338348
Boolean(bool),
349+
Array(Vec<ParameterValue>), // Array values for properties
339350
}
340351

341352
pub struct Scale {
@@ -395,19 +406,6 @@ pub struct Labels {
395406
pub labels: HashMap<String, String>, // label type → text
396407
}
397408

398-
pub struct Guide {
399-
pub aesthetic: String,
400-
pub guide_type: Option<GuideType>,
401-
pub properties: HashMap<String, ParameterValue>,
402-
}
403-
404-
pub enum GuideType {
405-
Legend,
406-
ColorBar,
407-
Axis,
408-
None,
409-
}
410-
411409
pub struct Theme {
412410
pub style: Option<String>,
413411
pub properties: HashMap<String, ParameterValue>,
@@ -781,7 +779,7 @@ SELECT * FROM (VALUES
781779
SELECT * FROM sales
782780
VISUALISE
783781
DRAW line MAPPING date AS x, revenue AS y, region AS color
784-
SCALE x SETTING type => 'date'
782+
SCALE x VIA date
785783
LABEL title => 'Sales Trends'
786784
```
787785

@@ -1093,16 +1091,15 @@ Where `<global_mapping>` can be:
10931091

10941092
### Clause Types
10951093

1096-
| Clause | Repeatable | Purpose | Example |
1097-
| ----------- | ---------- | ------------------ | ----------------------------------------- |
1098-
| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
1099-
| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
1100-
| `SCALE` | ✅ Yes | Configure scales | `SCALE x SETTING type => 'date'` |
1101-
| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
1102-
| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
1103-
| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
1104-
| `GUIDE` | ✅ Yes | Legend/axis config | `GUIDE color SETTING position => 'right'` |
1105-
| `THEME` | ❌ No | Visual styling | `THEME minimal` |
1094+
| Clause | Repeatable | Purpose | Example |
1095+
| -------------- | ---------- | ------------------ | ------------------------------------ |
1096+
| `VISUALISE` | ✅ Yes | Entry point | `VISUALISE date AS x, revenue AS y` |
1097+
| `DRAW` | ✅ Yes | Define layers | `DRAW line MAPPING date AS x, value AS y` |
1098+
| `SCALE` | ✅ Yes | Configure scales | `SCALE x VIA date` |
1099+
| `FACET` | ❌ No | Small multiples | `FACET WRAP region` |
1100+
| `COORD` | ❌ No | Coordinate system | `COORD cartesian SETTING xlim => [0,100]` |
1101+
| `LABEL` | ❌ No | Text labels | `LABEL title => 'My Chart', x => 'Date'` |
1102+
| `THEME` | ❌ No | Visual styling | `THEME minimal` |
11061103

11071104
### DRAW Clause (Layers)
11081105

@@ -1214,49 +1211,79 @@ DRAW line
12141211
**Syntax**:
12151212

12161213
```sql
1217-
SCALE <aesthetic> SETTING
1218-
[type => <scale_type>]
1219-
[limits => [min, max]]
1220-
[breaks => <array | interval>]
1221-
[palette => <name>]
1222-
[domain => [values...]]
1214+
SCALE [TYPE] <aesthetic> [FROM <input>] [TO <output>] [VIA <transform>] [SETTING <properties>]
12231215
```
12241216

1225-
**Scale Types**:
1217+
**Type Modifiers** (optional, placed before aesthetic):
12261218

1227-
- **Continuous**: `linear`, `log10`, `log`, `log2`, `sqrt`, `reverse`
1228-
- **Discrete**: `categorical`, `ordinal`
1229-
- **Temporal**: `date`, `datetime`, `time`
1230-
- **Color Palettes**: `viridis`, `plasma`, `magma`, `inferno`, `cividis`, `diverging`, `sequential`
1219+
- **`CONTINUOUS`** - Continuous numeric data
1220+
- **`DISCRETE`** - Categorical/discrete data
1221+
- **`BINNED`** - Binned/bucketed data
1222+
- **`DATE`** - Date data (maps to Vega-Lite temporal type)
1223+
- **`DATETIME`** - Datetime data (maps to Vega-Lite temporal type)
1224+
1225+
**Subclauses**:
1226+
1227+
- **`FROM [...]`** - Input range specification (maps to Vega-Lite `scale.domain`)
1228+
- **`TO [...]`** or **`TO palette`** - Output range as array or named palette (maps to Vega-Lite `scale.range` or `scale.scheme`)
1229+
- **`VIA transform`** - Transformation method (reserved for future use)
1230+
- **`SETTING ...`** - Additional properties (e.g., `breaks`)
1231+
1232+
**Named Palettes** (used with `TO`):
1233+
1234+
- `viridis`, `plasma`, `magma`, `inferno`, `cividis`, `diverging`, `sequential`
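
How the subclauses line up with Vega-Lite can be sketched as follows. This is illustrative only: `ScaleOutput`, `ScaleSpec`, and `vega_scale_json` are hypothetical names, the JSON is assembled by hand rather than with a serializer, and only the `FROM` to `scale.domain` and `TO` to `scale.range` / `scale.scheme` mappings stated above are modeled.

```rust
/// Hypothetical output side of a SCALE clause (the `TO` subclause).
pub enum ScaleOutput {
    /// `TO ['red', 'blue']`: explicit values, already rendered as JSON literals.
    Values(Vec<String>),
    /// `TO viridis`: a named palette.
    Palette(String),
}

/// Hypothetical, simplified view of the FROM/TO parts of one SCALE clause.
pub struct ScaleSpec {
    /// `FROM [...]` values, already rendered as JSON literals.
    pub from: Option<Vec<String>>,
    pub to: Option<ScaleOutput>,
}

/// Render the Vega-Lite `scale` object for this clause as a JSON string.
pub fn vega_scale_json(spec: &ScaleSpec) -> String {
    let mut fields = Vec::new();
    if let Some(domain) = &spec.from {
        // FROM maps to `scale.domain`.
        fields.push(format!("\"domain\": [{}]", domain.join(", ")));
    }
    match &spec.to {
        // TO with an array maps to `scale.range`.
        Some(ScaleOutput::Values(v)) => fields.push(format!("\"range\": [{}]", v.join(", "))),
        // TO with a palette name maps to `scale.scheme`.
        Some(ScaleOutput::Palette(name)) => fields.push(format!("\"scheme\": \"{}\"", name)),
        None => {}
    }
    format!("{{{}}}", fields.join(", "))
}
```

Modeling `TO` as an enum keeps the two output forms mutually exclusive, which matches the "array or named palette" wording of the subclause.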
12311235

12321236
**Critical for Date Formatting**:
12331237

12341238
```sql
1235-
SCALE x SETTING type => 'date'
1239+
SCALE x VIA date
12361240
-- Maps to Vega-Lite field type = "temporal"
12371241
-- Enables proper date axis formatting
12381242
```
12391243

1240-
**Domain Property**:
1244+
**Input Range Specification** (FROM clause):
12411245

1242-
The `domain` property explicitly sets the input domain for a scale:
1246+
The `FROM` clause explicitly sets the input range for a scale:
12431247

12441248
```sql
1245-
-- Set domain for discrete scale
1246-
SCALE color SETTING domain => ['red', 'green', 'blue']
1249+
-- Set range for discrete scale
1250+
SCALE DISCRETE color FROM ['A', 'B', 'C']
12471251

1248-
-- Set domain for continuous scale
1249-
SCALE x SETTING domain => [0, 100]
1252+
-- Set range for continuous scale
1253+
SCALE CONTINUOUS x FROM [0, 100]
12501254
```
12511255

1252-
**Note**: Cannot specify domain in both SCALE and COORD for the same aesthetic (will error).
1256+
**Range Specification** (TO clause):
12531257

1254-
**Example**:
1258+
The `TO` clause sets the output range - either explicit values or a named palette:
1259+
1260+
```sql
1261+
-- Explicit color values
1262+
SCALE color FROM ['A', 'B'] TO ['red', 'blue']
1263+
1264+
-- Named palette
1265+
SCALE color TO viridis
1266+
```
1267+
1268+
**Note**: Cannot specify range in both SCALE and COORD for the same aesthetic (will error).
1269+
1270+
**Examples**:
12551271

12561272
```sql
1257-
SCALE x SETTING type => 'date', breaks => '2 months'
1258-
SCALE y SETTING type => 'log10', limits => [1, 1000]
1259-
SCALE color SETTING palette => 'viridis', domain => ['A', 'B', 'C']
1273+
-- Date scale
1274+
SCALE x VIA date
1275+
1276+
-- Continuous scale with input range
1277+
SCALE CONTINUOUS y FROM [0, 100]
1278+
1279+
-- Discrete color scale with input range and output range
1280+
SCALE DISCRETE color FROM ['A', 'B', 'C'] TO ['red', 'green', 'blue']
1281+
1282+
-- Color scale with named palette
1283+
SCALE color TO viridis
1284+
1285+
-- Scale with input range and additional settings
1286+
SCALE x VIA date FROM ['2024-01-01', '2024-12-31'] SETTING breaks => '1 month'
12601287
```
12611288

12621289
### FACET Clause
@@ -1313,22 +1340,22 @@ COORD SETTING <properties>
13131340

13141341
- `xlim => [min, max]` - Set x-axis limits
13151342
- `ylim => [min, max]` - Set y-axis limits
1316-
- `<aesthetic> => [values...]` - Set domain for any aesthetic (color, fill, size, etc.)
1343+
- `<aesthetic> => [values...]` - Set range for any aesthetic (color, fill, size, etc.)
13171344

13181345
**Flip**:
13191346

1320-
- `<aesthetic> => [values...]` - Set domain for any aesthetic
1347+
- `<aesthetic> => [values...]` - Set range for any aesthetic
13211348

13221349
**Polar**:
13231350

13241351
- `theta => <aesthetic>` - Which aesthetic maps to angle (defaults to `y`)
1325-
- `<aesthetic> => [values...]` - Set domain for any aesthetic
1352+
- `<aesthetic> => [values...]` - Set range for any aesthetic
13261353

13271354
**Important Notes**:
13281355

13291356
1. **Axis limits auto-swap**: `xlim => [100, 0]` automatically becomes `[0, 100]`
13301357
2. **ggplot2 compatibility**: `coord_flip` preserves axis label names (labels stay with aesthetic names, not visual position)
1331-
3. **Domain conflicts**: Error if same aesthetic has domain in both SCALE and COORD
1358+
3. **Range conflicts**: Error if same aesthetic has input range in both SCALE and COORD
13321359
4. **Multi-layer support**: All coordinate transforms apply to all layers
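
Notes 1 and 3 can be illustrated with a small sketch. The names `normalize_limits` and `check_range_conflicts` are hypothetical and the error type is a plain `String`; the project's actual validation code is not shown in this diff, so treat this as one possible shape rather than the implementation.

```rust
use std::collections::HashSet;

/// Swap reversed axis limits so the result is always `[min, max]`.
pub fn normalize_limits(limits: [f64; 2]) -> [f64; 2] {
    if limits[0] > limits[1] {
        [limits[1], limits[0]]
    } else {
        limits
    }
}

/// Return an error if any aesthetic has an input range in both SCALE and COORD.
/// `scale_aes` and `coord_aes` list the aesthetics each clause sets a range for.
pub fn check_range_conflicts(scale_aes: &[&str], coord_aes: &[&str]) -> Result<(), String> {
    let in_scale: HashSet<&str> = scale_aes.iter().copied().collect();
    for aes in coord_aes {
        if in_scale.contains(aes) {
            return Err(format!(
                "input range for '{aes}' is set in both SCALE and COORD"
            ));
        }
    }
    Ok(())
}
```

With these definitions, `normalize_limits([100.0, 0.0])` yields `[0.0, 100.0]`, and a query that sets a range for `color` in both clauses is rejected before any rendering happens.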
13331360

13341361
**Status**:
@@ -1344,7 +1371,7 @@ COORD SETTING <properties>
13441371
-- Cartesian with axis limits
13451372
COORD cartesian SETTING xlim => [0, 100], ylim => [0, 50]
13461373

1347-
-- Cartesian with aesthetic domain
1374+
-- Cartesian with aesthetic range
13481375
COORD cartesian SETTING color => ['red', 'green', 'blue']
13491376

13501377
-- Cartesian shorthand (type optional when using SETTING)
@@ -1353,7 +1380,7 @@ COORD SETTING xlim => [0, 100]
13531380
-- Flip coordinates for horizontal bar chart
13541381
COORD flip
13551382

1356-
-- Flip with aesthetic domain
1383+
-- Flip with aesthetic range
13571384
COORD flip SETTING color => ['A', 'B', 'C']
13581385

13591386
-- Polar for pie chart (theta defaults to y)
@@ -1427,7 +1454,7 @@ DRAW line
14271454
MAPPING sale_date AS x, total AS y, region AS color
14281455
DRAW point
14291456
MAPPING sale_date AS x, total AS y, region AS color
1430-
SCALE x SETTING type => 'date'
1457+
SCALE x VIA date
14311458
FACET WRAP region
14321459
LABEL title => 'Sales Trends by Region', x => 'Date', y => 'Total Quantity'
14331460
THEME minimal

Cargo.toml

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -57,6 +57,9 @@ pyo3 = "0.26"
5757
# Testing
5858
proptest = "1.4"
5959

60+
# Color interpolation
61+
palette = "0.7"
62+
6063
# Utilities
6164
regex = "1.10"
6265
chrono = "0.4"
