Merged
39 changes: 18 additions & 21 deletions .claude/skills/slayer-models.md
@@ -23,25 +23,21 @@ dimensions:
sql: "created_at"
type: time

default_time_dimension: created_at # Optional: used by time-dependent formulas when no time_dimensions in query
default_time_dimension: created_at # Optional: used by time-dependent formulas

measures:
- name: count
type: count # COUNT(*), no sql needed
- name: revenue_sum
sql: "amount"
type: sum
- name: revenue_avg
sql: "amount"
type: avg
- name: revenue
sql: "amount" # Row-level expression — aggregation chosen at query time
- name: quantity
sql: "quantity"
```

Measures are **row-level expressions** — no aggregation type in the definition. Aggregation is specified at query time with colon syntax: `"revenue:sum"`, `"revenue:avg"`, `"*:count"`.
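
To make the colon syntax concrete, here is a minimal sketch of how a `measure:aggregation` string could be split into its two parts. `FieldSpec` and `split_field` are hypothetical names used for illustration only; they are not part of SLayer, and the real parser (which also handles arguments and formulas) may work differently.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    measure: str      # row-level measure name, or "*" for COUNT(*)
    aggregation: str  # aggregation chosen at query time


def split_field(spec: str) -> FieldSpec:
    """Split 'measure:aggregation' into its parts (hypothetical helper, not SLayer API)."""
    measure, sep, aggregation = spec.partition(":")
    if not sep or not measure or not aggregation:
        raise ValueError(f"expected 'measure:aggregation', got {spec!r}")
    return FieldSpec(measure, aggregation)


for spec in ("revenue:sum", "revenue:avg", "*:count"):
    print(spec, "->", split_field(spec))
```

The point of the split is that the model definition only carries the left-hand side; the right-hand side comes from the query.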

## Data Types

**Dimension types**: `string`, `number`, `boolean`, `time` (timestamp), `date`

**Measure aggregation types**: `count`, `count_distinct`, `sum`, `avg`, `min`, `max`, `last` (most recent time bucket's value — for snapshot metrics like balances)

## Joins

Models can declare LEFT JOIN relationships to other models:
@@ -52,7 +48,7 @@ joins:
join_pairs: [["customer_id", "id"]]
```

Enables cross-model measures (`customers.avg_score`), multi-hop dimensions (`customers.regions.name`), and transforms on joined measures (`cumsum(customers.avg_score)`). Auto-generated from FKs during ingestion. Joins are auto-resolved transitively by walking the join graph. Diamond joins (same table via different paths) are supported — each path gets a unique `__`-delimited alias (e.g., `customers__regions` vs `warehouses__regions`).
Enables cross-model measures (`customers.score:avg`), multi-hop dimensions (`customers.regions.name`), and transforms on joined measures (`cumsum(customers.score:avg)`). Auto-generated from FKs during ingestion. Joins are auto-resolved transitively by walking the join graph. Diamond joins (same table via different paths) are supported — each path gets a unique `__`-delimited alias (e.g., `customers__regions` vs `warehouses__regions`).
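
The diamond-join aliasing can be illustrated with a small sketch. It assumes the alias is simply the dotted join path with `.` replaced by `__`, which matches the two examples in the text; `path_alias` is a hypothetical helper, not a SLayer function.

```python
def path_alias(dotted_path: str) -> str:
    """Turn a dotted join path into a `__`-delimited alias (assumed rule, for illustration)."""
    return "__".join(dotted_path.split("."))


# The same `regions` table reached through two different paths
# ends up with two distinct aliases, so the joins do not collide.
print(path_alias("customers.regions"))   # customers__regions
print(path_alias("warehouses.regions"))  # warehouses__regions
```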

## Model Filters

@@ -66,6 +62,7 @@ Models can have always-applied WHERE filters: `filters: ["deleted_at IS NULL"]`.

- Use **bare column names** (e.g., `"amount"`) in dimension/measure SQL — SLayer qualifies them automatically
- For complex expressions, use the model name as table prefix (e.g., `"orders.amount * orders.quantity"`)

## Datasource Config

```yaml
@@ -80,9 +77,9 @@ password: ${DB_PASSWORD}

`${VAR}` references are resolved from environment variables at read time.
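
As a rough illustration of read-time resolution, the sketch below substitutes `${VAR}` references from a mapping that defaults to `os.environ`. `resolve_env` is a hypothetical helper, and raising `KeyError` for an unset variable is an assumption; the source does not say how SLayer treats missing variables.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value: str, env=None) -> str:
    """Replace ${VAR} references using `env` (hypothetical helper, not SLayer API)."""
    env = os.environ if env is None else env

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR.sub(_lookup, value)


print(resolve_env("password: ${DB_PASSWORD}", {"DB_PASSWORD": "s3cret"}))
```

Passing `env` explicitly keeps the example independent of the real process environment.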

## Auto-Ingestion with Rollup Joins
## Auto-Ingestion

Connect to a DB and generate denormalized models automatically:
Connect to a DB and generate models automatically:

```python
from slayer.engine.ingestion import ingest_datasource
@@ -91,22 +88,22 @@ models = ingest_datasource(datasource=ds, schema="public")

Generates:
- Dimensions for all columns
- `count` measure; numeric non-ID cols get `_sum`, `_avg`, `_min`, `_max`, `_distinct`; non-numeric non-ID cols get `_distinct`, `_count`
- One measure per non-ID column (e.g., `{name: "amount", sql: "amount"}`) — aggregation chosen at query time
- `*:count` is always available without a measure definition
- **Dynamic joins**: detects FK relationships, creates models with explicit join metadata (LEFT JOINs built at query time)
- Joined dimensions use full-path dotted naming (`customers.name`, `customers.regions.name`)
- FK columns are excluded; ID-like columns (`*_id`, `*_key`) skip sum/avg measures
- Count-distinct measures for each referenced table's PK (`customers.count`)
- FK columns are excluded; ID-like columns (`*_id`, `*_key`) are dimensions only
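
The revised ingestion rules can be sketched as a small planning function. This is one reading of the list above, not the actual `ingest_datasource` logic: it assumes FK columns are dropped entirely, every remaining column becomes a dimension, and every remaining column that does not match `*_id` or `*_key` also becomes a measure. `is_id_like` and `plan_model` are hypothetical names.

```python
from fnmatch import fnmatchcase

ID_PATTERNS = ("*_id", "*_key")  # patterns quoted in the text


def is_id_like(column: str) -> bool:
    """True for columns such as `customer_id` or `order_key`."""
    return any(fnmatchcase(column, pattern) for pattern in ID_PATTERNS)


def plan_model(columns, fk_columns):
    """Decide which columns become dimensions and measures (illustrative only)."""
    kept = [c for c in columns if c not in fk_columns]          # FK columns excluded
    dimensions = [{"name": c, "sql": c} for c in kept]          # a dimension per kept column
    measures = [{"name": c, "sql": c} for c in kept if not is_id_like(c)]  # no aggregation type
    return {"dimensions": dimensions, "measures": measures}


plan = plan_model(
    ["order_key", "customer_id", "amount", "status"],
    fk_columns={"customer_id"},
)
print([d["name"] for d in plan["dimensions"]])
print([m["name"] for m in plan["measures"]])
```

Note that no `count` measure is generated here, since `*:count` is available without a definition.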

## MCP Incremental Editing

Via MCP, agents can edit models incrementally:
- `update_model(model_name="orders", description="Core orders table")` — update metadata without replacing the full definition
- `add_measures(model_name="orders", measures=[{"name": "total", "sql": "amount", "type": "sum"}])`
- `update_model(model_name="orders", description="Core orders table")`
- `add_measures(model_name="orders", measures=[{"name": "margin", "sql": "amount - cost"}])`
- `add_dimensions(model_name="orders", dimensions=[{"name": "region", "sql": "region", "type": "string"}])`
- `delete_measures_dimensions(model_name="orders", names=["total"])`
- `delete_measures_dimensions(model_name="orders", names=["margin"])`

## Storage Backends

- `YAMLStorage(base_dir="./data")` — models as YAML files in `data/models/`, datasources in `data/datasources/`
- `SQLiteStorage(db_path="./slayer.db")` — everything in a single SQLite file
- Both implement `StorageBackend` protocol: `save_model()`, `get_model()`, `list_models()`, `delete_model()`, same for datasources
- Use `resolve_storage("path")` factory for auto-detection (directory → YAML, .db → SQLite, URI schemes for custom backends)
127 changes: 47 additions & 80 deletions .claude/skills/slayer-query.md
@@ -7,109 +7,78 @@ description: How to construct and execute SLayer queries. Use when building quer
## SlayerQuery Structure

```python
from slayer.core.query import SlayerQuery, ColumnRef, TimeDimension, OrderItem
from slayer.core.query import SlayerQuery

query = SlayerQuery(
source_model="orders", # Source model name
fields=[{"formula": "count"}, {"formula": "revenue"}],
dimensions=[ColumnRef(name="status")],
time_dimensions=[
TimeDimension(
dimension=ColumnRef(name="created_at"),
granularity=TimeGranularity.MONTH, # Required
date_range=["2024-01-01", "2024-12-31"], # Optional
)
],
filters=[
"status == 'active'",
],
order=[OrderItem(column=ColumnRef(name="count", model="orders"), direction="desc")],
source_model="orders",
fields=["*:count", "revenue:sum"],
dimensions=["status"],
time_dimensions=[{"dimension": "created_at", "granularity": "month"}],
filters=["status = 'active'"],
order=[{"column": "*:count", "direction": "desc"}],
limit=10,
offset=0,

whole_periods_only=False, # Optional: snap date filters to time bucket boundaries
)
```

## ColumnRef
## Fields — Measures with Colon Aggregation

- `ColumnRef(name="status")` — column in the query's model
- `ColumnRef(name="status", label="Order Status")` — with optional human-readable label
- `ColumnRef.from_string("orders.status")` — parse from dotted string

## Filters

Filters are simple formula strings passed as `List[str]`:
Measures are row-level expressions; aggregation is chosen at query time with **colon syntax**:

```python
filters=[
"status == 'active'",
"amount > 100",
"status == 'completed' or status == 'pending'",
fields=[
"*:count", # COUNT(*)
"revenue:sum", # SUM(revenue)
"revenue:avg", # AVG(revenue)
"price:weighted_avg(weight=quantity)", # weighted average
{"formula": "revenue:sum / *:count", "name": "aov", "label": "Average Order Value"},
"cumsum(revenue:sum)", # running total
"change_pct(revenue:sum)", # month-over-month growth
"last(revenue:sum)", # most recent period's value
"time_shift(revenue:sum, -1, 'year')", # year-over-year
"lag(revenue:sum, 1)", # previous row (window function)
"rank(revenue:sum)", # ranking
]
```

**Operators**: `=`, `<>`, `>`, `>=`, `<`, `<=`, `IN`, `IS NULL`, `IS NOT NULL`

**Boolean logic**: `and`, `or`, `not` within a single string
Built-in aggregations: `sum`, `avg`, `min`, `max`, `count`, `count_distinct`, `first`, `last`, `weighted_avg`, `median`, `percentile`.

**Pattern matching**: `like` and `not like` operators (e.g., `"name like '%acme%'"`, `"name not like '%test%'"`). Filters on measures are automatically routed to HAVING.
`*:count` is always available — no measure definition needed. `col:count` counts non-nulls.

**Filtering on computed columns**: filters can reference field names from `fields` (e.g., `"rev_change < 0"`) or contain inline transform expressions (e.g., `"last(change(revenue)) < 0"`). These are applied as post-filters on the outer query.
Result column naming: `revenue:sum` → `orders.revenue_sum` (colon becomes underscore). `*:count` → `orders.count`.
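
The naming rule can be written down as a short sketch covering only the two cases stated above (a plain `measure:aggregation` field and `*:count`). `result_column` is a hypothetical helper; how SLayer names cross-model or parameterised fields is not covered here.

```python
def result_column(model: str, field: str) -> str:
    """Map a colon-syntax field to its result key (illustrative, two documented cases only)."""
    measure, sep, aggregation = field.partition(":")
    if not sep or not aggregation:
        raise ValueError(f"expected 'measure:aggregation', got {field!r}")
    if measure == "*":
        # `*:count` has no measure name, so only the aggregation remains.
        return f"{model}.{aggregation}"
    # The colon becomes an underscore.
    return f"{model}.{measure}_{aggregation}"


print(result_column("orders", "revenue:sum"))  # orders.revenue_sum
print(result_column("orders", "*:count"))      # orders.count
```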

## Executing
## Filters

```python
# Via engine directly
engine = SlayerQueryEngine(storage=storage)
result = engine.execute(query=query) # SlayerResponse with .data, .columns, .row_count, .sql
filters=[
"status = 'active'",
"amount > 100",
"status = 'completed' OR status = 'pending'",
]
```

# Via client (remote)
client = SlayerClient(url="http://localhost:5143")
df = client.query_df(query)
**Operators**: `=`, `<>`, `>`, `>=`, `<`, `<=`, `IN`, `IS NULL`, `IS NOT NULL`, `LIKE`, `NOT LIKE`

# Via client (local, no server)
client = SlayerClient(storage=YAMLStorage(base_dir="./models"))
data = client.query(query)
```
**Boolean logic**: `AND`, `OR`, `NOT`

## Fields
**Filtering on computed columns**: `"change(revenue:sum) > 0"`, `"last(change(revenue:sum)) < 0"`. Applied as post-filters on the outer query.

The `fields` parameter specifies what data columns to return. Each field has a `formula` (required), optional `name`, and optional `label` (human-readable display name). Formulas are parsed by `slayer/core/formula.py`.
## Executing

```python
query = SlayerQuery(
source_model="orders",
time_dimensions=[TimeDimension(dimension=ColumnRef(name="created_at"), granularity=TimeGranularity.MONTH)],
fields=[
{"formula": "count"},
{"formula": "revenue_sum"},
{"formula": "revenue_sum / count", "name": "aov", "label": "Average Order Value"},
{"formula": "cumsum(revenue_sum)"},
{"formula": "change_pct(revenue_sum)"},
{"formula": "last(revenue_sum)", "name": "latest_rev"},
{"formula": "time_shift(revenue_sum, -1, 'year')", "name": "rev_last_year"},
{"formula": "time_shift(revenue_sum, -2)", "name": "rev_2_ago"},
{"formula": "lag(revenue_sum, 1)", "name": "rev_prev_row"},
{"formula": "rank(revenue_sum)"},
],
)
engine = SlayerQueryEngine(storage=storage)
result = engine.execute(query=query) # SlayerResponse with .data, .columns, .row_count, .sql, .meta
```

Available formula functions: `cumsum`, `time_shift`, `change`, `change_pct`, `rank`, `last`, `lag`, `lead`. `time_shift` always uses a self-join CTE — no edge NULLs, handles data gaps correctly. `lag(x, n)` / `lead(x, n)` use SQL window functions directly (more efficient, but NULLs at edges).

Time dimension resolution: single `time_dimensions` entry is used automatically. With 2+, `main_time_dimension` disambiguates (or model's `default_time_dimension` if among query's time dims). With none, falls back to model default.
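
The resolution order described above can be sketched as a plain function. `resolve_time_dimension` is a hypothetical name, and raising an error when two or more time dimensions remain ambiguous is an assumption; the text does not say what happens in that case.

```python
from typing import Optional, Sequence


def resolve_time_dimension(
    query_time_dims: Sequence[str],
    main_time_dimension: Optional[str] = None,
    model_default: Optional[str] = None,
) -> Optional[str]:
    """Pick the time dimension used by time-dependent formulas (illustrative sketch)."""
    if len(query_time_dims) == 1:
        return query_time_dims[0]      # a single entry is used automatically
    if not query_time_dims:
        return model_default           # none in the query: fall back to the model default
    if main_time_dimension is not None:
        return main_time_dimension     # 2+: an explicit choice disambiguates
    if model_default in query_time_dims:
        return model_default           # 2+: model default wins if it is among them
    raise ValueError("ambiguous time dimension; set main_time_dimension")  # assumed behaviour


print(resolve_time_dimension(["created_at"]))
print(resolve_time_dimension([], model_default="created_at"))
print(resolve_time_dimension(["created_at", "shipped_at"], main_time_dimension="shipped_at"))
```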

## Cross-Model Measures

Reference measures from joined models with dotted syntax (auto-resolved via join graph):
Reference measures from joined models with dotted syntax + colon aggregation:

```python
fields=[
{"formula": "count"},
{"formula": "customers.avg_score"}, # single hop
{"formula": "cumsum(customers.avg_score)"}, # transforms work too
{"formula": "customers.regions.population_sum"}, # multi-hop
"*:count",
"customers.score:avg", # single hop
"cumsum(customers.score:avg)", # transforms work too
"customers.regions.population:sum", # multi-hop
]
```

@@ -123,8 +92,8 @@ query = SlayerQuery(
source_name="orders",
dimensions=[{"name": "tier", "sql": "CASE WHEN amount > 100 THEN 'high' ELSE 'low' END"}],
),
dimensions=[ColumnRef(name="tier")],
fields=[...],
dimensions=["tier"],
fields=["*:count"],
)
```

@@ -133,13 +102,11 @@ query = SlayerQuery(
Pass a list of queries — earlier queries are named sub-queries, last is the main:

```python
inner = SlayerQuery(name="monthly", source_model="orders", fields=[...], time_dimensions=[...])
outer = SlayerQuery(source_model="monthly", fields=[{"formula": "count"}])
inner = SlayerQuery(name="monthly", source_model="orders", fields=["*:count", "revenue:sum"], time_dimensions=[...])
outer = SlayerQuery(source_model="monthly", fields=["*:count"])
engine.execute(query=[inner, outer])
```

Or save a query as a permanent model with `create_model_from_query`.

## Result Format

Column keys use `model_name.column_name` format: `"orders.count"`, `"orders.status"`. For multi-hop joined dimensions, the full path is included: `"orders.customers.regions.name"`. Response includes `meta` dict mapping column aliases to `FieldMetadata` objects (currently has `label` field).
Column keys use `model_name.column_name` format: `"orders.count"`, `"orders.revenue_sum"`. For multi-hop joined dimensions, the full path is included: `"orders.customers.regions.name"`. Response includes `meta` dict mapping column aliases to `FieldMetadata` objects (currently has `label` field).