Skip to content

sidequery/yardstick

Repository files navigation

Yardstick

An experimental DuckDB extension implementing Julian Hyde's "Measures in SQL" paper (arXiv:2406.00251).

What is this?

Yardstick adds measure-aware SQL to DuckDB. Measures are aggregations that know how to re-aggregate themselves when the query context changes. This enables:

  • Percent of total calculations without CTEs or window functions
  • Year-over-year comparisons with simple syntax
  • Drill-down analytics that automatically adjust aggregation context

Quick Start & Demo

-- Load the extension
INSTALL yardstick FROM community;
LOAD yardstick;

-- Create the sales table
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    region VARCHAR(50),
    amount DECIMAL(10, 2)
);

-- Insert sample data
INSERT INTO sales (id, year, region, amount) VALUES
    (1, 2023, 'North', 15000.00),
    (2, 2023, 'North', 22000.00),
    (3, 2023, 'South', 18000.00),
    (4, 2023, 'South', 12000.00),
    (5, 2023, 'East', 25000.00),
    (6, 2023, 'West', 19000.00),
    (7, 2024, 'North', 28000.00),
    (8, 2024, 'North', 31000.00),
    (9, 2024, 'South', 21000.00),
    (10, 2024, 'South', 16000.00),
    (11, 2024, 'East', 33000.00),
    (12, 2024, 'East', 29000.00),
    (13, 2024, 'West', 24000.00),
    (14, 2024, 'West', 27000.00);

-- Create a view with measures
CREATE VIEW sales_v AS
SELECT
    year,
    region,
    SUM(amount) AS MEASURE revenue,
    COUNT(*) AS MEASURE order_count
FROM sales
GROUP BY year, region;

-- Query with AGGREGATE() and AT modifiers (SEMANTIC prefix required)
SEMANTIC SELECT
    year,
    region,
    AGGREGATE(revenue) AS revenue,
    AGGREGATE(revenue) AT (ALL region) AS year_total,
    AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS pct_of_year
FROM sales_v
GROUP BY year, region;

-- Variance from the global average
SEMANTIC SELECT
    region,
    AGGREGATE(revenue) AS revenue,
    AGGREGATE(revenue) AT (ALL) / 4.0 AS expected_if_equal,  -- 4 regions
    AGGREGATE(revenue) - (AGGREGATE(revenue) AT (ALL) / 4.0) AS variance
FROM sales_v
GROUP BY region;

-- Nested percentages (% of year, and that year's % of total)
SEMANTIC SELECT
    year,
    region,
    AGGREGATE(revenue) AS revenue,
    100.0 * AGGREGATE(revenue) / AGGREGATE(revenue) AT (ALL region) AS pct_of_year,
    100.0 * AGGREGATE(revenue) AT (ALL region) / AGGREGATE(revenue) AT (ALL) AS year_pct_of_total
FROM sales_v
GROUP BY year, region;

-- Compare 2024 performance to 2023 baseline for each region
SEMANTIC SELECT
    region,
    AGGREGATE(revenue) AT (SET year = 2024) AS rev_2024,
    AGGREGATE(revenue) AT (SET year = 2023) AS rev_2023,
    AGGREGATE(revenue) AT (SET year = 2024) - AGGREGATE(revenue) AT (SET year = 2023) AS growth
FROM sales_v
GROUP BY region;

-- Filter to specific segments
SEMANTIC SELECT
    year,
    AGGREGATE(revenue) AS total_revenue,
    AGGREGATE(revenue) AT (SET region = 'North') AS north_revenue,
    AGGREGATE(revenue) AT (SET region IN ('North', 'South')) AS north_south_combined
FROM sales_v
GROUP BY year;

Syntax

Defining Measures

CREATE VIEW view_name AS
SELECT
    dimension1,
    dimension2,
    AGG(expr) AS MEASURE measure_name
FROM table
GROUP BY dimension1, dimension2;

Supported aggregations: SUM, COUNT, AVG, MIN, MAX

Querying Measures

Queries using AGGREGATE() must use the SEMANTIC prefix:

SEMANTIC SELECT
    dimensions,
    AGGREGATE(measure_name) [AT modifier]
FROM view_name
GROUP BY dimensions;

AT Modifiers

Modifier Description Example
AT (ALL) Grand total across all dimensions AGGREGATE(revenue) AT (ALL)
AT (ALL dim) Total excluding specific dimension AGGREGATE(revenue) AT (ALL region)
AT (ALL expr) Total excluding ad hoc dimension AGGREGATE(revenue) AT (ALL MONTH(date))
AT (SET dim = val) Fix dimension to specific value AGGREGATE(revenue) AT (SET year = 2022)
AT (SET dim = expr) Fix dimension to expression AGGREGATE(revenue) AT (SET year = year - 1)
AT (SET expr = val) Fix ad hoc dimension to value AGGREGATE(revenue) AT (SET MONTH(date) = 6)
AT (WHERE cond) Pre-aggregation filter AGGREGATE(revenue) AT (WHERE region = 'US')
AT (VISIBLE) Use query's WHERE clause AGGREGATE(revenue) AT (VISIBLE)

Building

Prerequisites:

  • CMake 3.5+
  • C++17 compiler
  • Cargo
make        # builds Rust library and DuckDB extension
make test   # runs tests

The extension will be at build/release/extension/yardstick/yardstick.duckdb_extension

Limitations

See LIMITATIONS.md for known issues and workarounds.

Key limitations:

  • Window function measures not supported

References

License

MIT