Search before asking
Motivation
When performing complex aggregation queries, standard SQL requires users to explicitly repeat all non-aggregated columns from the SELECT list inside the GROUP BY clause. In real-world development and data analysis scenarios, this introduces several pain points:
- Tedious and Error-Prone Queries: When grouping by multiple dimensional attributes or tags (e.g.,
region, plant_id, device_model, status), users have to write these fields twice. If a user updates the SELECT list during iteration but forgets to update the GROUP BY list, it results in a syntax error.
- High Friction in Exploratory Data Analysis (EDA): Data analysts frequently add or remove dimension columns to observe metrics at different granularities. Synchronizing the
SELECT and GROUP BY clauses for every minor adjustment breaks the analytical flow and reduces efficiency.
- Alignment with Modern Analytical Databases: Mainstream modern OLAP and analytical databases (such as DuckDB, ClickHouse, etc.) already support the
GROUP BY ALL syntax. It automatically infers the grouping columns, significantly simplifying SQL writing.
With the introduction of IoTDB's Table Model, the relational semantics are highly geared towards data analysts. Introducing GROUP BY ALL will further elevate the ease of use and the overall developer experience within the Table Model.
Solution
We propose introducing the GROUP BY ALL syntax into the IoTDB SQL dialect.
Note: To maintain architectural clarity and backward compatibility, this feature will ONLY be implemented within the Table Model. The existing Tree Model requires no changes.
The implementation will primarily focus on the frontend parsing and logical planning phases of the Table Model:
- Update Antlr4 Grammar (Parser): Modify the table-model-specific grammar file (
iotdb-core/relational-grammar/src/main/antlr4/org/apache/iotdb/db/relational/grammar/sql/RelationalSql.g4) to support the ALL keyword within the GROUP BY clause rules.
- Rewrite Analyzer Logic (Analyzer / Logical Planner): Intercept the
GROUP BY ALL syntax during the semantic analysis phase. If detected, the analyzer should iterate through the current SELECT clause:
- Automatically extract all scalar expressions or column references that do not contain aggregate functions.
- Implicitly convert these extracted expressions into Grouping Keys and inject them into the subsequent Logical Plan.
- Execution Engine Transparency: This feature is purely syntactic sugar. After the AST transformation by the Analyzer, the underlying logical plan and physical operators generated will be identical to a query where all columns are manually specified. Therefore, the physical execution layer and memory control require zero modifications.
- Edge Case Handling:
- If the
SELECT clause consists entirely of aggregate functions (e.g., SELECT COUNT(s1), SUM(s2) FROM table GROUP BY ALL), the analyzer should recognize that there are no non-aggregated columns and equivalently transform it into a global aggregation (a query without a GROUP BY clause).
- Ensure seamless compatibility with other clauses, such as
ORDER BY.
Example Usage:
USE plant_database;
-- Current Syntax (Standard, but verbose)
SELECT
region,
plant_id,
device_id,
AVG(temperature) AS avg_temp,
MAX(vibration) AS max_vib
FROM sensor_data
WHERE time >= 10
GROUP BY region, plant_id, device_id;
-- Proposed Syntax (Using GROUP BY ALL)
SELECT
region,
plant_id,
device_id,
AVG(temperature) AS avg_temp,
MAX(vibration) AS max_vib
FROM sensor_data
WHERE time >= 10
GROUP BY ALL;
Alternatives
No response
Are you willing to submit a PR?
Search before asking
Motivation
When performing complex aggregation queries, standard SQL requires users to explicitly repeat all non-aggregated columns from the
SELECTlist inside theGROUP BYclause. In real-world development and data analysis scenarios, this introduces several pain points:region,plant_id,device_model,status), users have to write these fields twice. If a user updates theSELECTlist during iteration but forgets to update theGROUP BYlist, it results in a syntax error.SELECTandGROUP BYclauses for every minor adjustment breaks the analytical flow and reduces efficiency.GROUP BY ALLsyntax. It automatically infers the grouping columns, significantly simplifying SQL writing.With the introduction of IoTDB's Table Model, the relational semantics are highly geared towards data analysts. Introducing
GROUP BY ALLwill further elevate the ease of use and the overall developer experience within the Table Model.Solution
We propose introducing the
GROUP BY ALLsyntax into the IoTDB SQL dialect.Note: To maintain architectural clarity and backward compatibility, this feature will ONLY be implemented within the Table Model. The existing Tree Model requires no changes.
The implementation will primarily focus on the frontend parsing and logical planning phases of the Table Model:
iotdb-core/relational-grammar/src/main/antlr4/org/apache/iotdb/db/relational/grammar/sql/RelationalSql.g4) to support theALLkeyword within theGROUP BYclause rules.GROUP BY ALLsyntax during the semantic analysis phase. If detected, the analyzer should iterate through the currentSELECTclause:SELECTclause consists entirely of aggregate functions (e.g.,SELECT COUNT(s1), SUM(s2) FROM table GROUP BY ALL), the analyzer should recognize that there are no non-aggregated columns and equivalently transform it into a global aggregation (a query without aGROUP BYclause).ORDER BY.Example Usage:
Alternatives
No response
Are you willing to submit a PR?