-
Notifications
You must be signed in to change notification settings - Fork 3
[Refactor] Add basic formatter #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@HazelYuAhiru Because the test case assumes your output will match its input, I want to make sure our AST matches. The changes I made are:
|
|
I'm still reviewing the main formatter functions, but I was wondering that should we separate the formatter and parser code into different files with their own tests? Like having |
HazelYuAhiru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a basic query formatter that converts custom AST (Abstract Syntax Tree) nodes into JSON format compatible with the mosql library, which then generates SQL text. The implementation provides a two-step conversion pipeline: AST → JSON → SQL.
Key Changes:
- Added
QueryFormatterclass withformat()method that orchestrates the AST-to-SQL conversion - Implemented clause-specific formatters (
format_select,format_from,format_where, etc.) to handle different SQL clauses - Created comprehensive test case
test_basic_format()that validates formatting of a complex query with JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, and OFFSET clauses
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| core/query_formatter.py | New formatter implementation with AST-to-JSON conversion functions and support for all major SQL clauses including JOIN operations |
| tests/test_query_formatter.py | New test file with test_basic_format() validating the formatter against a comprehensive multi-clause SQL query, plus normalize_sql() helper for SQL comparison |
Comments suppressed due to low confidence (3)
core/query_formatter.py:8
- Import of 'ColumnNode' is not used.
Import of 'LiteralNode' is not used.
Import of 'OperatorNode' is not used.
Import of 'FunctionNode' is not used.
Import of 'LimitNode' is not used.
Import of 'OffsetNode' is not used.
Import of 'SubqueryNode' is not used.
Import of 'VarNode' is not used.
Import of 'VarSetNode' is not used.
from core.ast.node import (
QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode,
LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode,
OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode,
JoinNode
)
tests/test_query_formatter.py:7
- Import of 'SubqueryNode' is not used.
Import of 'VarNode' is not used.
Import of 'VarSetNode' is not used.
from core.ast.node import (
OrderByItemNode, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode,
LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode,
OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode
)
tests/test_query_formatter.py:9
- Import of 'get_query' is not used.
from data.queries import get_query
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from core.ast.node import ( | ||
| OrderByItemNode, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports: Several node types are imported but never used in this test file: SubqueryNode, VarNode, VarSetNode. Consider removing unused imports.
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | |
| OrderByNode, LimitNode, OffsetNode, JoinNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as the above. We will use them once we have a test case with a subquery.
| from core.ast.node import ( | ||
| QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports: SubqueryNode, VarNode, and VarSetNode are imported but never used in the formatter implementation. Consider removing unused imports.
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, | |
| OrderByNode, LimitNode, OffsetNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment as in Yihong's PR. We will support Subquery in future PRs when we introduce a complex test case with a subquery.
| def test_basic_format(): | ||
| # Construct expected AST | ||
| # Tables | ||
| emp_table = TableNode("employees", "e") | ||
| dept_table = TableNode("departments", "d") | ||
| # Columns | ||
| emp_name = ColumnNode("name", _parent_alias="e") | ||
| emp_salary = ColumnNode("salary", _parent_alias="e") | ||
| emp_age = ColumnNode("age", _parent_alias="e") | ||
| emp_dept_id = ColumnNode("department_id", _parent_alias="e") | ||
|
|
||
| dept_name = ColumnNode("name", _alias="dept_name", _parent_alias="d") | ||
| dept_id = ColumnNode("id", _parent_alias="d") | ||
|
|
||
| count_star = FunctionNode("COUNT", _alias="emp_count", _args=[ColumnNode("*")]) | ||
|
|
||
| # SELECT clause | ||
| select_clause = SelectNode([emp_name, dept_name, count_star]) | ||
| # FROM clause with JOIN | ||
| join_condition = OperatorNode(emp_dept_id, "=", dept_id) | ||
| join_node = JoinNode(emp_table, dept_table, JoinType.INNER, join_condition) | ||
| from_clause = FromNode([join_node]) | ||
| # WHERE clause | ||
| salary_condition = OperatorNode(emp_salary, ">", LiteralNode(40000)) | ||
| age_condition = OperatorNode(emp_age, "<", LiteralNode(60)) | ||
| where_condition = OperatorNode(salary_condition, "AND", age_condition) | ||
| where_clause = WhereNode([where_condition]) | ||
| # GROUP BY clause | ||
| group_by_clause = GroupByNode([dept_id, dept_name]) | ||
| # HAVING clause | ||
| having_condition = OperatorNode(count_star, ">", LiteralNode(2)) | ||
| having_clause = HavingNode([having_condition]) | ||
| # ORDER BY clause | ||
| order_by_item1 = OrderByItemNode(dept_name, SortOrder.ASC) | ||
| order_by_item2 = OrderByItemNode(count_star, SortOrder.DESC) | ||
| order_by_clause = OrderByNode([order_by_item1, order_by_item2]) | ||
| # LIMIT and OFFSET | ||
| limit_clause = LimitNode(10) | ||
| offset_clause = OffsetNode(5) | ||
| # Complete query | ||
| ast = QueryNode( | ||
| _select=select_clause, | ||
| _from=from_clause, | ||
| _where=where_clause, | ||
| _group_by=group_by_clause, | ||
| _having=having_clause, | ||
| _order_by=order_by_clause, | ||
| _limit=limit_clause, | ||
| _offset=offset_clause | ||
| ) | ||
|
|
||
| # Construct expected query text | ||
| expected_sql = """ | ||
| SELECT e.name, d.name AS dept_name, COUNT(*) AS emp_count | ||
| FROM employees AS e JOIN departments AS d ON e.department_id = d.id | ||
| WHERE e.salary > 40000 AND e.age < 60 | ||
| GROUP BY d.id, d.name | ||
| HAVING COUNT(*) > 2 | ||
| ORDER BY dept_name, emp_count DESC | ||
| LIMIT 10 OFFSET 5 | ||
| """ | ||
| expected_sql = expected_sql.strip() | ||
| print(mosql.parse(expected_sql)) | ||
| print(ast) | ||
|
|
||
| sql = formatter.format(ast) | ||
| sql = sql.strip() | ||
|
|
||
| assert normalize_sql(sql) == normalize_sql(expected_sql) No newline at end of file |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Limited test coverage: Consider adding test cases for edge cases such as:
- SELECT with no FROM clause
- WHERE with OR conditions
- Different join types (LEFT, RIGHT, FULL)
- Nested subqueries
- Functions with multiple arguments
- NULL comparisons (IS NULL, IS NOT NULL)
While the current test provides good basic coverage, these cases would ensure the formatter handles various SQL patterns correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We will add them in the following PRs.
| elif node.type == NodeType.FUNCTION: | ||
| # format: {'function_name': args} | ||
| func_name = node.name.lower() | ||
| args = [format_expression(arg) for arg in node.children] |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential edge case: The function formatting logic assumes functions have at least one argument. SQL functions with no arguments (e.g., NOW(), CURRENT_TIMESTAMP()) would result in {func_name: []}. Consider handling the zero-argument case explicitly, e.g., return {func_name: args} if args else {func_name: None} or verify mosql's expected format for zero-argument functions.
| args = [format_expression(arg) for arg in node.children] | |
| args = [format_expression(arg) for arg in node.children] | |
| if not args: | |
| return {func_name: None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We can fix it once we have a test case for it.
| sql = formatter.format(ast) | ||
| sql = sql.strip() | ||
|
|
||
| assert normalize_sql(sql) == normalize_sql(expected_sql) No newline at end of file |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Test could be more comprehensive: The test only validates the final SQL output but doesn't verify the intermediate JSON format produced by ast_to_json(). Consider adding an assertion to check the JSON structure matches mosql's expected format, which would help catch issues in the AST-to-JSON conversion step separately from the JSON-to-SQL step.
| @@ -0,0 +1,265 @@ | |||
| import mo_sql_parsing as mosql | |||
| from core.ast.node import QueryNode | |||
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate import: QueryNode is imported on line 2 and again on line 4. Remove the duplicate import on line 2.
| from core.ast.node import QueryNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, let's remove it.
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | ||
| ) | ||
| from core.ast.enums import NodeType, JoinType, SortOrder |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'NodeType' is not used.
| from core.ast.enums import NodeType, JoinType, SortOrder | |
| from core.ast.enums import JoinType, SortOrder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Let's remove it.
baiqiushi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @colinthebomb1 for the PR! It looks great overall! Only a few minor changes are needed. Please take care of them. Thanks!
| @@ -0,0 +1,265 @@ | |||
| import mo_sql_parsing as mosql | |||
| from core.ast.node import QueryNode | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, let's remove it.
| from core.ast.node import ( | ||
| QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment as in Yihong's PR. We will support Subquery in future PRs when we introduce a complex test case with a subquery.
| elif node.type == NodeType.FUNCTION: | ||
| # format: {'function_name': args} | ||
| func_name = node.name.lower() | ||
| args = [format_expression(arg) for arg in node.children] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We can fix it once we have a test case for it.
| from core.ast.node import ( | ||
| OrderByItemNode, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as the above. We will use them once we have a test case with a subquery.
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | ||
| ) | ||
| from core.ast.enums import NodeType, JoinType, SortOrder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Let's remove it.
| OrderByNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | ||
| ) | ||
| from core.ast.enums import NodeType, JoinType, SortOrder | ||
| from data.queries import get_query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Let's remove it.
| def test_basic_format(): | ||
| # Construct expected AST | ||
| # Tables | ||
| emp_table = TableNode("employees", "e") | ||
| dept_table = TableNode("departments", "d") | ||
| # Columns | ||
| emp_name = ColumnNode("name", _parent_alias="e") | ||
| emp_salary = ColumnNode("salary", _parent_alias="e") | ||
| emp_age = ColumnNode("age", _parent_alias="e") | ||
| emp_dept_id = ColumnNode("department_id", _parent_alias="e") | ||
|
|
||
| dept_name = ColumnNode("name", _alias="dept_name", _parent_alias="d") | ||
| dept_id = ColumnNode("id", _parent_alias="d") | ||
|
|
||
| count_star = FunctionNode("COUNT", _alias="emp_count", _args=[ColumnNode("*")]) | ||
|
|
||
| # SELECT clause | ||
| select_clause = SelectNode([emp_name, dept_name, count_star]) | ||
| # FROM clause with JOIN | ||
| join_condition = OperatorNode(emp_dept_id, "=", dept_id) | ||
| join_node = JoinNode(emp_table, dept_table, JoinType.INNER, join_condition) | ||
| from_clause = FromNode([join_node]) | ||
| # WHERE clause | ||
| salary_condition = OperatorNode(emp_salary, ">", LiteralNode(40000)) | ||
| age_condition = OperatorNode(emp_age, "<", LiteralNode(60)) | ||
| where_condition = OperatorNode(salary_condition, "AND", age_condition) | ||
| where_clause = WhereNode([where_condition]) | ||
| # GROUP BY clause | ||
| group_by_clause = GroupByNode([dept_id, dept_name]) | ||
| # HAVING clause | ||
| having_condition = OperatorNode(count_star, ">", LiteralNode(2)) | ||
| having_clause = HavingNode([having_condition]) | ||
| # ORDER BY clause | ||
| order_by_item1 = OrderByItemNode(dept_name, SortOrder.ASC) | ||
| order_by_item2 = OrderByItemNode(count_star, SortOrder.DESC) | ||
| order_by_clause = OrderByNode([order_by_item1, order_by_item2]) | ||
| # LIMIT and OFFSET | ||
| limit_clause = LimitNode(10) | ||
| offset_clause = OffsetNode(5) | ||
| # Complete query | ||
| ast = QueryNode( | ||
| _select=select_clause, | ||
| _from=from_clause, | ||
| _where=where_clause, | ||
| _group_by=group_by_clause, | ||
| _having=having_clause, | ||
| _order_by=order_by_clause, | ||
| _limit=limit_clause, | ||
| _offset=offset_clause | ||
| ) | ||
|
|
||
| # Construct expected query text | ||
| expected_sql = """ | ||
| SELECT e.name, d.name AS dept_name, COUNT(*) AS emp_count | ||
| FROM employees AS e JOIN departments AS d ON e.department_id = d.id | ||
| WHERE e.salary > 40000 AND e.age < 60 | ||
| GROUP BY d.id, d.name | ||
| HAVING COUNT(*) > 2 | ||
| ORDER BY dept_name, emp_count DESC | ||
| LIMIT 10 OFFSET 5 | ||
| """ | ||
| expected_sql = expected_sql.strip() | ||
| print(mosql.parse(expected_sql)) | ||
| print(ast) | ||
|
|
||
| sql = formatter.format(ast) | ||
| sql = sql.strip() | ||
|
|
||
| assert normalize_sql(sql) == normalize_sql(expected_sql) No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We will add them in the following PRs.
Overview:
This PR implements a basic formatter with a basic test for converting from our custom AST into JSON that can be understood by mosql, which turns it into plaintext. AST -> JSON -> SQL TEXT (via mosql).
Code Changes:
tests/test_query_parser.pyaddedtest_basic_format()to run a basic test of the formatter andnormalize_sql()for comparing if the expected sql is equal to our output.format()and it's clause specific formatters:format_select(),format_from()...etc incore/query_parser.py, responsible for converting AST to JSON clause by clause.