-
Notifications
You must be signed in to change notification settings - Fork 3
[Refactor] Basic Query Parser #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! let one small comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a query parser that converts SQL strings to a custom AST structure via the mosql library. It adds a basic parsing test and implements the core parse() method along with helper functions for parsing different SQL clauses.
Key Changes:
- Implements
QueryParser.parse()method that converts SQL to AST via mosql intermediate JSON - Adds helper methods for parsing SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses
- Implements expression parsing, alias resolution, and operator/join type normalization
- Adds comprehensive test case
test_basic_parse()that validates parsing of complex queries with JOINs, aggregations, and multiple clauses
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| tests/test_query_parser.py | Adds test_basic_parse() function with comprehensive test case covering JOINs, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, and OFFSET; updates imports to include new node types |
| core/query_parser.py | Implements complete query parser with parse() method and 10+ helper methods for clause parsing, expression parsing, alias resolution, and type normalization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
core/query_parser.py
Outdated
| def resolve_aliases(self, expr: Node) -> Node: | ||
| if isinstance(expr, OperatorNode): | ||
| # Recursively resolve aliases in operator operands | ||
| left = self.resolve_aliases(expr.children[0]) | ||
| right = self.resolve_aliases(expr.children[1]) | ||
| return OperatorNode(left, expr.name, right) | ||
| elif isinstance(expr, FunctionNode): | ||
| # Check if this function matches an aliased function from SELECT | ||
| if expr.alias is None: | ||
| for alias, aliased_expr in self.aliases.items(): | ||
| if isinstance(aliased_expr, FunctionNode): | ||
| if (expr.name == aliased_expr.name and | ||
| len(expr.children) == len(aliased_expr.children) and | ||
| all(expr.children[i] == aliased_expr.children[i] | ||
| for i in range(len(expr.children)))): | ||
| # This function matches an aliased one, use the alias | ||
| expr.alias = alias | ||
| break | ||
| return expr | ||
| elif isinstance(expr, ColumnNode): | ||
| # Check if this column matches an aliased column from SELECT | ||
| if expr.alias is None: | ||
| for alias, aliased_expr in self.aliases.items(): | ||
| if isinstance(aliased_expr, ColumnNode): | ||
| if (expr.name == aliased_expr.name and | ||
| expr.parent_alias == aliased_expr.parent_alias): | ||
| # This column matches an aliased one, use the alias | ||
| expr.alias = alias | ||
| break | ||
| return expr | ||
| else: | ||
| return expr |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The alias resolution in the HAVING clause iterates through all aliases to find matching FunctionNode or ColumnNode instances (lines 233-241, 246-252). For large SELECT clauses with many aliases, this could be inefficient. Consider optimizing by creating a separate lookup structure or only checking relevant aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is OK for now.
core/query_parser.py
Outdated
|
|
||
| # [2] Our new code | ||
| # Any (JSON) -> AST (QueryNode) | ||
| self.aliases = {} |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The self.aliases instance variable is initialized within the parse() method rather than in __init__. This means the aliases dictionary persists between parse calls and could lead to stale alias references affecting subsequent parses. Consider either: 1) initializing self.aliases = {} in a proper __init__ method, or 2) passing aliases as a parameter through the helper methods instead of using instance state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point which I also noticed. If we use the instance member to store aliases, this instance is not safe for multi-threading. [1] One way to solve this issue is through a helper parameter being passed through call the following function calls through the function arguments, similar to what we use as the memo in the rewriter engine. [2] Or we can make sure the QueryParser is declared as non-thread-safe, and every place we need to use the parse function, we create a new instance of the parser, i.e., parser = QueryParser(), and if we use this approach, it makes more sense to init the self.aliases = {} in the __init__ function instead of this parse function. @HazelYuAhiru , please make a decision with coordination with @colinthebomb1 's PR, if he uses the same approach (some internal state shared by different functions as an instance-level state), we can use [2]. If he already used [1], you may change to use [1]. If you decide to use [2], please find a proper annotation tag in Python to say it is non-thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'v updated the parser to use a thread-safe approach
| from core.ast.node import ( | ||
| Node, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode |
Copilot
AI
Dec 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'SubqueryNode' is not used.
Import of 'VarNode' is not used.
Import of 'VarSetNode' is not used.
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode | |
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, JoinNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed this version of implementation does not consider subquery. We will consider it in the next iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, in this case I will leave a TODO comment above as reminders.
baiqiushi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @HazelYuAhiru for the PR! It looks fantastic overall. Ony a few small changes are needed. Please take care of them. Thanks!
| from core.ast.node import ( | ||
| Node, QueryNode, SelectNode, FromNode, WhereNode, TableNode, ColumnNode, | ||
| LiteralNode, OperatorNode, FunctionNode, GroupByNode, HavingNode, | ||
| OrderByNode, OrderByItemNode, LimitNode, OffsetNode, SubqueryNode, VarNode, VarSetNode, JoinNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed this version of implementation does not consider subquery. We will consider it in the next iteration.
core/query_parser.py
Outdated
|
|
||
| # [2] Our new code | ||
| # Any (JSON) -> AST (QueryNode) | ||
| self.aliases = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point which I also noticed. If we use the instance member to store aliases, this instance is not safe for multi-threading. [1] One way to solve this issue is through a helper parameter being passed through call the following function calls through the function arguments, similar to what we use as the memo in the rewriter engine. [2] Or we can make sure the QueryParser is declared as non-thread-safe, and every place we need to use the parse function, we create a new instance of the parser, i.e., parser = QueryParser(), and if we use this approach, it makes more sense to init the self.aliases = {} in the __init__ function instead of this parse function. @HazelYuAhiru , please make a decision with coordination with @colinthebomb1 's PR, if he uses the same approach (some internal state shared by different functions as an instance-level state), we can use [2]. If he already used [1], you may change to use [1]. If you decide to use [2], please find a proper annotation tag in Python to say it is non-thread-safe.
first batch fix Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Overview:
This PR implements the query parser and adds tests for converting an input SQL query string into our custom AST structure (SQL → JSON via mosql → custom AST). The base test case passes, and all additional test cases run without errors.
Code Changes:
tests/test_query_parser.py.parse()and its helper functions.