Fix!: don't consume connected tokens when parsing quoted identifiers#5357
Merged
georgesittas merged 2 commits intomainfrom Sep 12, 2025
Merged
Fix!: don't consume connected tokens when parsing quoted identifiers#5357georgesittas merged 2 commits intomainfrom
georgesittas merged 2 commits intomainfrom
Conversation
77ad2e0 to
84238cf
Compare
georgesittas
commented
Sep 11, 2025
| >>> sql = "SELECT date_day, @PIVOT(status, ['cancelled', 'completed']) FROM rides GROUP BY 1" | ||
| >>> MacroEvaluator().transform(parse_one(sql)).sql() | ||
| 'SELECT date_day, SUM(CASE WHEN status = \\'cancelled\\' THEN 1 ELSE 0 END) AS "\\'cancelled\\'", SUM(CASE WHEN status = \\'completed\\' THEN 1 ELSE 0 END) AS "\\'completed\\'" FROM rides GROUP BY 1' | ||
| 'SELECT date_day, SUM(CASE WHEN status = \\'cancelled\\' THEN 1 ELSE 0 END) AS "cancelled", SUM(CASE WHEN status = \\'completed\\' THEN 1 ELSE 0 END) AS "completed" FROM rides GROUP BY 1' |
Contributor
Author
There was a problem hiding this comment.
Not sure what led to this decision, i.e. including the SQL-representation of the value as-is in the alias.
georgesittas
commented
Sep 11, 2025
| "test_valid_to", | ||
| TRUE AS "_exists" | ||
| FROM ""__temp_target_efgh"" | ||
| FROM "__temp_target_efgh" |
Contributor
Author
There was a problem hiding this comment.
This test was incorrect, the quotes here are off.
georgesittas
commented
Sep 11, 2025
| tmp_path, | ||
| pathlib.Path(audits_dir, "audit_1.sql"), | ||
| "AUDIT(name assert_positive_id, dialect 'duckdb'); SELECT * FROM @this_model WHERE \"CaseSensitive\"_item_id < 0;", | ||
| "AUDIT(name assert_positive_id, dialect 'duckdb'); SELECT * FROM @this_model WHERE \"CaseSensitive_item_id\" < 0;", |
Contributor
Author
There was a problem hiding this comment.
This test was also incorrect, it's like trying to do SELECT * FROM t WHERE "foo"_bar < 0.
erindru
approved these changes
Sep 11, 2025
Collaborator
erindru
left a comment
There was a problem hiding this comment.
Wow, this PR highlights quite some jankiness!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR addresses two bugs:
_parse_id_var, once we parse an identifier we check whether the next token in the stream is "connected" to the identifier, i.e., there's no whitespace between the two tokens. This is done in order to support our custom macro syntax (example). Unfortunately, the condition was too lax, resulting in consuming both tokens in cases like"foo"bar, when we needed to consume only one._schema_tmpcan be appended to a quoted identifier like"a"."b"."c", resulting in"a"."b"."c"_schema_tmp. Due to the parser being lax, we were somehow able to parse this.. we've succeeded by failing in this case :)