fix: reindent idempotence broken by accumulating blank lines between statements#857
Open
gaoflow wants to merge 1 commit into
Open
fix: reindent idempotence broken by accumulating blank lines between statements#857gaoflow wants to merge 1 commit into
gaoflow wants to merge 1 commit into
Conversation
…statements _splitting_statements treated inter-statement blank lines (added by process()) as meaningful content on the next format pass, inserting an extra newline each time. After the leading whitespace is deleted, check whether any non-whitespace token precedes the DML/DDL keyword before inserting a separator line.
There was a problem hiding this comment.
Pull request overview
Fixes a formatting idempotence bug in ReindentFilter where repeated sqlparse.format(..., reindent=True) calls could accumulate extra blank lines between semicolon-separated statements by avoiding an extra separator insertion when the “previous token” is only whitespace.
Changes:
- Adjust
_split_statementsto insert a newline only when there is actual non-whitespace content before a DML/DDL keyword. - Add regression tests asserting formatting idempotence for multi-statement input and for
UNIONwithin a single statement.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sqlparse/filters/reindent.py | Prevents inserting an extra separator newline when a statement begins with only whitespace from prior formatting. |
| tests/test_format.py | Adds regression tests ensuring reindent=True formatting is idempotent across multiple passes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
94
to
+103
| # only break if it's not the first token | ||
| if prev_: | ||
| tlist.insert_before(tidx, self.nl()) | ||
| tidx += 1 | ||
| # Check that there is actual non-whitespace content before | ||
| # this keyword. If all preceding tokens are whitespace | ||
| # (e.g. inter-statement blank lines that process() already | ||
| # handles), skip -- otherwise we double-count the separator. | ||
| _, before = tlist.token_prev(tidx, skip_ws=True) | ||
| if before is not None: | ||
| tlist.insert_before(tidx, self.nl()) | ||
| tidx += 1 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
sqlparse.format(sql, reindent=True)is not idempotent when formatting multiple statements separated by semicolons. Each re-format adds an extra blank line between statements, accumulating unboundedly until stabilizing at 2 blank lines after the second pass.Minimal reproducer:
Root cause
In
_split_statements, after deleting the leading whitespace before a DML/DDL keyword, the conditionif prev_:still evaluated true becauseprev_held the deleted whitespace token object. This caused a spurious newline insertion even when the keyword was at the very start of the statement (whereprocess()already handles inter-statement separation). On repeated formatting, the blank lines added byprocess()were then treated as meaningful content and another newline was layered on top.Fix
After deleting leading whitespace, check whether any non-whitespace token exists before the DML/DDL keyword (via
token_prev(tidx, skip_ws=True)). Only insert the separator newline when there is actual content preceding the keyword -- not when all preceding tokens are whitespace from a prior format pass.Changes
sqlparse/filters/reindent.py: +8/-2 in_split_statementstests/test_format.py: +31 lines (2 new regression tests)All 489 existing + new tests pass.