[WIP] Add preliminary Clickhouse formatter #922

mattbasta · 2025-11-14T21:01:27Z

Addresses #614

I've started putting together a Clickhouse formatter. Before I start getting into the weeds with tests and updating all of the docs/tests/playground, I wanted to check that things are directionally correct. Please let me know if there's anything you'd like to see done differently before I proceed.

nene

Looks mostly in the right direction, but I spotted some issues.

Most glaringly, the formatting of SELECT clauses has been completely neglected.

src/languages/clickhouse/clickhouse.formatter.ts

src/languages/clickhouse/clickhouse.keywords.ts

src/languages/clickhouse/clickhouse.formatter.ts

nene · 2025-11-15T09:34:37Z

I would really encourage you to add tests as early as possible. You can start with just:

import { format as originalFormat, FormatFn } from '../src/sqlFormatter.js';
import behavesLikeSqlFormatter from './behavesLikeSqlFormatter.js';

describe('ClickhouseFormatter', () => {
  const language = 'clickhouse';
  const format: FormatFn = (query, cfg = {}) => originalFormat(query, { ...cfg, language });

  behavesLikeSqlFormatter(format);
  // or maybe (I'm unsure how similar exactly Clickhouse is to PostgreSQL)
  // behavesLikePostgresqlFormatter(format);
});

That will make sure your configuration will work for the basic stuff that should be the same in all SQL dialects.

If any of these behavesLikeSqlFormatter() tests fail, I would ask you to fix the problem anyway. It's fairly unlikely that anything in the Clickhouse dialect would necessitate a change to these core tests.

nene · 2025-11-15T09:40:03Z

Also, it would be of great help if you went through the wiki and added information there about the Clickhouse dialect.

Especially these first few pages about Identifiers, Parameters, ... Comments.

Doing that will also help you get these details of the Clickhouse dialect right.

mattbasta · 2025-11-15T20:57:20Z

@nene I appreciate you taking the time to look! I'm very aware that it's not nearly ready to land and that it needs quite a bit of love; I just didn't want to invest a ton more time if it was completely off the mark. I have a commit in progress with the tests, and I'm aiming to have nearly all of the example queries from the docs handled correctly. I'll ping you when I have something that's ready for review.

mattbasta · 2025-11-19T15:52:14Z

Hey @nene, I've been making a lot of progress getting things in order. The code is in a much better place, but I wanted to pause and ask for your opinion before continuing, because I don't think there's an obvious right answer to what I'm running into. There are two intertwined components to this.

The first part is that Clickhouse lets you drop/alter multiple resources at once. For instance:

DROP TABLE foo, bar;

would be valid according to https://clickhouse.com/docs/sql-reference/statements/drop#drop-table

However, with DROP TABLE appearing as a tabular one-line clause, this would be formatted as

DROP TABLE foo,
bar;

which doesn't seem correct. To me, the more appropriate choice would be to make DROP TABLE a reserved clause, which would format it like this instead:

DROP TABLE
  foo,
  bar

This makes it consistent with similar syntax, like ORDER BY. In the simplest case, where these are just table identifiers, this doesn't feel great, especially when a simple DROP TABLE statement ends up looking like

DROP TABLE
  foo

To my knowledge, there's no way to make a clause conditionally tabular, unless I'm missing something. On one hand, I could stay consistent with the other dialects and the feature tests would pass, but the multi-resource case would be pretty unfortunate. On the other hand, I could break convention with the other formatters and use a reserved clause here, accepting a slightly less pleasant single-resource case in exchange for solid output with multiple resources.

As a note, a SELECT without FROM/WHERE/etc. already looks like the second case:

-- SELECT foo
SELECT
  foo

So this isn't completely unprecedented cosmetically.

My personal preference is the second option, but I can also appreciate that you'd want the different formatters to behave consistently.

The second piece is essentially the same issue. Consider this statement:

ALTER ROW POLICY IF EXISTS policy1 ON CLUSTER cluster_name1 ON database1.table1 RENAME TO new_name1, policy2 ON CLUSTER cluster_name2 ON database2.table2 RENAME TO new_name2;

RENAME TO behaves sort of like an infix operator here, and I think that this statement looks great when it's considered a keyword phrase, with ALTER ROW POLICY treated as a reserved clause:

ALTER ROW POLICY IF EXISTS
  policy1 ON CLUSTER cluster_name1 ON database1.table1 RENAME TO new_name1,
  policy2 ON CLUSTER cluster_name2 ON database2.table2 RENAME TO new_name2;

However, this disagrees with the existing built-in feature tests, which would format

ALTER TABLE supplier RENAME TO the_one_who_supplies

as

ALTER TABLE supplier
RENAME TO the_one_who_supplies

where RENAME TO is considered a tabular one-line clause. With my choice to make this a keyword phrase, this feature test would be formatted as a single line (essentially unchanged from the input).

Similar to the first item above, I could break convention with the existing tests and follow the rules that I believe make Clickhouse output look best and stay internally consistent (making multi-resource statements reserved clauses and treating RENAME TO as a keyword phrase). Or I could stay consistent with the other formatters at the expense of certain statements producing confusing or ugly results (using tabular one-line clauses for all statements and for RENAME TO).

One other option (which addresses the tabular one-line clause vs. reserved clause discrepancy, but not the RENAME TO discrepancy) is for me to implement support for conditionally tabular one-line clauses. Such a clause would format as a tabular one-line clause if an EOF/semicolon or another clause was encountered first, or as a reserved clause if a comma was encountered first. For example, making DROP TABLE a conditionally tabular clause would format

-- DROP TABLE foo
DROP TABLE foo -- EOF triggers tabular behavior

-- DROP TABLE foo, bar
DROP TABLE
  foo, -- comma triggers reserved clause behavior
  bar
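To make the proposed rule concrete, here is a minimal TypeScript sketch of the decision. It does not use sql-formatter's real token types or layout code: the Tok union, chooseLayout(), collectItems() and formatConditionalClause() are hypothetical names for illustration only. It scans the tokens after the clause keyword and picks the reserved-clause layout only when a comma appears before a semicolon, EOF, or the next clause.

```typescript
// Hypothetical, simplified tokens; sql-formatter's real token model is richer.
type Tok =
  | { kind: 'word'; text: string }
  | { kind: 'comma' }
  | { kind: 'semicolon' }
  | { kind: 'clause'; text: string };

type ClauseLayout = 'tabular' | 'reserved';

// Decide the layout by scanning the tokens that follow the clause keyword:
// a comma before the statement or clause ends means there are multiple items.
function chooseLayout(rest: Tok[]): ClauseLayout {
  for (const tok of rest) {
    if (tok.kind === 'comma') return 'reserved';
    if (tok.kind === 'semicolon' || tok.kind === 'clause') return 'tabular';
  }
  return 'tabular'; // reached EOF without seeing a comma
}

// Group words into comma-separated items, stopping at ";" or the next clause.
function collectItems(rest: Tok[]): string[] {
  const items: string[] = [];
  let current: string[] = [];
  for (const tok of rest) {
    if (tok.kind === 'semicolon' || tok.kind === 'clause') break;
    if (tok.kind === 'comma') {
      items.push(current.join(' '));
      current = [];
    } else {
      current.push(tok.text);
    }
  }
  items.push(current.join(' '));
  return items;
}

function formatConditionalClause(keyword: string, rest: Tok[]): string {
  const items = collectItems(rest);
  if (chooseLayout(rest) === 'tabular') {
    // Single item: keep it on the keyword's line, like a tabular one-line clause.
    return `${keyword} ${items.join(' ')}`;
  }
  // Multiple items: one indented item per line, like a reserved clause.
  const lines = items.map((item, i) => `  ${item}${i < items.length - 1 ? ',' : ''}`);
  return [keyword, ...lines].join('\n');
}

const word = (text: string): Tok => ({ kind: 'word', text });
console.log(formatConditionalClause('DROP TABLE', [word('foo')]));
console.log(formatConditionalClause('DROP TABLE', [word('foo'), { kind: 'comma' }, word('bar')]));
```

The first call prints DROP TABLE foo on one line, and the second prints the indented foo, / bar layout. The lookahead only has to reach the first comma, semicolon, or clause keyword, so it stays local to the clause.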

I haven't done a lot of research to understand how big of an undertaking this would be, and as far as I can tell, this isn't something that exists already (but if it does and I've missed it, please let me know!)

I'd love your thoughts on this.

nene · 2025-11-19T17:59:23Z

I think with both of these it's best to consider the common way one would use a statement. For example, the DROP TABLE table1, table2, ... syntax is supported by several SQL dialects, but in practice one rarely drops multiple tables at once. And even when one does, it requires being intimately familiar with the syntax supported by your SQL dialect. It's much simpler to just write multiple DROP TABLE statements in a row to achieve the same result.

Similarly with these policies. I would guess it's quite rare that one needs to alter multiple policies at once. So I wouldn't optimize the formatter for these rare cases, but rather for the most common case.

In general...

The thing is that this formatter works using heuristics and doesn't really understand SQL. It just looks for some patterns and then tries to make some best guesses. There's no shortage of cases where it does a poor job. But there's only so much it can do with this sort of limited architecture.

To address these fundamental issues, I built a new tool, prettier-plugin-sql-cst, which actually parses the SQL and is able to handle these sorts of cases. Heh... actually, now that I tested it, I discovered that it messes up the DROP TABLE foo, bar formatting, although similar statements like DROP VIEW foo, bar do work as expected. Which again highlights that it's a syntax one tends not to use.

nene

A few quick comments.

nene · 2025-11-19T18:18:03Z

src/languages/clickhouse/clickhouse.keywords.ts

+export const keywords: string[] = [
+  // Derived from https://github.com/ClickHouse/ClickHouse/blob/827a7ef9f6d727ef511fea7785a1243541509efb/tests/fuzz/dictionaries/keywords.dict#L4
+  // Clickhouse keywords can span multiple individual words (e.g., "ADD COLUMN"). See
+  // `keywordPhrases` below for all of these.


This comment seems outdated now.

Yep, still have some cleanup left to do. Planning to take a pass towards the end.

nene · 2025-11-19T18:23:44Z

test/features/strings.ts

  // Note: ''-qq and ''-bs can be combined to allow for both types of escaping
  | "''-qq" // with repeated-quote escaping
  | "''-bs" // with backslash escaping
+  | "''-qq-bs" // with repeated-quote and backslash escaping


No need to add this, in the test one can just write:

supportsStrings(format, ["''-qq", "''-bs"]);
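A separate combined type isn't needed because the two escaping rules are just alternatives inside the same quoted-literal pattern. Below is a minimal sketch, assuming a regex-based matcher; quotedLiteralRegex() and EscapeStyle are hypothetical names, not sql-formatter's actual tokenizer code.

```typescript
// Hypothetical escaping rules, named after the test suite's suffixes.
type EscapeStyle = 'qq' | 'bs';

// Escape characters that have special meaning in a RegExp source.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a regex matching a literal delimited by `quote` at the start of the input.
// Each enabled escape style simply adds one more alternative inside the quotes.
function quotedLiteralRegex(quote: string, styles: EscapeStyle[]): RegExp {
  const q = escapeRegExp(quote);
  const bs = styles.includes('bs');
  const alternatives: string[] = [];
  if (styles.includes('qq')) alternatives.push(q + q); // doubled quote: ''
  if (bs) alternatives.push('\\\\[\\s\\S]'); // backslash + any character: \'
  // Any other character; a lone backslash is excluded when backslash escaping is on.
  alternatives.push(bs ? `[^${q}\\\\]` : `[^${q}]`);
  return new RegExp(`^${q}(?:${alternatives.join('|')})*${q}`);
}

const both = quotedLiteralRegex("'", ['qq', 'bs']);
console.log(both.exec("'It''s' AS a")?.[0]); // 'It''s'
console.log(both.exec("'It\\'s' AS b")?.[0]); // 'It\'s'
```

Enabling both styles accepts both 'It''s' and 'It\'s', which is what listing both types in the test call exercises. Calling the same builder with a double quote covers the ""-qq plus ""-bs identifier case.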

nene · 2025-11-19T18:28:37Z

test/features/identifiers.ts

 type IdentType =
  | '""-qq' // with repeated-quote escaping
+  | '""-bs' // with backslash escaping
+  | '""-qq-bs' // with repeated-quote and backslash escaping


As with the string tests, we don't need to include the combination of two identifier types; that would just complicate the tests. Let's only add the ""-bs type and list both when calling supportsIdentifiers().

nene · 2025-11-20T08:32:55Z

src/languages/clickhouse/clickhouse.formatter.ts

+      token.text === 'SET' &&
+      nextToken.type === TokenType.OPEN_PAREN
+    ) {
+      return { ...token, type: TokenType.RESERVED_FUNCTION_NAME, text: token.raw };


You should not change the text and raw properties of a token.

Changing text to anything other than uppercase will break any other code that's looking for a token of that name.

Changing raw to anything other than the original text will mess up the formatting.

See src/languages/mariadb/likeMariaDb.ts for comparison.
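Here is a minimal sketch of retyping the token while leaving text and raw untouched. The Token shape and TokenType constants are trimmed-down stand-ins (assumptions, not sql-formatter's actual definitions), and retypeSetFunctions() is a hypothetical helper. Only the condition for SET followed by ( comes from the diff above.

```typescript
// Trimmed-down stand-ins for sql-formatter's token definitions (assumption:
// the real Token type carries more fields, e.g. source position).
const TokenType = {
  RESERVED_CLAUSE: 'RESERVED_CLAUSE',
  RESERVED_FUNCTION_NAME: 'RESERVED_FUNCTION_NAME',
  OPEN_PAREN: 'OPEN_PAREN',
  IDENTIFIER: 'IDENTIFIER',
} as const;
type TokenType = (typeof TokenType)[keyof typeof TokenType];

interface Token {
  type: TokenType;
  raw: string; // original source text, used when printing
  text: string; // normalized (uppercased) text, used for matching
}

// Hypothetical helper: treat SET followed by "(" as a function name.
// Only `type` changes; `text` and `raw` are carried over untouched.
function retypeSetFunctions(tokens: Token[]): Token[] {
  return tokens.map((token, i) => {
    const next = tokens[i + 1];
    if (
      token.type === TokenType.RESERVED_CLAUSE &&
      token.text === 'SET' &&
      next?.type === TokenType.OPEN_PAREN
    ) {
      return { ...token, type: TokenType.RESERVED_FUNCTION_NAME };
    }
    return token;
  });
}

const [setToken] = retypeSetFunctions([
  { type: TokenType.RESERVED_CLAUSE, raw: 'set', text: 'SET' },
  { type: TokenType.OPEN_PAREN, raw: '(', text: '(' },
]);
console.log(setToken?.type, setToken?.raw, setToken?.text); // RESERVED_FUNCTION_NAME set SET
```

Because raw keeps the user's original casing, the printer can still emit set(100) as written. Meanwhile, any matcher looking for text === 'SET' keeps working.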

nene · 2025-11-20T08:48:30Z

src/languages/clickhouse/clickhouse.formatter.ts

+    // We should format `set(100)` as-is rather than `SET (100)`
+    if (
+      token.type === TokenType.RESERVED_CLAUSE &&
+      token.text === 'SET' &&


It's better to use one of the utility functions, like isToken.SET(token).

Feel free to add more things to these utility functions, as long as they're not Clickhouse dialect specific.

nene · 2025-11-20T09:01:13Z

src/languages/clickhouse/clickhouse.formatter.ts

+ * IN operator: foo IN (1, 2, 3) - IN comes after an identifier/expression
+ * IN function: IN(foo, 1, 2, 3) - IN comes at start or after operators/keywords


I see in() listed in the section titled "Functions for Implementing the IN Operator".

This gives me the impression that the availability of in() as a function is more of an implementation detail of Clickhouse, and that one normally wouldn't use it like that.

nene · 2025-11-20T09:21:22Z

src/languages/clickhouse/clickhouse.formatter.ts

+    if (
+      token.type === TokenType.RESERVED_FUNCTION_NAME &&
+      (token.text === 'IN' || token.text === 'ANY')


I see IN and ANY listed in both clickhouse.keywords.ts and clickhouse.functions.ts. There seem to be many more: AND, OR, NOT, CAST, and DATE, for example, appear in both functions and keywords.

Having them listed in both places creates confusion, as it's not obvious which one takes priority. It happens that function names are detected before keywords by the tokenizer, so these all end up being classified as function names. But it would be better to avoid this ambiguity by listing them in only one place.

I would suggest listing each of them in the category that matches its more common use case. I would move all of these to the keywords list and remove them completely from functions. This way it also aligns better with other dialects.

PS. AND and OR won't really have any effect in either list, as these are hard-coded in the tokenizer and will always be detected as a special type of token.
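One quick way to apply this suggestion is to compute the overlap between the two lists and drop those names from functions. Here is a minimal sketch. The two arrays are hypothetical excerpts standing in for the real contents of clickhouse.keywords.ts and clickhouse.functions.ts, and findOverlap() and withoutKeywords() are illustrative helpers, not existing project code.

```typescript
// Hypothetical excerpts; the real lists are much longer.
const keywords: string[] = ['ALL', 'DISTINCT', 'IN', 'ANY', 'AND', 'OR', 'NOT', 'CAST', 'DATE'];
const functions: string[] = ['COUNT', 'SUM', 'IN', 'ANY', 'AND', 'OR', 'NOT', 'CAST', 'DATE'];

// Names present in both lists, compared case-insensitively.
function findOverlap(kwList: string[], fnList: string[]): string[] {
  const kw = new Set(kwList.map((k) => k.toUpperCase()));
  return fnList.filter((f) => kw.has(f.toUpperCase()));
}

// Keep overlapping names only in the keywords list by removing them from functions.
function withoutKeywords(fnList: string[], kwList: string[]): string[] {
  const overlap = new Set(findOverlap(kwList, fnList).map((f) => f.toUpperCase()));
  return fnList.filter((f) => !overlap.has(f.toUpperCase()));
}

console.log(findOverlap(keywords, functions).join(', ')); // IN, ANY, AND, OR, NOT, CAST, DATE
console.log(withoutKeywords(functions, keywords).join(', ')); // COUNT, SUM
```

Since AND and OR are hard-coded in the tokenizer, dropping them from functions changes nothing either way. The remaining names end up only in keywords, which removes the priority ambiguity.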

nene · 2025-11-20T09:31:33Z

src/languages/clickhouse/clickhouse.formatter.ts

+ * IN function: IN(foo, 1, 2, 3) - IN comes at start or after operators/keywords
+ *
+ * ANY operator: foo = ANY (1, 2, 3) - ANY comes after an operator like =
+ * ANY function: ANY(foo, 1, 2, 3) - ANY comes at start or after operators/keywords


I don't think the any() function is used like that. The docs say it's used as an aggregate function:

SELECT any(column) FROM tbl

codesandbox-ci · 2025-12-05T01:00:45Z

This pull request is automatically built and testable in CodeSandbox.


Add preliminary Clickhouse formatter

80f6e1e

mattbasta force-pushed the clickhouse branch from a659fd2 to 80f6e1e November 14, 2025 21:05

nene reviewed Nov 15, 2025


mattbasta added 3 commits November 17, 2025 17:38

Get formatter working with existing features

d457a98

Checkpoint tests

510997c

Breaking tests for consistency within Clickhouse formatter

be7a20d

nene reviewed Nov 19, 2025


nene reviewed Nov 20, 2025


mattbasta added 2 commits December 4, 2025 19:56

Finish getting the tests together

7c4f798

Cleanup

1c19d1a
