Skip to content

SQL parser & DDL hand-rolled-parse correctness bugs (6 sub-items) #46

@hollanf

Description

@hollanf

Six independent bugs in hand-rolled parsing across DDL and preprocessor paths. All are reproducible via plain SQL, all corrupt statement semantics silently (no error, or error on valid input). One of them (#6) can panic the process on non-ASCII input.


1. CREATE TRIGGER body split mis-detects BEGIN inside string literals / WHEN clauses

File: nodedb/src/control/server/pgwire/ddl/trigger/parse.rs:321-338 (find_begin_pos)

let pos = upper[search_from..].find("BEGIN")?;
let abs_pos = search_from + pos;
let before_ok = abs_pos == 0 || !s.as_bytes()[abs_pos - 1].is_ascii_alphanumeric() && ...
// ↑ word-boundary check but does not skip over string literals

find_begin_pos scans for BEGIN on the full string including inside single-quoted literals in the WHEN clause. First match wins, so the header/body split lands at the wrong byte.

Repro:

CREATE TRIGGER tr BEFORE INSERT ON c FOR EACH ROW
  WHEN (NEW.label = 'BEGIN' OR NEW.status = 'open')
  BEGIN RETURN; END

The split places BEGIN from the string literal as the body start → malformed header parse or silently-mangled body.


2. INSERT with ) inside a string literal breaks the VALUES pre-parse

File: nodedb/src/control/server/pgwire/ddl/collection/insert_parse.rs:128-149

let first_close = match sql[first_open..values_kw].rfind(')') { ... };
// ...
let vals_close = match after_values.rfind(')') { ... };

Both rfind(')') calls scan raw bytes and ignore quoted strings. A value containing ) pushes vals_close past the true VALUES closing paren; downstream split_values sees unbalanced parens and coerces wrong types.

Repro:

INSERT INTO t (a) VALUES ('hello)world') RETURNING id

3. split_values loses quote tracking inside [...] / (...)

File: nodedb/src/control/server/pgwire/ddl/sql_parse.rs:6-29

for i in 0..bytes.len() {
    match bytes[i] {
        b'\'' if bracket_depth == 0 => in_quote = !in_quote,
        b'[' | b'(' if !in_quote => bracket_depth += 1,
        b']' | b')' if !in_quote => bracket_depth = (bracket_depth - 1).max(0),
        ...

b'\'' only flips in_quote when bracket_depth == 0. Inside an array or nested paren, quote tracking is disabled → a ) inside a string decrements the bracket counter, and subsequent real delimiters balance wrongly. A comma outside the array but still inside the intended string run is treated as a top-level separator (or vice versa).

Repro:

INSERT INTO t (a, b) VALUES (ARRAY['x)y', 'z'], 42)

4. CREATE RLS POLICY predicate corrupted by trim_matches('(' | ')')

File: nodedb/src/control/server/pgwire/ddl/rls/parse.rs:73-77

let predicate_str = parts[using_idx + 1..pred_end]
    .join(" ")
    .trim_matches(|c: char| c == '(' || c == ')')
    .to_string();

trim_matches removes all leading and trailing parens (any of ( or )), not only one matched pair. A balanced ((x > 0) AND (y = 1)) becomes x > 0) AND (y = 1 → broken parse, often falls through to a legacy space-split path that stores a nonsensical ScanFilter. Since this governs RLS policy storage, the stored predicate is a security-sensitive malfunction.

Repro:

CREATE RLS POLICY p ON t FOR READ USING ((x > 0) AND (y = 1))

5. Object-literal { ... } rewriter in SQL preprocessor breaks on ''-escaped quotes

File: nodedb-sql/src/parser/preprocess.rs:270-294 (find_matching_brace)

for i in start..chars.len() {
    match chars[i] {
        '\'' if !in_string => in_string = true,
        '\'' if in_string => {
            if i + 1 < chars.len() && chars[i + 1] == '\'' {
                continue;            // ← only advances by 1; re-processes the escape's second quote
            }
            in_string = false;
        }
        ...

The continue in a for i in start..chars.len() only advances by one, landing on the escape's second ', which the next iteration reads as a closing quote. in_string flips to false prematurely; the closing } is then encountered while in_string is still wrong, so the function returns None and the caller skips the rewrite — passing raw { ... } text to sqlparser which errors out.

Repro:

SELECT * FROM t WHERE text_match(x, 'q', { note: 'it''s' })

6. CRITICAL: Expression tokenizer in nodedb-query panics on multi-byte UTF-8

File: nodedb-query/src/expr_parse.rs:100-127, 139-160

if i + 1 < bytes.len() {
    let two = &input[i..i + 2];         // ← panics if i..i+2 crosses a char boundary
    if matches!(two, "<=" | ">=" | ...) { ... }
}
...
// string-literal body:
s.push(bytes[i] as char);               // ← each raw byte becomes Latin-1 char

The tokenizer iterates byte-by-byte but slices &input[i..i+2] as a &str. If i points into the middle of a multi-byte UTF-8 codepoint, the slice is not a char boundary and Rust panics with byte index N is not a char boundary. This tokenizer is reachable from CREATE COLLECTION … GENERATED ALWAYS AS (<expr>) and the typeguard / check-constraint compile path. The string-literal loop additionally corrupts multi-byte content by casting each byte as a single char.

Repro:

CREATE COLLECTION c FIELDS (x STRING, y STRING GENERATED ALWAYS AS ('' || x))

Any 3-byte codepoint followed by <=/>=/!=/<>/|| nearby in the expression triggers the panic.


7. CREATE MATERIALIZED VIEW chops SELECT body on any WITH (including CTEs)

File: nodedb/src/control/server/pgwire/ddl/materialized_view/parse.rs:48-108

let with_pos = remaining.find(" WITH").or_else(|| { ... });
let query_end = with_pos.map(|p| query_start + p).unwrap_or(sql.len());
let query_sql = sql[query_start..query_end].trim().to_string();
...
let with_pos = match upper.rfind("WITH") { ... };   // in extract_refresh_mode

Both find(" WITH") and rfind("WITH") match any occurrence of the substring — including inside the SELECT body (a CTE WITH cte AS (...), a column alias with_x, a string literal 'WITH'). The parser truncates query_sql at the CTE start and parses the CTE-prefix as the refresh-options clause, so a valid CTE-based SELECT becomes an error or a corrupted materialized view.

Repros:

CREATE MATERIALIZED VIEW v ON t AS
  SELECT * FROM t WITH (recursive_opt = 1);
CREATE MATERIALIZED VIEW v ON t AS
  WITH s AS (SELECT * FROM t) SELECT * FROM s
  WITH (refresh = 'manual');

Checklist

  • 1. Skip string literals in find_begin_pos (trigger parse)
  • 2. Skip string literals in rfind(')') calls in insert_parse
  • 3. Track quote state independently of bracket depth in split_values
  • 4. Replace trim_matches('(' | ')') with a single balanced-paren strip (or don't strip at all)
  • 5. Advance cursor by 2 on '' escape in find_matching_brace
  • 6. Switch expr_parse tokenizer to &[char] or use str::char_indices / pattern = "<=" matching on &str
  • 7. Parse MATERIALIZED VIEW via sqlparser rather than substring-find

All items are independently reproducible via the SQL above.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions