49 changes: 48 additions & 1 deletion fyi/semgrep-grammars/src/semgrep-cpp/grammar.js
@@ -33,7 +33,10 @@ module.exports = grammar(base_grammar, {

// Typed metavariables

-semgrep_metavar: $ => /\$[A-Z_][A-Z_0-9]*/,
+// Token precedence so the lexer prefers `semgrep_metavar` over the
+// `identifier` rule (whose regex also accepts a leading `$`) when
+// the surrounding rule allows a metavar (e.g. inside an enum body).
+semgrep_metavar: $ => token(prec(1, /\$[A-Z_][A-Z_0-9]*/)),

semgrep_typed_metavar: $ =>
seq(
@@ -99,6 +102,50 @@ module.exports = grammar(base_grammar, {
$.semgrep_ellipsis
),

+// Allow `...` and `$X` inside class/struct bodies, e.g.
+//   class C { ... }
+//   struct S { $X }
+// The lexer prefers `semgrep_metavar` over `identifier` thanks to
+// the token precedence on `semgrep_metavar`, so a bare `$X` ends
+// up here rather than as a malformed field_declaration.
+_field_declaration_list_item: ($, previous) => choice(
+  previous,
+  $.semgrep_ellipsis,
+  $.semgrep_metavar
+),
+
+// Allow `...` and `$X` inside enum bodies, e.g.
+//   enum E { ... }
+// The upstream rule is a `seq` so we must override it wholesale
+// (copied from tree-sitter-c with semgrep alternatives added).
+enumerator_list: $ => seq(
+  '{',
+  repeat(choice(
+    seq($.enumerator, ','),
+    alias($.preproc_if_in_enumerator_list, $.preproc_if),
+    alias($.preproc_ifdef_in_enumerator_list, $.preproc_ifdef),
+    seq($.preproc_call, ','),
+    seq($.semgrep_ellipsis, ','),
+    seq($.semgrep_metavar, ','),
+  )),
+  optional(choice(
+    $.enumerator,
+    alias($.preproc_if_in_enumerator_list_no_comma, $.preproc_if),
+    alias($.preproc_ifdef_in_enumerator_list_no_comma, $.preproc_ifdef),
+    $.preproc_call,
+    $.semgrep_ellipsis,
+    $.semgrep_metavar,
+  )),
+  '}',
+),
+
+// Allow `operator $OP` as an operator name, e.g.
+//   T operator $OP(T x);
+operator_name: ($, previous) => choice(
+  previous,
+  prec(1, seq('operator', $.semgrep_metavar)),
+),
+
// So we prefer to parse a unary left fold for
// 1 + ...
// rather than the addition of an ellipsis
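
The two override styles used in this hunk (augmenting a base rule through `previous` versus replacing a `seq` wholesale) can be illustrated with a toy model in plain JavaScript. This is only a sketch: `extendGrammar`, `sym`, and the base rule shapes below are hypothetical stand-ins, not tree-sitter's DSL or the real tree-sitter-c rules. It assumes only that a rule function receives `$` and the base rule of the same name as its second argument.

```javascript
// Toy model only: these helpers mimic the *shape* of a grammar DSL so the
// override mechanics can be run with plain Node.js. Not tree-sitter itself.
const choice = (...members) => ({ type: "CHOICE", members });
const seq = (...members) => ({ type: "SEQ", members });
const sym = (name) => ({ type: "SYMBOL", name });

// Hypothetical stand-in for `grammar(base, { rules })`: every rule function is
// called with `$` (any property yields a symbol reference) and `previous`
// (the base rule of the same name, or undefined if the rule is new).
function extendGrammar(baseRules, ruleFns) {
  const $ = new Proxy({}, { get: (_target, name) => sym(name) });
  const rules = { ...baseRules };
  for (const [name, fn] of Object.entries(ruleFns)) {
    rules[name] = fn($, baseRules[name]);
  }
  return rules;
}

// Simplified base rules (assumed shapes, not copied from upstream).
const baseRules = {
  _field_declaration_list_item: choice(sym("field_declaration")),
  enumerator_list: seq("{", sym("enumerator"), "}"),
};

const extended = extendGrammar(baseRules, {
  // Augmentation: the base rule is already a list of alternatives, so the
  // new alternatives can simply sit next to `previous`.
  _field_declaration_list_item: ($, previous) =>
    choice(previous, $.semgrep_ellipsis, $.semgrep_metavar),

  // Wholesale override: the new alternatives must go *inside* the braces,
  // so `previous` is not reused and the sequence is written out again.
  enumerator_list: ($) =>
    seq("{", choice($.enumerator, $.semgrep_ellipsis, $.semgrep_metavar), "}"),
});

console.log(JSON.stringify(extended._field_declaration_list_item, null, 2));
console.log(JSON.stringify(extended.enumerator_list, null, 2));
```

In this model, writing `enumerator_list` as `choice(previous, $.semgrep_ellipsis)` would place the ellipsis next to the whole `{ ... }` sequence rather than between the braces, which is presumably why the diff copies the upstream rule instead of wrapping it.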
46 changes: 32 additions & 14 deletions fyi/versions
@@ -2,11 +2,11 @@ File: semgrep-grammars/src/tree-sitter-c/LICENSE
Git repo name: tree-sitter-c
Latest commit in repo: ecdd500806cf8154d944344f1df6418b32e0e9a7
Last change in file:
-commit ecdd500806cf8154d944344f1df6418b32e0e9a7
-Author: Brandon Wu <49291449+brandonspark@users.noreply.github.com>
-Date: Fri Jan 26 12:20:17 2024 -0800
+commit af2904ca831cbb33cdc3bb7bacc2ed9840bd0a3c
+Author: Max Brunsfeld <maxbrunsfeld@gmail.com>
+Date: Fri Sep 5 23:53:51 2014 -0700

-fix: properly suffix elifdef
+Initial commit
---
File: semgrep-grammars/src/tree-sitter-c/grammar.js
Git repo name: tree-sitter-c
@@ -22,11 +22,11 @@ File: semgrep-grammars/src/tree-sitter-cpp/LICENSE
Git repo name: tree-sitter-cpp
Latest commit in repo: 4ca37be8e70e5a40ae95688bec56b886ba945888
Last change in file:
-commit 4ca37be8e70e5a40ae95688bec56b886ba945888
-Author: Brandon Wu <49291449+brandonspark@users.noreply.github.com>
-Date: Fri Jan 26 08:33:26 2024 -0800
+commit d8966822015417d060ffaafd95a71a49dd28ec99
+Author: Max Brunsfeld <maxbrunsfeld@gmail.com>
+Date: Fri Jan 15 11:04:19 2016 -0800

-disambiguate fold vs parenthesized assignment (#239)
+Initial commit
---
File: semgrep-grammars/src/tree-sitter-cpp/grammar.js
Git repo name: tree-sitter-cpp
@@ -39,12 +39,30 @@ Last change in file:
disambiguate fold vs parenthesized assignment (#239)
---
File: semgrep-grammars/src/semgrep-cpp/grammar.js
-Git repo name: ocaml-tree-sitter-semgrep
-Latest commit in repo: 091f5438fc0c15b80217f00e5b94ec0e55517383
+Git repo name: agent-aab45847a71009554
+Latest commit in repo: 9244700fc8ed0c101f640a33c6f4c115ad73d3b5
Last change in file:
-commit 5f3836d894376e97f7582fd32c2bd01b98697886
-Author: brandonspark <wu.brandonj@gmail.com>
-Date: Mon Jan 29 15:03:38 2024 -0800
+commit 9244700fc8ed0c101f640a33c6f4c115ad73d3b5
+Author: brandonspark <brandon@semgrep.com>
+Date: Wed Apr 29 17:34:00 2026 -0700

-remove redundant choice
+fix(cpp): allow Semgrep patterns in class/struct/enum bodies and operator overloads
+
+Augments the C++ grammar so canonical Semgrep patterns parse cleanly:
+
+- LANG-486: `...` and `$X` inside class/struct bodies (via
+  `_field_declaration_list_item`) and enum bodies (via a wholesale
+  override of `enumerator_list`, since the upstream rule is a `seq`).
+- LANG-497: `operator $OP` as an operator name (via a `previous`-style
+  augmentation of `operator_name`).
+
+The `semgrep_metavar` token gets a precedence bump so the lexer prefers
+it over `identifier` (whose regex also accepts a leading `$`) when both
+are valid at a position; this is what lets a bare `$X` parse as a
+metavariable inside enum/struct bodies instead of as a malformed
+declaration or enumerator.
+
+Adds 6 corpus tests covering the minimal ticket repros.
+
+Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
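
The precedence bump described in the commit message can be pictured with a toy tie-break in plain JavaScript. This is a simplified sketch, not tree-sitter's lexer: it ignores context-aware lexing, the `identifier` regex is an assumed approximation of the real one, and `nextToken` is a hypothetical helper. It assumes that explicit lexical precedence is compared before match length, and that the earlier rule wins any remaining tie.

```javascript
// Toy model: among token rules matching at `pos`, prefer the higher lexical
// precedence, then the longer match; on a full tie the earlier rule stays.
const tokenRules = [
  // Assumed approximation of an identifier regex that accepts a leading `$`.
  { name: "identifier", prec: 0, regex: /[A-Za-z_$][A-Za-z_0-9$]*/y },
  // Mirrors `token(prec(1, /\$[A-Z_][A-Z_0-9]*/))` from the diff.
  { name: "semgrep_metavar", prec: 1, regex: /\$[A-Z_][A-Z_0-9]*/y },
];

function nextToken(input, pos, rules) {
  let best = null;
  for (const rule of rules) {
    rule.regex.lastIndex = pos; // sticky flag: match must start exactly at pos
    const match = rule.regex.exec(input);
    if (match === null) continue;
    const candidate = { name: rule.name, text: match[0], prec: rule.prec };
    const wins =
      best === null ||
      candidate.prec > best.prec ||
      (candidate.prec === best.prec && candidate.text.length > best.text.length);
    if (wins) best = candidate;
  }
  return best;
}

// Same rules with the precedence bump removed, for comparison.
const withoutBump = tokenRules.map((rule) => ({ ...rule, prec: 0 }));

console.log(nextToken("$X", 0, tokenRules).name);  // semgrep_metavar
console.log(nextToken("$X", 0, withoutBump).name); // identifier
console.log(nextToken("foo", 0, tokenRules).name); // identifier
```

Both rules match `$X` with the same length, so without an explicit precedence the outcome in this model depends only on rule order; the bump makes the choice independent of that order.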