Description
Did you check existing issues?
- I have read all the tree-sitter docs if it relates to using the parser
- I have searched the existing issues of tree-sitter-cpp
Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)
No response
Describe the bug
In C++ there are a few alternatives of the postfix-expression rule which look like function calls, but are actually built-in expressions. These are casts like dynamic_cast as well as the typeid operator.
Currently, with tree-sitter-cpp version 0.23.4, these all parse as call_expressions, with the cast operations in particular parsed as template function invocations. The parenthesized expressions should be parsed by C++'s expression rule (with typeid additionally allowing the type-id rule), but because of this bug are misparsed as an optional expression-list. This most significantly affects typeid used with type-id, since it's likely those won't coincidentally look like function arguments (see below for an example of this).
This is a serious issue because it will cause users to misinterpret these expressions as ordinary function calls, which can affect highlighting and indentation. (In current emacs for example, typeid gets mis-highlighted as a function call, because tree-sitter says it's one.) While at first glance parsing them as function calls may seem like ultimately no big deal (any editor would likely need hardcoded checks for these special expressions regardless), the misparse does still cause problems for anything that needs to understand these expressions:
- typeid will very quickly fail to parse if it contains a type-id that doesn't coincidentally look like a plausible function argument. Thus things like typeid(const int) or typeid(double) will fail.
- A subtle issue lies with comma expressions like in reinterpret_cast<double>(a, b). That a, b should be parsed as a single expression, but since the whole thing gets misinterpreted as a function call, it's instead parsed by tree-sitter as an expression-list, which ultimately means it gets parsed as two assignment-expressions separated by a comma.
- Finally, this bug can sometimes accept expressions that it shouldn't. The expression-list rule for function calls allows braced initializers and parameter packs, which means things like typeid({42}) or typeid(*c++...) are allowed when they shouldn't be.
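To make the first two points concrete, here is a minimal sketch of valid C++ that exercises them. The helper names (comma_cast, typeid_accepts_type_ids, check_examples) are hypothetical and exist only for illustration; a conforming compiler should accept all of it, whereas the current grammar reports an ERROR node for the typeid operand and splits the cast operand into two arguments.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical helper: the operand of a cast is a full `expression`, so the
// comma below is the comma operator, not an argument separator.
inline long comma_cast(int a, int b) {
    return static_cast<long>(++a, b);  // evaluates ++a, discards it, converts b
}

// Hypothetical helper: typeid accepts a type-id that could never be a
// function argument, such as a multi-keyword or cv-qualified type.
inline bool typeid_accepts_type_ids() {
    const std::type_info& wide = typeid(unsigned long long);
    // typeid ignores top-level cv-qualifiers, so const int and int compare equal.
    return wide == typeid(unsigned long long int) &&
           typeid(const int) == typeid(int);
}

// Runs both illustrations; aborts via assert if either assumption is wrong.
inline void check_examples() {
    assert(comma_cast(1, 7) == 7);
    assert(typeid_accepts_type_ids());
}
```

Calling check_examples() from main should pass silently, which is the point: both forms are ordinary, well-formed C++ and not edge cases a parser can ignore.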
Steps To Reproduce/Bad Parse Tree
typeid(unsigned long long) parses as:
(call_expression function: (identifier)
arguments:
(argument_list ( (identifier)
(ERROR (identifier) long)
)))
dynamic_cast<derived *>(foo, bar) parses as:
(call_expression
function:
(template_function name: (identifier)
arguments:
(template_argument_list <
(type_descriptor type: (type_identifier)
declarator: (abstract_pointer_declarator *))
>))
arguments: (argument_list ( (identifier) , (identifier) )))
static_cast<int>(y...) parses as:
(call_expression
function:
(template_function name: (identifier)
arguments:
(template_argument_list <
(type_descriptor type: (primitive_type))
>))
arguments:
(argument_list (
(parameter_pack_expansion pattern: (identifier) ...)
)))
(Trees are as represented by emacs's treesit explorer.)
Expected Behavior/Parse Tree
For these typeid operations and C++-style casts to be parsed as something other than call_expressions, so that they can't get misinterpreted as ordinary function calls. In addition to the issue with parsing the above examples as call_expressions, each example highlights an additional issue:
- typeid(unsigned long long) should be valid, instead of causing an error.
- dynamic_cast<derived *>(foo, bar) should parse foo, bar as a single expression, not as a list of two arguments.
- static_cast<int>(y...) should be rejected, since y... is not a valid expression.
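For contrast, the sketch below shows where braced initializers and pack expansions are legitimately allowed: in the expression-list of a real function call. All function names here are hypothetical. The commented-out lines are the cast and typeid counterparts that a compiler rejects, and that the grammar would ideally reject too.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical function taking a braced initializer as its argument.
inline std::size_t count_braced(std::initializer_list<int> xs) {
    return xs.size();
}

// Hypothetical variadic sink: returns how many arguments it received.
template <typename... Ts>
std::size_t count_args(Ts...) {
    return sizeof...(Ts);
}

template <typename... Ts>
std::size_t forward_count(Ts... ys) {
    // A pack expansion is fine in a call's expression-list.
    return count_args(ys...);
    // Ill-formed, because a cast operand must be a single expression:
    //   return static_cast<int>(ys...);
}

inline void check_call_only_forms() {
    assert(count_braced({42}) == 1);      // braced initializer as an argument
    assert(forward_count(1, 2, 3) == 3);  // pack expanded into three arguments
    // Ill-formed, because typeid takes an expression or a type-id:
    //   typeid({42});
}
```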