
feat(rust): Add language aware parsing for Rust #2672

Merged
stanislaw merged 3 commits into strictdoc-project:main from haxtibal:tdmg/rust on Feb 8, 2026

@haxtibal
Contributor

@haxtibal haxtibal commented Feb 6, 2026

WHAT

This adds Rust support to StrictDoc's language-aware parsing feature. Notably, it supports Rust doc comment semantics to automatically bind comments to language constructs. It supports line/range markers from normal comments, custom tags from doc comments, and forward relations by qualified path for an opinionated set of identifiable items.

A demo of all features is available at haxtibal.github.io/strictdoc-rust-demo.

Additional related changes are described in commit messages.

WHY

Rust is a frequently used language in domains where StrictDoc is also used (systems programming, safety-critical, security-critical). See related feature request #2213.

HOW

We have two alternative proposals for parsing Rust:

  1. Use the tree-sitter-rust Python module and reimplement some of the high-level Rust doc comment semantics ourselves.
  2. Use the syn crate (which has doc comment semantics built in) and use JSON to transfer the results to the Python world.

This implements option 1 and can be used to study the pros and cons. It uses tree-sitter queries to match valid doc comment and normal comment patterns in the parse tree. Captures record the relevant information, and Python-side post-processing then transforms the captures into StrictDoc structures.
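Option 1 means the doc comment binding rules have to live on the Python side. To illustrate those rules only, here is a simplified, line-based sketch; it does not use tree-sitter and is not the code in this PR. It assumes a single file, treats //! inner doc comments as belonging to the file module (ignoring inline mod blocks), attaches /// outer doc comments to the next item, and skips attributes, blank lines and normal comments (including ////) in between. The names DocBinding and bind_doc_comments are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified stand-in for "query captures + Python post-processing".
# It works line by line on a single file and does NOT use tree-sitter.
ITEM_RE = re.compile(
    r"^\s*(?:pub(?:\([^)]*\))?\s+)?(fn|struct|enum|trait|mod|impl)\b\s*(\w*)"
)


@dataclass
class DocBinding:
    kind: str  # item keyword, e.g. "fn" or "struct"
    name: str  # item name (may be "" for generic impl blocks)
    doc_lines: List[str] = field(default_factory=list)


def bind_doc_comments(source: str) -> List[DocBinding]:
    bindings: List[DocBinding] = []
    module_doc: List[str] = []  # //! lines, bound to the file module
    pending: List[str] = []     # /// lines waiting for the next item
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//!"):
            module_doc.append(stripped[3:].strip())
        elif stripped.startswith("///") and not stripped.startswith("////"):
            pending.append(stripped[3:].strip())
        elif stripped.startswith("//") or stripped.startswith("#[") or not stripped:
            # Normal comments (incl. ////), attributes and blank lines do not
            # break the association between an outer doc comment and its item.
            continue
        else:
            match = ITEM_RE.match(line)
            if match and pending:
                bindings.append(DocBinding(match.group(1), match.group(2), pending))
            # Any other code consumes (here: discards) pending doc lines.
            pending = []
    if module_doc:
        bindings.insert(0, DocBinding("mod", "<file module>", module_doc))
    return bindings
```

A real implementation would express these rules as tree-sitter query patterns instead of string checks, which also handles multi-line attributes and nested modules that this sketch ignores.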

@stanislaw
Collaborator

@haxtibal just rebase your branch on top of latest main. The linting issues were fixed there yesterday.

@haxtibal
Contributor Author

haxtibal commented Feb 6, 2026

@haxtibal just rebase your branch on top of latest main. The linting issues were fixed there yesterday.

There's another error: Module "tree_sitter" has no attribute "QueryCursor"; maybe "QueryError"? [attr-defined].

This is because QueryCursor is only available in tree-sitter >= 0.25, and tree-sitter 0.25 is only available for Python >= 3.10, whereas StrictDoc still supports 3.9.

Three options:

  • add conditional code in rust_reader.py to use the old or new query API, depending on which version is available
  • make Rust an optional feature that is only available for StrictDoc installations on Python >= 3.10
  • bump StrictDoc's minimum Python requirement to 3.10

What do you think?

EDIT: Making the API usage conditional seems easy; I will do that.

@stanislaw
Collaborator

I was going to drop 3.9 very soon. We can expedite and do it now but with a separate PR please.

haxtibal added a commit that referenced this pull request Feb 6, 2026
WHAT: Drop Python 3.9, require >= 3.10.

WHY: Planned transition, done early to avoid conditional tree-sitter API
usage in #2672.
@haxtibal haxtibal force-pushed the tdmg/rust branch 4 times, most recently from 0717b12 to f99ee87 on February 6, 2026 17:46
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Feb 8, 2026
WHAT:
This allows language readers to provide an inferred default scope to the
marker parser.

WHY:
In some situations, the explicit scope in '@relation(REQ-1,
scope=function)' provided by the user would be redundant. A good example
is Rust doc comments, which unambiguously determine the targeted
language construct. The construct in turn determines the scope. If a
language reader can infer the scope, it should do so rather than
require the user to provide it explicitly. This was triggered by strictdoc-project#2672.

HOW:
We slightly change the lark grammar for @relation to:
- at least one requirement UID,
- then optionally more separated requirement UIDs,
- then an optional separated scope,
- then an optional separated role,
and use a default_scope passed in by language readers to set the scope
if it was not explicitly given.
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Feb 8, 2026
WHAT:
This allows language readers to provide an inferred default scope to the
marker parser.

WHY:
In some situations, the explicit scope in '@relation(REQ-1,
scope=function)' provided by the user would be redundant. A good example
is Rust doc comments, which unambiguously determine the targeted
language construct. The construct in turn determines the scope. If a
language reader can infer the scope, it should do so rather than
require the user to provide it explicitly. This was triggered by strictdoc-project#2672.

HOW:
We slightly change the lark grammar for @relation to:
- at least one requirement UID,
- then optionally more separated requirement UIDs,
- then an optional separated scope,
- then an optional separated role.

The default scope is used if the marker contains no user-provided
scope. We exit with an error if neither a default nor a user-provided
scope was given.
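The marker rules described above can be sketched in plain Python. This is a hypothetical stand-in, not StrictDoc's lark grammar or API: parse_relation, RelationMarker and MarkerError are illustrative names, the comma separator and the role= spelling are assumptions, and raising an exception stands in for exiting with an error.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


class MarkerError(ValueError):
    """Stands in for 'exit with error' in this sketch."""


@dataclass
class RelationMarker:
    reqs: List[str]
    scope: str
    role: Optional[str] = None


# Hypothetical: assumes comma-separated arguments and a 'role=' keyword.
_MARKER_RE = re.compile(r"@relation\((?P<args>[^)]*)\)")


def parse_relation(text: str, default_scope: Optional[str] = None) -> RelationMarker:
    match = _MARKER_RE.search(text)
    if match is None:
        raise MarkerError(f"no @relation marker in {text!r}")
    reqs: List[str] = []
    scope: Optional[str] = None
    role: Optional[str] = None
    for part in (p.strip() for p in match.group("args").split(",")):
        key, sep, value = (s.strip() for s in part.partition("="))
        if not sep:
            # Requirement UIDs must all come before scope/role.
            if not part or scope is not None or role is not None:
                raise MarkerError(f"misplaced or empty UID in {text!r}")
            reqs.append(part)
        elif key == "scope" and scope is None and role is None:
            scope = value  # scope must precede role
        elif key == "role" and role is None:
            role = value
        else:
            raise MarkerError(f"unexpected argument {part!r}")
    if not reqs:
        raise MarkerError("at least one requirement UID is required")
    # A user-provided scope wins; otherwise fall back to the inferred default.
    scope = scope or default_scope
    if scope is None:
        raise MarkerError("no scope given and no default scope inferred")
    return RelationMarker(reqs, scope, role)
```

With a default_scope supplied by a language reader (for example, one that knows a doc comment belongs to a function), '@relation(REQ-1)' becomes valid; without it, the same marker is rejected.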
haxtibal added a commit to haxtibal/strictdoc that referenced this pull request Feb 8, 2026
WHAT: Rename the class and associated variables and comments.

WHY: This was triggered by strictdoc-project#2672. Rust language-aware parsing made
it evident that we want to represent many more things that can occur in source
code (enums, structs, modules, traits, expressions, type parameters,
...) than just functions and classes. The list of possible items is
long and unpredictable, since it depends on the supported languages, whose
number will grow over time. So it's best to represent items with a single
generic class.

HOW: Rename internal items only, without changing public API or
functionality. For example, we keep '@relation(scope=function)' and
'RELATIONS: - FUNCTIONS: ...' for now. The generalization should
eventually be exposed to users, but that requires backward compatibility
considerations and is therefore deferred so as not to block other PRs.
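A single generic class for source items could look roughly like the sketch below. The class name SourceItem, its fields and the qualified_path helper are hypothetical, since the commit does not name the renamed class. A kind field replaces separate function/class types, and a parent link allows building a qualified path; the Rust-style "::" separator and the "crate" root are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceItem:
    """Hypothetical generic source item: one class for all constructs."""
    kind: str  # e.g. "fn", "struct", "enum", "trait", "mod"
    name: str
    line_begin: int
    line_end: int
    parent: Optional["SourceItem"] = None  # enclosing item, if any

    def qualified_path(self, root: str = "crate") -> str:
        # Walk up the parent chain, then join the names Rust-style with "::".
        names: List[str] = []
        node: Optional[SourceItem] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "::".join([root, *reversed(names)])
```

With this shape, supporting a new construct or language means producing SourceItem objects with new kind values rather than adding a class per construct.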
WHAT: Write up the desired functionality for language-aware Rust parsing
in some detail.

WHY: The implementation shall stay close to the rules defined by rust-lang.org.
Detailed requirements help to avoid forgetting cases, remove ambiguities, and
assemble test data and test cases.
@stanislaw stanislaw merged commit c234446 into strictdoc-project:main Feb 8, 2026
33 checks passed
@stanislaw stanislaw added this to the 2026-Q1 milestone Feb 8, 2026