Commit fdb0ada

Syntax matching
1 parent c54dd21 commit fdb0ada

165 files changed

Lines changed: 16515 additions & 148 deletions

INSTALL

Lines changed: 0 additions & 1 deletion
This file was deleted.

INSTALL

Lines changed: 380 additions & 0 deletions
Large diffs are not rendered by default.

doc/TREE-SITTER

Lines changed: 209 additions & 33 deletions
@@ -9,6 +9,9 @@ Contents
 * Downloading grammar sources
 * How it works
 * Language injection
+* Wrapper grammars
+* Parse error highlighting
+* Syntax highlighting modes
 * File layout
 * Grammar configuration files
 * Highlight query files (queries/*.scm)
@@ -85,11 +88,12 @@ The --with-tree-sitter-grammars option accepts a comma-separated list
 of grammar names. The default is 'all' (154 grammars). Invalid names
 cause configure to abort with an error listing all valid grammars.
 
-Binary size comparison (approximate):
-  - Without tree-sitter: ~5 MB
-  - Shared mode (any grammar count): ~5 MB (grammars in separate .so)
-  - Static mode, 3 grammars: ~8 MB
-  - Static mode, all 154 grammars: ~200 MB
+Binary size comparison:
+  - Without tree-sitter: 5.0 MB
+  - Shared mode (any grammar count): 5.1 MB (grammars in separate .so)
+  - Static mode, 3 grammars: 7.7 MB
+  - Static mode, 10 grammars: 10 MB
+  - Static mode, all 63 grammars: 109 MB (for build validation only)
 
 To build with the legacy highlighting only (default):
 
@@ -238,47 +242,184 @@ text (via @injection.language). If a matching grammar is available,
 its content node is parsed with that grammar. Parsers and queries are
 cached per language for efficiency.
 
+Injections are recursive up to 3 levels deep. When an injected
+language has its own injections.scm file, those nested injections are
+also processed. For example, a Go template file wrapping Markdown
+can highlight fenced code blocks within the Markdown content:
+gotmpl -> markdown -> python (3 levels).
+
+
+Wrapper grammars
+----------------
+
+Wrapper grammars are template languages (like Go templates) that wrap
+a host language. Content outside the template syntax lives in
+specific AST nodes and can be highlighted by injecting the host
+grammar into those nodes.
+
+Wrapper grammars are configured in the wrappers file
+(misc/syntax-ts/wrappers). Each line defines a wrapper:
+
+    wrapper_grammar content_node host1 host2 ...
+
+    wrapper_grammar   The grammar name of the template language.
+    content_node      The AST node type that holds host content.
+    host1 host2 ...   Grammar names that this wrapper can wrap.
+
+Example:
+
+    gotmpl text yaml json toml html xml markdown css
+
+This enables two features:
+
+1. ERROR fallback: when a host grammar produces a catastrophic parse
+   failure (ERROR root node), each wrapper that lists that host is
+   tried as an alternative. If the wrapper parses successfully, the
+   host grammar is injected into the wrapper's content nodes.
+
+   Example: a .yaml file containing Go template syntax ({{ }}) fails
+   to parse as YAML. The system finds that gotmpl can wrap yaml,
+   parses the file with gotmpl, and injects YAML highlighting into
+   the text nodes. The result: Go template syntax is highlighted as
+   gotmpl, and the YAML portions between templates get proper YAML
+   highlighting.
+
+2. Compound extensions: for files like README.md.gotmpl, the inner
+   extension (.md) identifies the host grammar, which is injected
+   into the wrapper's content nodes automatically.
+
+   Example: README.md.gotmpl is matched as gotmpl by the .gotmpl
+   extension. The system detects .md as a compound extension,
+   resolves it to the markdown grammar, and injects Markdown
+   highlighting into the gotmpl text nodes.
+
+To add a new wrapper grammar, add a line to the wrappers file. No
+code changes are required. The wrapper grammar must have an AST node
+type that contains host language content (e.g. "text" for gotmpl).
+
+
+Parse error highlighting
+------------------------
+
+When a tree-sitter grammar produces ERROR nodes (parse failures),
+the affected regions are highlighted in red. This provides a visual
+indication that the parser could not understand parts of the file.
+
+If the tree root does not cover the entire file (the parser gave up
+early), the uncovered portion is also highlighted in red.
+
+Valid captures within ERROR regions take precedence over the red
+error coloring via the "narrower wins" rule: specific node captures
+are always narrower than the broad ERROR region.
+
+
+Syntax highlighting modes
+-------------------------
+
+When compiled with tree-sitter support, the editor supports three
+highlighting modes:
+
+  - Tree-sitter (TS): AST-based highlighting using tree-sitter grammars.
+  - Legacy: Regex-based highlighting using .syntax files.
+  - None: Syntax highlighting disabled.
+
+The active mode is shown in the status bar as S:[TS], S:[Legacy], or
+S:[None].
+
+Ctrl+S cycles forward through modes: TS -> Legacy -> None -> TS.
+If tree-sitter initialization fails for a file (no grammar available),
+the mode automatically falls to Legacy and TS is excluded from
+cycling for that session.
+
+Ctrl+T toggles directly between TS and Legacy (skips None). This is
+useful for quickly comparing tree-sitter and legacy highlighting.
+The same toggle is available from the Command menu as "Toggle TS/legacy
+syntax".
+
+Manual syntax selection via Options -> Syntax highlighting works with
+tree-sitter. When the user selects a syntax type (e.g. "YAML"), the
+display name is reverse-looked up to a grammar name and tree-sitter
+is tried with that grammar. If tree-sitter fails, the legacy system
+handles the selected type.
+
+The --no-tree-sitter command-line flag permanently disables tree-sitter
+for the entire session. When this flag is set, Ctrl+S cycles between
+Legacy and None only.
+
 
 File layout
 -----------
 
 Source files (src/editor/):
 
-  syntax.c      Main file. All tree-sitter code is inside
-                #ifdef HAVE_TREE_SITTER blocks. Contains:
+  syntax_ts.c   Tree-sitter highlighting implementation. Contains:
                 - ts_input_read() -- TSInput callback
+                - ts_load_color_config() -- loads colors.ini
                 - ts_capture_name_to_color() -- color mapping
+                - ts_config_lookup_by_value() -- config file lookup
+                - ts_config_lookup_by_grammar() -- reverse lookup
+                - ts_config_reverse_lookup() -- display name to
+                  grammar name lookup
                 - ts_find_grammar() -- matches filename via config
                   files (filenames > shebangs > extensions)
+                - ts_find_wrapper_for_host() -- finds a wrapper
+                  grammar for a failed host grammar
+                - ts_find_wrapper_content_node() -- gets the content
+                  node name for a wrapper grammar
+                - ts_setup_wrapper_injection() -- builds injection
+                  query for wrapper grammars programmatically
                 - ts_load_query_file() -- loads .scm file
                 - ts_init_injections() -- sets up injection parsers
                   from injections.scm query files
                 - ts_get_dynamic_lang() -- lazy-loads dynamic grammar
-                - ts_run_dynamic_injection() -- code block injection
-                - ts_init_for_file() -- initialization
+                - ts_inject_and_highlight() -- parses and highlights
+                  an injected language, with recursive injection
+                  support (up to TS_MAX_INJECTION_DEPTH levels)
+                - ts_init_for_file() -- initialization (accepts
+                  optional forced_grammar for manual selection)
                 - ts_free() -- cleanup (primary + injections)
-                - ts_collect_injection_ranges() -- collects byte ranges
-                - ts_append_node_range() -- range helper
-                - ts_run_query_into_highlights() -- runs query on tree
-                - ts_rebuild_highlight_cache() -- query cursor
+                - ts_collect_error_highlights() -- collects ERROR
+                  nodes for red highlighting
+                - ts_run_query_into_highlights() -- runs query on
+                  tree
+                - ts_rebuild_highlight_cache() -- query cursor,
+                  injection processing, error highlighting
                 - ts_get_color_at() -- linear scan
-                - edit_syntax_ts_notify_edit() -- incremental
                 Conditional include: ts-grammar-registry.h (static
                 mode) or ts-grammar-loader.h (shared mode).
 
+  syntax_ts.h   Public API for tree-sitter integration:
+                ts_init_for_file(), ts_free(), ts_get_color_at(),
+                ts_rebuild_highlight_cache(),
+                ts_config_reverse_lookup().
+
+  syntax.c      Main syntax file. Tree-sitter integration points
+                are inside #ifdef HAVE_TREE_SITTER blocks:
+                - edit_load_syntax() calls ts_init_for_file() with
+                  optional forced_grammar for manual selection
+                - edit_free_syntax_rules() calls ts_free()
+                - edit_syntax_ts_notify_edit() -- incremental edit
+                  notification for tree re-parsing
+
   ts-grammar-loader.h
                 Shared mode grammar loader. Provides
                 ts_grammar_registry_lookup() using g_module_open()
                 to load grammar .so modules on demand. Caches
                 loaded modules. Handles naming overrides (e.g.
                 cobol -> tree_sitter_COBOL).
 
+  editdraw.c    Status bar rendering includes the syntax
+                highlighting mode indicator (S:[TS], S:[Legacy],
+                S:[None]) in both simple and normal status bar
+                formats.
+
   editwidget.h  WEdit struct extended with tree-sitter fields:
                 Primary: ts_parser, ts_tree, ts_highlight_query
                 (void*), ts_highlights (GArray*),
-                ts_highlights_start/end, ts_active, ts_need_reparse.
-                Injections: ts_injections (GArray* of
-                ts_injection_t, supports multiple and dynamic).
+                ts_highlights_start/end, ts_grammar_name, ts_active,
+                ts_need_reparse.
+                Injections: ts_injection_query (TSQuery*),
+                ts_injection_lang_cache (GHashTable*).
 
   edit-impl.h   Declaration of edit_syntax_ts_notify_edit().
 
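The "narrower wins" rule from the Parse error highlighting section can be illustrated with a small lookup over highlight ranges. This is a minimal sketch, not MC's ts_get_color_at(): the hl_range_t type, color_at() and the integer color ids are hypothetical, and it assumes every highlight (capture ranges, ERROR regions, and the uncovered tail after the root node) is stored as a half-open byte range. It keeps the linear scan mentioned in the file layout and returns the color of the narrowest range covering the offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical highlight record: half-open byte range [start, end) plus a color id. */
typedef struct
{
    size_t start;
    size_t end;
    int color;
} hl_range_t;

/* Return the color of the narrowest range that covers byte offset `pos`
 * ("narrower wins"), or `default_color` when no range covers it.
 * Linear scan over all ranges; on equal widths the earlier range wins. */
static int
color_at (const hl_range_t *ranges, size_t n, size_t pos, int default_color)
{
    size_t best_width = (size_t) -1;
    int best_color = default_color;
    size_t i;

    assert (ranges != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        const hl_range_t *r = &ranges[i];

        if (pos >= r->start && pos < r->end && r->end - r->start < best_width)
        {
            best_width = r->end - r->start;
            best_color = r->color;
        }
    }
    return best_color;
}
```

With an ERROR range [0, 100) colored red and a keyword capture [10, 15) inside it, offset 12 resolves to the keyword color while offset 5 keeps the red error color.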
@@ -319,11 +460,20 @@ Runtime data files (misc/syntax-ts/):
   display-names  Maps grammar names to human-readable display names.
   symbols        Overrides for non-standard tree_sitter_*() function
                  names (e.g. cobol -> tree_sitter_COBOL).
-  colors         Default color mappings for tree-sitter capture names.
+  wrappers       Wrapper grammar definitions for template languages.
+                 Maps wrapper grammars to their content nodes and
+                 supported host languages (see "Wrapper grammars").
+  colors.ini     Per-grammar color mappings (INI format, sections
+                 named [grammar_name]).
+  queries-override/<name>-highlights.scm
+                 MC-specific highlight query overrides (take
+                 precedence over upstream queries).
   queries/<name>-highlights.scm
-                 Highlight query files, one per grammar.
+                 Upstream highlight query files, one per grammar.
+  queries-override/<name>-injections.scm
+                 MC-specific injection query overrides.
   queries/<name>-injections.scm
-                 Injection query files for language injection.
+                 Upstream injection query files.
 
 Shared grammar modules (shared mode only):
 
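The recursive injection described in the Language injection section (gotmpl -> markdown -> python, capped at 3 levels) can be sketched as a depth-limited recursion. This is a hypothetical model, not ts_inject_and_highlight(): the injection_rules table stands in for what <name>-injections.scm matches would produce, MAX_INJECTION_DEPTH is assumed to count the primary grammar as level 1 (as in the documented example), and "highlighting" a level is reduced to recording its language name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the documented limit ("up to 3 levels deep"),
 * counting the primary grammar as level 1. */
#define MAX_INJECTION_DEPTH 3

/* Hypothetical stand-in for injection query results:
 * which language a host injects into its content nodes. */
typedef struct
{
    const char *host;
    const char *injected;
} injection_rule_t;

static const injection_rule_t injection_rules[] = {
    { "gotmpl", "markdown" },
    { "markdown", "python" },
    { "python", "sql" },        /* a 4th level when starting from gotmpl: cut off */
};

static const char *
find_injected_lang (const char *host)
{
    size_t i;

    for (i = 0; i < sizeof (injection_rules) / sizeof (injection_rules[0]); i++)
        if (strcmp (injection_rules[i].host, host) == 0)
            return injection_rules[i].injected;
    return NULL;
}

/* Record `lang` as level `depth` (1-based), then recurse into the language
 * it injects, stopping at MAX_INJECTION_DEPTH. Returns the levels processed. */
static size_t
inject_and_highlight (const char *lang, size_t depth, const char *levels[MAX_INJECTION_DEPTH])
{
    const char *child;

    assert (depth >= 1 && depth <= MAX_INJECTION_DEPTH);
    levels[depth - 1] = lang;

    child = find_injected_lang (lang);
    if (child == NULL || depth == MAX_INJECTION_DEPTH)
        return depth;
    return inject_and_highlight (child, depth + 1, levels);
}
```

Starting from gotmpl this processes gotmpl, markdown and python, and stops before the hypothetical sql level; the explicit depth check is what prevents runaway recursion if injection rules ever form a cycle.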
@@ -414,16 +564,31 @@ symbols -- overrides for non-standard function names:
 
   Example: cobol COBOL (function: tree_sitter_COBOL)
 
-colors -- default color mappings for capture names:
+wrappers -- wrapper grammar definitions:
 
-  <capture_name> <foreground>;<background>
+  <wrapper_grammar> <content_node> <host1> <host2> ...
+
+  Defines a wrapper grammar (template language) that can wrap host
+  languages. See the "Wrapper grammars" section for details.
+
+  Example: gotmpl text yaml json toml html xml markdown css
+
+colors.ini -- color mappings for capture names (INI format):
+
+  [default]
+  keyword = yellow;
+  string = green;
+
+  [python]
+  variable.builtin = brightred;
+
+  A [default] section provides global defaults. Per-grammar sections
+  override specific captures. The format is:
+  <capture_name> = <foreground>;<background>
 
   Background can be omitted (inherits default). A foreground of "-"
   means use the default foreground color.
 
-  Example: keyword yellow;
-  Example: string green;
-
   Lookup precedence when opening a file:
   1. filenames -- exact basename match (highest priority)
   2. shebangs -- interpreter from first line
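The wrappers line format and the two features it enables (ERROR fallback and compound extensions) can be sketched in plain C. This is a hypothetical illustration, not the code behind ts_find_wrapper_for_host(): wrapper_def_t, parse_wrapper_line(), find_wrapper_for_host() and inner_extension() are invented names, the fixed buffer sizes are arbitrary, and mapping an inner extension such as md to the markdown grammar is assumed to go through the existing extensions config.

```c
#include <assert.h>
#include <string.h>

#define WRAPPER_MAX_HOSTS 16
#define WRAPPER_NAME_LEN 32

/* One parsed line of the wrappers file. */
typedef struct
{
    char wrapper[WRAPPER_NAME_LEN];
    char content_node[WRAPPER_NAME_LEN];
    char hosts[WRAPPER_MAX_HOSTS][WRAPPER_NAME_LEN];
    int n_hosts;
} wrapper_def_t;

/* Parse "<wrapper_grammar> <content_node> <host1> <host2> ...".
 * Returns 1 on success, 0 if the line is malformed or a field is too long.
 * Hosts beyond WRAPPER_MAX_HOSTS are ignored. */
static int
parse_wrapper_line (const char *line, wrapper_def_t *def)
{
    char buf[512];
    char *tok;
    int field = 0;

    assert (line != NULL && def != NULL);
    if (strlen (line) >= sizeof (buf))
        return 0;
    strcpy (buf, line);
    memset (def, 0, sizeof (*def));

    for (tok = strtok (buf, " \t\r\n"); tok != NULL; tok = strtok (NULL, " \t\r\n"))
    {
        if (strlen (tok) >= WRAPPER_NAME_LEN)
            return 0;
        if (field == 0)
            strcpy (def->wrapper, tok);
        else if (field == 1)
            strcpy (def->content_node, tok);
        else if (def->n_hosts < WRAPPER_MAX_HOSTS)
            strcpy (def->hosts[def->n_hosts++], tok);
        field++;
    }
    return field >= 3;          /* wrapper, content node, at least one host */
}

/* ERROR fallback: index of the first wrapper at or after `start` that lists
 * `host`, or -1. Call again with start = result + 1 to try the next one. */
static int
find_wrapper_for_host (const wrapper_def_t *defs, int n, const char *host, int start)
{
    int i, j;

    for (i = start; i < n; i++)
        for (j = 0; j < defs[i].n_hosts; j++)
            if (strcmp (defs[i].hosts[j], host) == 0)
                return i;
    return -1;
}

/* Compound extension: for "README.md.gotmpl" copy "md" into `out`.
 * Returns 0 when the name has no inner extension or `out` is too small. */
static int
inner_extension (const char *basename, char *out, size_t out_len)
{
    const char *last = strrchr (basename, '.');
    const char *p;

    if (last == NULL || last == basename)
        return 0;
    for (p = last - 1; p > basename; p--)
        if (*p == '.')
        {
            size_t n = (size_t) (last - p - 1);

            if (n == 0 || n >= out_len)
                return 0;
            memcpy (out, p + 1, n);
            out[n] = '\0';
            return 1;
        }
    return 0;
}
```

Returning an index with a start parameter, rather than a single match, lets the caller try every wrapper that lists the failed host in turn, which matches the "each wrapper that lists that host is tried" wording.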
@@ -437,12 +602,21 @@ share directory.
 Highlight query files (queries/*.scm)
 --------------------------------------
 
-Each grammar has a corresponding highlight query file in
-misc/syntax-ts/queries/ named <grammar_name>-highlights.scm (e.g.
-c-highlights.scm, python-highlights.scm, bash-highlights.scm).
+Each grammar has a corresponding highlight query file named
+<grammar_name>-highlights.scm. Query files are searched in this order:
+
+  1. User override: ~/.local/share/mc/syntax-ts/queries-override/
+  2. System override: $(datadir)/mc/syntax-ts/queries-override/
+  3. User upstream: ~/.local/share/mc/syntax-ts/queries/
+  4. System upstream: $(datadir)/mc/syntax-ts/queries/
+
+The first file found is used. MC-specific overrides in queries-override/
+take precedence over upstream queries in queries/. This allows tuning
+capture names and patterns for MC's color scheme without modifying
+upstream query files.
+
 Grammars that support language injection also have an injection query
-file named <grammar_name>-injections.scm (e.g. html-injections.scm,
-markdown-injections.scm).
+file named <grammar_name>-injections.scm (same search order).
 
 Query files use the tree-sitter query syntax (S-expressions) to match
 AST node patterns and assign capture names. For example:
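Separately from the query example that the diff context cuts off, the four-step query file lookup introduced in this hunk amounts to trying a fixed list of paths in order. A minimal sketch, assuming user_dir and sys_dir are already-expanded absolute roots (the resolved forms of ~/.local/share/mc/syntax-ts and $(datadir)/mc/syntax-ts) and using POSIX access() as the existence check; query_candidates() and find_query_file() are hypothetical helpers, not MC's ts_load_query_file().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define QPATH_LEN 512
#define N_QUERY_DIRS 4

/* Build the candidate paths in the documented order: user override,
 * system override, user upstream, system upstream. `kind` is
 * "highlights" or "injections". Returns 0, or -1 if a path is too long. */
static int
query_candidates (const char *user_dir, const char *sys_dir, const char *grammar,
                  const char *kind, char cand[N_QUERY_DIRS][QPATH_LEN])
{
    const char *bases[N_QUERY_DIRS] = { user_dir, sys_dir, user_dir, sys_dir };
    static const char *const subdirs[N_QUERY_DIRS] = {
        "queries-override", "queries-override", "queries", "queries"
    };
    int i;

    assert (grammar != NULL && kind != NULL);
    for (i = 0; i < N_QUERY_DIRS; i++)
    {
        int len = snprintf (cand[i], QPATH_LEN, "%s/%s/%s-%s.scm",
                            bases[i], subdirs[i], grammar, kind);

        if (len < 0 || len >= QPATH_LEN)
            return -1;
    }
    return 0;
}

/* Copy the first readable candidate into `out`; returns 1 if one was found. */
static int
find_query_file (const char *user_dir, const char *sys_dir, const char *grammar,
                 const char *kind, char out[QPATH_LEN])
{
    char cand[N_QUERY_DIRS][QPATH_LEN];
    int i;

    if (query_candidates (user_dir, sys_dir, grammar, kind, cand) != 0)
        return 0;
    for (i = 0; i < N_QUERY_DIRS; i++)
        if (access (cand[i], R_OK) == 0)
        {
            strcpy (out, cand[i]);
            return 1;
        }
    return 0;
}
```

Keeping candidate generation separate from the filesystem check makes the precedence order easy to verify on its own; the same function serves injection queries by passing "injections" as the kind.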
@@ -472,9 +646,11 @@ Query files support hierarchical capture names (e.g. @keyword.control,
 Color mapping
 -------------
 
-The colors config file (misc/syntax-ts/colors) maps capture names to
-MC foreground colors. The colors are chosen to match MC's default
-syntax highlighting appearance (blue background skin):
+The colors config file (misc/syntax-ts/colors.ini) maps capture names
+to MC foreground colors using INI format. A [default] section provides
+global defaults; per-grammar sections (e.g. [python], [bash]) override
+specific captures for that grammar. Colors are chosen to match MC's
+default syntax highlighting appearance (blue background skin):
 
   Capture        Foreground Color    Purpose
   -------        ----------------    -------
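The colors.ini precedence (per-grammar section first, then [default]) and the <foreground>;<background> value format can be sketched as two small functions. This assumes the INI file has already been parsed into (section, capture, value) triples with surrounding whitespace trimmed; color_entry_t, lookup_color_value() and split_color_value() are hypothetical and stand in for whatever ts_load_color_config() and ts_capture_name_to_color() actually do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLOR_NAME_LEN 32

/* One "key = value" pair from colors.ini, tagged with its [section]. */
typedef struct
{
    const char *section;
    const char *capture;
    const char *value;
} color_entry_t;

/* Look the capture up in the grammar's own section first, then in [default].
 * Returns the raw "<fg>;<bg>" value, or NULL if neither section has it. */
static const char *
lookup_color_value (const color_entry_t *tab, size_t n, const char *grammar,
                    const char *capture)
{
    const char *sections[2] = { grammar, "default" };
    size_t s, i;

    for (s = 0; s < 2; s++)
        for (i = 0; i < n; i++)
            if (strcmp (tab[i].section, sections[s]) == 0
                && strcmp (tab[i].capture, capture) == 0)
                return tab[i].value;
    return NULL;
}

/* Split "<fg>;<bg>". An omitted background becomes "" (inherit the default);
 * a foreground of "-" also becomes "" (use the default foreground).
 * Returns 0 if either part does not fit in COLOR_NAME_LEN. */
static int
split_color_value (const char *value, char fg[COLOR_NAME_LEN], char bg[COLOR_NAME_LEN])
{
    const char *semi;
    size_t fg_len;
    const char *bg_src;

    assert (value != NULL);
    semi = strchr (value, ';');
    fg_len = semi != NULL ? (size_t) (semi - value) : strlen (value);
    bg_src = semi != NULL ? semi + 1 : "";

    if (fg_len >= COLOR_NAME_LEN || strlen (bg_src) >= COLOR_NAME_LEN)
        return 0;
    memcpy (fg, value, fg_len);
    fg[fg_len] = '\0';
    strcpy (bg, bg_src);
    if (strcmp (fg, "-") == 0)
        fg[0] = '\0';
    return 1;
}
```

With the example file from the configuration section, a python variable.builtin capture resolves to brightred from [python], while a python keyword capture falls back to yellow from [default].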

lib/keybind.c

Lines changed: 1 addition & 0 deletions
@@ -330,6 +330,7 @@ static name_keymap_t command_names[] = {
     ADD_KEYMAP_NAME (InsertLiteral),
     ADD_KEYMAP_NAME (ShowTabTws),
     ADD_KEYMAP_NAME (SyntaxOnOff),
+    ADD_KEYMAP_NAME (SyntaxToggleTS),
     ADD_KEYMAP_NAME (SyntaxChoose),
     ADD_KEYMAP_NAME (ShowMargin),
     ADD_KEYMAP_NAME (OptionsSaveMode),

lib/keybind.h

Lines changed: 1 addition & 0 deletions
@@ -309,6 +309,7 @@ enum
     CK_ShowMargin,
     CK_ShowTabTws,
     CK_SyntaxOnOff,
+    CK_SyntaxToggleTS,
     CK_SyntaxChoose,
     CK_InsertLiteral,
     CK_ExternalCommand,
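The new CK_SyntaxToggleTS action, together with the Ctrl+S binding, drives the mode switching described in the Syntax highlighting modes section. The transitions can be modelled as a small state machine; this is a hypothetical sketch (syntax_mode_t and the three functions are invented names, not MC code) that represents "TS excluded from cycling" as a ts_available flag. What Ctrl+T does when the current mode is None is not documented, so moving to TS when available is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    SYNTAX_MODE_TS,
    SYNTAX_MODE_LEGACY,
    SYNTAX_MODE_NONE
} syntax_mode_t;

/* Ctrl+S: TS -> Legacy -> None -> TS. When tree-sitter is unavailable
 * (no grammar for the file, or --no-tree-sitter), TS is skipped and the
 * cycle becomes Legacy -> None -> Legacy. */
static syntax_mode_t
syntax_mode_cycle (syntax_mode_t mode, bool ts_available)
{
    switch (mode)
    {
    case SYNTAX_MODE_TS:
        return SYNTAX_MODE_LEGACY;
    case SYNTAX_MODE_LEGACY:
        return SYNTAX_MODE_NONE;
    case SYNTAX_MODE_NONE:
    default:
        return ts_available ? SYNTAX_MODE_TS : SYNTAX_MODE_LEGACY;
    }
}

/* Ctrl+T: toggle TS <-> Legacy, never entering None. Leaving None for TS
 * (or Legacy when TS is unavailable) is an assumption of this sketch. */
static syntax_mode_t
syntax_mode_toggle_ts (syntax_mode_t mode, bool ts_available)
{
    if (mode == SYNTAX_MODE_TS)
        return SYNTAX_MODE_LEGACY;
    return ts_available ? SYNTAX_MODE_TS : SYNTAX_MODE_LEGACY;
}

/* Status bar indicator text, as listed in the documentation. */
static const char *
syntax_mode_label (syntax_mode_t mode)
{
    switch (mode)
    {
    case SYNTAX_MODE_TS:
        return "S:[TS]";
    case SYNTAX_MODE_LEGACY:
        return "S:[Legacy]";
    case SYNTAX_MODE_NONE:
    default:
        return "S:[None]";
    }
}
```

Keeping availability as a parameter rather than a global makes the --no-tree-sitter case and the per-file "no grammar" case the same code path.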

m4.include/mc-with-tree-sitter.m4

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ AC_DEFUN([mc_WITH_TREE_SITTER], [
 all_ts_grammars="ada asm awk bash bison c caddy cmake cobol cpp c_sharp css cuda d diff dockerfile dot erlang fortran glsl go haskell hcl html idl ini java javascript json kotlin lisp lua make markdown markdown_inline matlab meson muttrc ocaml pascal perl php po properties proto python qmljs r ruby rust scala smalltalk sql strace swift tcl toml turtle typescript verilog vhdl xml yaml"
 
 dnl Grammars that have scanner.c
-ts_scanner_c="awk bash bison caddy cmake cobol cpp c_sharp css cuda d dockerfile fortran haskell hcl html javascript kotlin lua markdown markdown_inline matlab ocaml perl php properties python qmljs r ruby rust scala tcl toml typescript xml yaml"
+ts_scanner_c="awk bash bison caddy cmake cobol cpp c_sharp css cuda d dockerfile erlang fortran haskell hcl html javascript kotlin lua markdown markdown_inline ocaml perl php properties python qmljs r ruby rust scala swift tcl toml typescript xml yaml"
 
 dnl Grammars that have scanner.cc (C++ scanner)
 ts_scanner_cc="sql"

misc/mc.default.keymap

Lines changed: 1 addition & 0 deletions
@@ -369,6 +369,7 @@ MacroStartStopRecord = ctrl-r
 ShowNumbers = alt-n
 ShowTabTws = alt-underline
 SyntaxOnOff = ctrl-s
+SyntaxToggleTS = ctrl-t
 # SyntaxChoose =
 # ShowMargin =
 Find = alt-enter

misc/mc.emacs.keymap

Lines changed: 1 addition & 0 deletions
@@ -368,6 +368,7 @@ MacroStartStopRecord = ctrl-r
 ShowNumbers = alt-n
 ShowTabTws = alt-underline
 SyntaxOnOff = ctrl-s
+SyntaxToggleTS = ctrl-t
 # SyntaxChoose =
 # ShowMargin =
 Find = alt-enter