 * Downloading grammar sources
 * How it works
 * Language injection
+ * Wrapper grammars
+ * Parse error highlighting
+ * Syntax highlighting modes
 * File layout
 * Grammar configuration files
 * Highlight query files (queries/*.scm)
@@ -85,11 +88,12 @@ The --with-tree-sitter-grammars option accepts a comma-separated list
 of grammar names. The default is 'all' (154 grammars). Invalid names
 cause configure to abort with an error listing all valid grammars.
 
- Binary size comparison (approximate):
- - Without tree-sitter: ~5 MB
- - Shared mode (any grammar count): ~5 MB (grammars in separate .so)
- - Static mode, 3 grammars: ~8 MB
- - Static mode, all 154 grammars: ~200 MB
+ Binary size comparison:
+ - Without tree-sitter: 5.0 MB
+ - Shared mode (any grammar count): 5.1 MB (grammars in separate .so)
+ - Static mode, 3 grammars: 7.7 MB
+ - Static mode, 10 grammars: 10 MB
+ - Static mode, all 63 grammars: 109 MB (for build validation only)
 
 To build with the legacy highlighting only (default):
 
@@ -238,47 +242,184 @@ text (via @injection.language). If a matching grammar is available,
 its content node is parsed with that grammar. Parsers and queries are
 cached per language for efficiency.
 
+ Injections are recursive up to 3 levels deep. When an injected
+ language has its own injections.scm file, those nested injections are
+ also processed. For example, a Go template file wrapping Markdown
+ can highlight fenced code blocks within the Markdown content:
+ gotmpl -> markdown -> python (3 levels).
+
+
+ Wrapper grammars
+ ----------------
+
+ Wrapper grammars are template languages (like Go templates) that wrap
+ a host language. Content outside the template syntax lives in
+ specific AST nodes and can be highlighted by injecting the host
+ grammar into those nodes.
+
+ Wrapper grammars are configured in the wrappers file
+ (misc/syntax-ts/wrappers). Each line defines a wrapper:
+
+ wrapper_grammar content_node host1 host2 ...
+
+ wrapper_grammar The grammar name of the template language.
+ content_node The AST node type that holds host content.
+ host1 host2 ... Grammar names that this wrapper can wrap.
+
+ Example:
+
+ gotmpl text yaml json toml html xml markdown css
+
+ This enables two features:
+
+ 1. ERROR fallback: when a host grammar produces a catastrophic parse
+ failure (ERROR root node), each wrapper that lists that host is
+ tried as an alternative. If the wrapper parses successfully, the
+ host grammar is injected into the wrapper's content nodes.
+
+ Example: a .yaml file containing Go template syntax ({{ }}) fails
+ to parse as YAML. The system finds that gotmpl can wrap yaml,
+ parses the file with gotmpl, and injects YAML highlighting into
+ the text nodes. The result: Go template syntax is highlighted as
+ gotmpl, and the YAML portions between templates get proper YAML
+ highlighting.
+
+ 2. Compound extensions: for files like README.md.gotmpl, the inner
+ extension (.md) identifies the host grammar, which is injected
+ into the wrapper's content nodes automatically.
+
+ Example: README.md.gotmpl is matched as gotmpl by the .gotmpl
+ extension. The system detects .md as a compound extension,
+ resolves it to the markdown grammar, and injects Markdown
+ highlighting into the gotmpl text nodes.
+
+ To add a new wrapper grammar, add a line to the wrappers file. No
+ code changes are required. The wrapper grammar must have an AST node
+ type that contains host language content (e.g. "text" for gotmpl).
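
The wrappers line format described above can be illustrated with a
small parser. This is a minimal sketch, not the code in syntax_ts.c:
the type wrapper_def_t, the functions parse_wrapper_line() and
wrapper_supports_host(), and the fixed buffer sizes are all
hypothetical names and limits chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOSTS 16
#define NAME_LEN 32

/* Hypothetical in-memory form of one line of the wrappers file. */
typedef struct
{
    char wrapper[NAME_LEN];         /* grammar name of the template language */
    char content_node[NAME_LEN];    /* AST node type holding host content */
    char hosts[MAX_HOSTS][NAME_LEN];
    int n_hosts;
} wrapper_def_t;

/* Parse "wrapper_grammar content_node host1 host2 ...".
   Returns 1 on success, 0 if the line has fewer than three fields. */
static int
parse_wrapper_line (const char *line, wrapper_def_t *out)
{
    char buf[256];
    char *tok;
    int field = 0;

    assert (line != NULL && out != NULL);
    memset (out, 0, sizeof (*out));
    snprintf (buf, sizeof (buf), "%s", line);   /* strtok() modifies its input */

    for (tok = strtok (buf, " \t\n"); tok != NULL; tok = strtok (NULL, " \t\n"))
    {
        char *dst;

        if (field == 0)
            dst = out->wrapper;
        else if (field == 1)
            dst = out->content_node;
        else if (out->n_hosts < MAX_HOSTS)
            dst = out->hosts[out->n_hosts++];
        else
            break;                  /* ignore hosts beyond the sketch's limit */

        snprintf (dst, NAME_LEN, "%s", tok);
        field++;
    }

    return out->n_hosts > 0;
}

/* Does this wrapper list `host` as a grammar it can wrap? */
static int
wrapper_supports_host (const wrapper_def_t *w, const char *host)
{
    int i;

    for (i = 0; i < w->n_hosts; i++)
        if (strcmp (w->hosts[i], host) == 0)
            return 1;
    return 0;
}
```

With the example line from this section, the parsed record has
wrapper "gotmpl", content node "text" and seven hosts, so a lookup
for "yaml" succeeds while a lookup for "python" does not. That lookup
is the question the ERROR fallback has to answer for a failed host.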
+
+
+ Parse error highlighting
+ ------------------------
+
+ When a tree-sitter grammar produces ERROR nodes (parse failures),
+ the affected regions are highlighted in red. This provides a visual
+ indication that the parser could not understand parts of the file.
+
+ If the tree root does not cover the entire file (the parser gave up
+ early), the uncovered portion is also highlighted in red.
+
+ Valid captures within ERROR regions take precedence over the red
+ error coloring via the "narrower wins" rule: specific node captures
+ are always narrower than the broad ERROR region.
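
The "narrower wins" rule can be sketched as a linear scan that keeps
the narrowest range covering a byte offset. This only illustrates the
rule; the actual highlight cache layout is not shown in this document,
so hl_range_t, color_at() and the integer color ids are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical highlight entry: half-open byte range [start, end). */
typedef struct
{
    size_t start;
    size_t end;
    int color;
} hl_range_t;

/* Return the color of the narrowest range that covers `pos`, or
   `fallback` if no range covers it. On equal widths the earlier
   entry is kept. */
static int
color_at (const hl_range_t *ranges, size_t n, size_t pos, int fallback)
{
    size_t i;
    size_t best_width = 0;
    int found = 0;
    int color = fallback;

    assert (ranges != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        size_t width;

        if (pos < ranges[i].start || pos >= ranges[i].end)
            continue;               /* range does not cover pos */

        width = ranges[i].end - ranges[i].start;
        if (!found || width < best_width)
        {
            found = 1;
            best_width = width;
            color = ranges[i].color;
        }
    }

    return color;
}
```

Given a broad ERROR range 0..100 colored red and a keyword capture
10..15 inside it, offset 12 gets the keyword color and offset 50
stays red.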
+
+
+ Syntax highlighting modes
+ -------------------------
+
+ When compiled with tree-sitter support, the editor supports three
+ highlighting modes:
+
+ - Tree-sitter (TS): AST-based highlighting using tree-sitter grammars.
+ - Legacy: Regex-based highlighting using .syntax files.
+ - None: Syntax highlighting disabled.
+
+ The active mode is shown in the status bar as S:[TS], S:[Legacy], or
+ S:[None].
+
+ Ctrl+S cycles forward through modes: TS -> Legacy -> None -> TS.
+ If tree-sitter initialization fails for a file (no grammar available),
+ the mode automatically falls to Legacy and TS is excluded from
+ cycling for that session.
+
+ Ctrl+T toggles directly between TS and Legacy (skips None). This is
+ useful for quickly comparing tree-sitter and legacy highlighting.
+ The same toggle is available from the Command menu as "Toggle TS/legacy
+ syntax".
+
+ Manual syntax selection via Options -> Syntax highlighting works with
+ tree-sitter. When the user selects a syntax type (e.g. "YAML"), the
+ display name is reverse-looked up to a grammar name and tree-sitter
+ is tried with that grammar. If tree-sitter fails, the legacy system
+ handles the selected type.
+
+ The --no-tree-sitter command-line flag permanently disables tree-sitter
+ for the entire session. When this flag is set, Ctrl+S cycles between
+ Legacy and None only.
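
The Ctrl+S cycle can be modelled as a small state function. This is
a sketch of the behaviour described above, not the editor's code:
hl_mode_t, next_mode() and mode_label() are hypothetical names, and
the single ts_available flag stands in for both "no grammar
available" and --no-tree-sitter.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HL_TS, HL_LEGACY, HL_NONE } hl_mode_t;

/* Ctrl+S: TS -> Legacy -> None -> TS. When tree-sitter is not
   available, TS is skipped and the cycle is Legacy <-> None. */
static hl_mode_t
next_mode (hl_mode_t cur, bool ts_available)
{
    switch (cur)
    {
    case HL_TS:
        return HL_LEGACY;
    case HL_LEGACY:
        return HL_NONE;
    case HL_NONE:
        return ts_available ? HL_TS : HL_LEGACY;
    }

    assert (!"unknown highlighting mode");
    return HL_LEGACY;
}

/* Status bar indicator for a mode. */
static const char *
mode_label (hl_mode_t mode)
{
    switch (mode)
    {
    case HL_TS:
        return "S:[TS]";
    case HL_LEGACY:
        return "S:[Legacy]";
    default:
        return "S:[None]";
    }
}
```

Starting from Legacy with tree-sitter unavailable, repeated calls to
next_mode() alternate between None and Legacy and never return TS.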
+
 
 File layout
 -----------
 
 Source files (src/editor/):
 
- syntax.c Main file. All tree-sitter code is inside
- #ifdef HAVE_TREE_SITTER blocks. Contains:
+ syntax_ts.c Tree-sitter highlighting implementation. Contains:
  - ts_input_read() -- TSInput callback
+ - ts_load_color_config() -- loads colors.ini
  - ts_capture_name_to_color() -- color mapping
+ - ts_config_lookup_by_value() -- config file lookup
+ - ts_config_lookup_by_grammar() -- reverse lookup
+ - ts_config_reverse_lookup() -- display name to
+ grammar name lookup
  - ts_find_grammar() -- matches filename via config
  files (filenames > shebangs > extensions)
+ - ts_find_wrapper_for_host() -- finds a wrapper
+ grammar for a failed host grammar
+ - ts_find_wrapper_content_node() -- gets the content
+ node name for a wrapper grammar
+ - ts_setup_wrapper_injection() -- builds injection
+ query for wrapper grammars programmatically
  - ts_load_query_file() -- loads .scm file
  - ts_init_injections() -- sets up injection parsers
  from injections.scm query files
  - ts_get_dynamic_lang() -- lazy-loads dynamic grammar
- - ts_run_dynamic_injection() -- code block injection
- - ts_init_for_file() -- initialization
+ - ts_inject_and_highlight() -- parses and highlights
+ an injected language, with recursive injection
+ support (up to TS_MAX_INJECTION_DEPTH levels)
+ - ts_init_for_file() -- initialization (accepts
+ optional forced_grammar for manual selection)
  - ts_free() -- cleanup (primary + injections)
- - ts_collect_injection_ranges() -- collects byte ranges
- - ts_append_node_range() -- range helper
- - ts_run_query_into_highlights() -- runs query on tree
- - ts_rebuild_highlight_cache() -- query cursor
+ - ts_collect_error_highlights() -- collects ERROR
+ nodes for red highlighting
+ - ts_run_query_into_highlights() -- runs query on
+ tree
+ - ts_rebuild_highlight_cache() -- query cursor,
+ injection processing, error highlighting
  - ts_get_color_at() -- linear scan
- - edit_syntax_ts_notify_edit() -- incremental
  Conditional include: ts-grammar-registry.h (static
  mode) or ts-grammar-loader.h (shared mode).
 
+ syntax_ts.h Public API for tree-sitter integration:
+ ts_init_for_file(), ts_free(), ts_get_color_at(),
+ ts_rebuild_highlight_cache(),
+ ts_config_reverse_lookup().
+
+ syntax.c Main syntax file. Tree-sitter integration points
+ are inside #ifdef HAVE_TREE_SITTER blocks:
+ - edit_load_syntax() calls ts_init_for_file() with
+ optional forced_grammar for manual selection
+ - edit_free_syntax_rules() calls ts_free()
+ - edit_syntax_ts_notify_edit() -- incremental edit
+ notification for tree re-parsing
+
  ts-grammar-loader.h
  Shared mode grammar loader. Provides
  ts_grammar_registry_lookup() using g_module_open()
  to load grammar .so modules on demand. Caches
  loaded modules. Handles naming overrides (e.g.
  cobol -> tree_sitter_COBOL).
 
+ editdraw.c Status bar rendering includes the syntax
+ highlighting mode indicator (S:[TS], S:[Legacy],
+ S:[None]) in both simple and normal status bar
+ formats.
+
  editwidget.h WEdit struct extended with tree-sitter fields:
  Primary: ts_parser, ts_tree, ts_highlight_query
  (void*), ts_highlights (GArray*),
- ts_highlights_start/end, ts_active, ts_need_reparse.
- Injections: ts_injections (GArray* of
- ts_injection_t, supports multiple and dynamic).
+ ts_highlights_start/end, ts_grammar_name, ts_active,
+ ts_need_reparse.
+ Injections: ts_injection_query (TSQuery*),
+ ts_injection_lang_cache (GHashTable*).
 
  edit-impl.h Declaration of edit_syntax_ts_notify_edit().
 
@@ -319,11 +460,20 @@ Runtime data files (misc/syntax-ts/):
  display-names Maps grammar names to human-readable display names.
  symbols Overrides for non-standard tree_sitter_*() function
  names (e.g. cobol -> tree_sitter_COBOL).
- colors Default color mappings for tree-sitter capture names.
+ wrappers Wrapper grammar definitions for template languages.
+ Maps wrapper grammars to their content nodes and
+ supported host languages (see "Wrapper grammars").
+ colors.ini Per-grammar color mappings (INI format, sections
+ named [grammar_name]).
+ queries-override/<name>-highlights.scm
+ MC-specific highlight query overrides (take
+ precedence over upstream queries).
  queries/<name>-highlights.scm
- Highlight query files, one per grammar.
+ Upstream highlight query files, one per grammar.
+ queries-override/<name>-injections.scm
+ MC-specific injection query overrides.
  queries/<name>-injections.scm
- Injection query files for language injection.
+ Upstream injection query files.
 
 Shared grammar modules (shared mode only):
 
@@ -414,16 +564,31 @@ symbols -- overrides for non-standard function names:
 
  Example: cobol COBOL (function: tree_sitter_COBOL)
 
- colors -- default color mappings for capture names:
+ wrappers -- wrapper grammar definitions:
 
- <capture_name> <foreground>;<background>
+ <wrapper_grammar> <content_node> <host1> <host2> ...
+
+ Defines a wrapper grammar (template language) that can wrap host
+ languages. See the "Wrapper grammars" section for details.
+
+ Example: gotmpl text yaml json toml html xml markdown css
+
+ colors.ini -- color mappings for capture names (INI format):
+
+ [default]
+ keyword = yellow;
+ string = green;
+
+ [python]
+ variable.builtin = brightred;
+
+ A [default] section provides global defaults. Per-grammar sections
+ override specific captures. The format is:
+ <capture_name> = <foreground>;<background>
 
  Background can be omitted (inherits default). A foreground of "-"
  means use the default foreground color.
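
The value part of a colors.ini entry can be split as shown below.
This is a minimal sketch of the <foreground>;<background> rules
stated above, not the logic of ts_load_color_config(): the type
color_pair_t and the function parse_color_value() are hypothetical,
and an empty string is used here to mean "use the default".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; an empty string means "use the default". */
typedef struct
{
    char fg[32];
    char bg[32];
} color_pair_t;

/* Split "<foreground>;<background>". The background may be omitted,
   and a foreground of "-" selects the default foreground. */
static void
parse_color_value (const char *value, color_pair_t *out)
{
    const char *semi;
    size_t fg_len;

    assert (value != NULL && out != NULL);
    out->fg[0] = '\0';
    out->bg[0] = '\0';

    /* Skip whitespace left over after the '=' of the INI entry. */
    while (*value == ' ' || *value == '\t')
        value++;

    semi = strchr (value, ';');
    fg_len = semi != NULL ? (size_t) (semi - value) : strlen (value);
    if (fg_len >= sizeof (out->fg))
        fg_len = sizeof (out->fg) - 1;
    memcpy (out->fg, value, fg_len);
    out->fg[fg_len] = '\0';

    if (strcmp (out->fg, "-") == 0)
        out->fg[0] = '\0';          /* "-" means default foreground */

    if (semi != NULL)
        snprintf (out->bg, sizeof (out->bg), "%s", semi + 1);
}
```

For "yellow;" this yields foreground "yellow" with an inherited
background; for "-;blue" it yields the default foreground on a blue
background. Trailing whitespace inside the value is not trimmed in
this sketch.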
 
- Example: keyword yellow;
- Example: string green;
-
 Lookup precedence when opening a file:
 1. filenames -- exact basename match (highest priority)
 2. shebangs -- interpreter from first line
@@ -437,12 +602,21 @@ share directory.
 Highlight query files (queries/*.scm)
 --------------------------------------
 
- Each grammar has a corresponding highlight query file in
- misc/syntax-ts/queries/ named <grammar_name>-highlights.scm (e.g.
- c-highlights.scm, python-highlights.scm, bash-highlights.scm).
+ Each grammar has a corresponding highlight query file named
+ <grammar_name>-highlights.scm. Query files are searched in this order:
+
+ 1. User override: ~/.local/share/mc/syntax-ts/queries-override/
+ 2. System override: $(datadir)/mc/syntax-ts/queries-override/
+ 3. User upstream: ~/.local/share/mc/syntax-ts/queries/
+ 4. System upstream: $(datadir)/mc/syntax-ts/queries/
+
+ The first file found is used. MC-specific overrides in queries-override/
+ take precedence over upstream queries in queries/. This allows tuning
+ capture names and patterns for MC's color scheme without modifying
+ upstream query files.
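
The four-step search order can be sketched as building the candidate
paths in order and taking the first one that opens. This illustrates
the order only; it is not the body of ts_load_query_file(). The names
query_candidates() and first_readable() are hypothetical, and the
caller is assumed to pass the already expanded user directory (for
~/.local/share/mc) and system directory (for $(datadir)/mc).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define N_QUERY_DIRS 4
#define QUERY_PATH_LEN 512

/* Fill `paths` with the candidates for <grammar>-<kind>.scm in
   lookup order: user override, system override, user upstream,
   system upstream. Returns 1 if every path fit, 0 on truncation. */
static int
query_candidates (const char *user_share, const char *sys_share,
                  const char *grammar, const char *kind,
                  char paths[N_QUERY_DIRS][QUERY_PATH_LEN])
{
    const char *roots[N_QUERY_DIRS] = { user_share, sys_share, user_share, sys_share };
    const char *subdirs[N_QUERY_DIRS] =
        { "queries-override", "queries-override", "queries", "queries" };
    int i;
    int ok = 1;

    assert (user_share != NULL && sys_share != NULL);
    assert (grammar != NULL && kind != NULL);

    for (i = 0; i < N_QUERY_DIRS; i++)
    {
        int n = snprintf (paths[i], QUERY_PATH_LEN, "%s/syntax-ts/%s/%s-%s.scm",
                          roots[i], subdirs[i], grammar, kind);

        if (n < 0 || n >= QUERY_PATH_LEN)
            ok = 0;
    }

    return ok;
}

/* Return the first candidate that can be opened for reading, or NULL. */
static const char *
first_readable (char paths[N_QUERY_DIRS][QUERY_PATH_LEN])
{
    int i;

    for (i = 0; i < N_QUERY_DIRS; i++)
    {
        FILE *f = fopen (paths[i], "r");

        if (f != NULL)
        {
            fclose (f);
            return paths[i];
        }
    }

    return NULL;
}
```

Passing "injections" instead of "highlights" as the kind gives the
same order for injection query files.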
+
 Grammars that support language injection also have an injection query
- file named <grammar_name>-injections.scm (e.g. html-injections.scm,
- markdown-injections.scm).
+ file named <grammar_name>-injections.scm (same search order).
 
 Query files use the tree-sitter query syntax (S-expressions) to match
 AST node patterns and assign capture names. For example:
@@ -472,9 +646,11 @@ Query files support hierarchical capture names (e.g. @keyword.control,
 Color mapping
 -------------
 
- The colors config file (misc/syntax-ts/colors) maps capture names to
- MC foreground colors. The colors are chosen to match MC's default
- syntax highlighting appearance (blue background skin):
+ The colors config file (misc/syntax-ts/colors.ini) maps capture names
+ to MC foreground colors using INI format. A [default] section provides
+ global defaults; per-grammar sections (e.g. [python], [bash]) override
+ specific captures for that grammar. Colors are chosen to match MC's
+ default syntax highlighting appearance (blue background skin):
 
  Capture Foreground Color Purpose
  ------- ---------------- -------