From 6b7b83e269d82034c7f7706d624e6decb90abb4b Mon Sep 17 00:00:00 2001
From: Ed Page
Date: Thu, 9 Oct 2025 14:01:04 -0500
Subject: [PATCH 1/3] Revert "Reformat whitespace list for consistency"

This reverts commit 60eb145d7e826d3e8dc8a7051fd5b5e5913070b7.

This re-formats our Whitespace to be centered on Unicode's definition.
This makes it easy to compare with the standard and helps with
Frontmatter. Unlike regular Rust, Frontmatter cares about the type of
Whitespace. Even if we want to duplicate the definition, having them
formatted similarly makes them easy to compare.
---
 src/whitespace.md | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/src/whitespace.md b/src/whitespace.md
index b398d0c958..761a1428e1 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -4,23 +4,26 @@ r[lex.whitespace]
 r[whitespace.syntax]
 ```grammar,lexer
 @root WHITESPACE ->
-      U+0009 // Horizontal tab, `'\t'`
-    | U+000A // Line feed, `'\n'`
-    | U+000B // Vertical tab
-    | U+000C // Form feed
-    | U+000D // Carriage return, `'\r'`
-    | U+0020 // Space, `' '`
-    | U+0085 // Next line
-    | U+200E // Left-to-right mark
-    | U+200F // Right-to-left mark
-    | U+2028 // Line separator
-    | U+2029 // Paragraph separator
-
-TAB -> U+0009 // Horizontal tab, `'\t'`
-
-LF -> U+000A // Line feed, `'\n'`
-
-CR -> U+000D // Carriage return, `'\r'`
+    // end of line
+      LF
+    | U+000B // vertical tabulation
+    | U+000C // form feed
+    | CR
+    | U+0085 // Unicode next line
+    | U+2028 // Unicode LINE SEPARATOR
+    | U+2029 // Unicode PARAGRAPH SEPARATOR
+    // Ignorable Code Point
+    | U+200E // Unicode LEFT-TO-RIGHT MARK
+    | U+200F // Unicode RIGHT-TO-LEFT MARK
+    // horizontal whitespace
+    | TAB
+    | U+0020 // space ' '
+
+TAB -> U+0009 // horizontal tab ('\t')
+
+LF -> U+000A // line feed ('\n')
+
+CR -> U+000D // carriage return ('\r')
 ```
 
 r[lex.whitespace.intro]

From c564c4c9c812a3b1afe85dadffc486ad468fadc3 Mon Sep 17 00:00:00 2001
From: Ed Page
Date: Thu, 9 Oct 2025 14:04:39 -0500
Subject: [PATCH 2/3] Split up WHITESPACE

I'm splitting out `HORIZONTAL_WHITESPACE` to make it easier to connect
the concept in frontmatter with `WHITESPACE`.

In doing this, I found it awkward to only pull out part when the
comments are there. I either needed a redundant comment to start a
section, re-arrange out of order from Unicode, or split out all classes.
I did the latter.
---
 src/whitespace.md | 42 ++++++++++++++++++++++++------------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/src/whitespace.md b/src/whitespace.md
index 761a1428e1..b274d611d5 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -4,26 +4,32 @@ r[lex.whitespace]
 r[whitespace.syntax]
 ```grammar,lexer
 @root WHITESPACE ->
-    // end of line
-      LF
+      END_OF_LINE
+    | IGNORABLE_CODE_POINT
+    | HORIZONTAL_WHITESPACE
+
+END_OF_LINE ->
+      U+000A // line feed, `'\n'`
     | U+000B // vertical tabulation
     | U+000C // form feed
-    | CR
-    | U+0085 // Unicode next line
-    | U+2028 // Unicode LINE SEPARATOR
-    | U+2029 // Unicode PARAGRAPH SEPARATOR
-    // Ignorable Code Point
-    | U+200E // Unicode LEFT-TO-RIGHT MARK
-    | U+200F // Unicode RIGHT-TO-LEFT MARK
-    // horizontal whitespace
-    | TAB
-    | U+0020 // space ' '
-
-TAB -> U+0009 // horizontal tab ('\t')
-
-LF -> U+000A // line feed ('\n')
-
-CR -> U+000D // carriage return ('\r')
+    | U+000D // carriage return, `'\r'`
+    | U+0085 // next line
+    | U+2028 // LINE SEPARATOR
+    | U+2029 // PARAGRAPH SEPARATOR
+
+IGNORABLE_CODE_POINT ->
+      U+200E // LEFT-TO-RIGHT MARK
+    | U+200F // RIGHT-TO-LEFT MARK
+
+HORIZONTAL_WHITESPACE ->
+      U+0009 // horizontal tab, `'\t'`
+    | U+0020 // space, `' '`
+
+TAB -> U+0009 // horizontal tab, `'\t'`
+
+LF -> U+000A // line feed, `'\n'`
+
+CR -> U+000D // carriage return, `'\r'`
 ```
 
 r[lex.whitespace.intro]

From b849612c7ef0da67419913761fe193027ec97b6b Mon Sep 17 00:00:00 2001
From: Ed Page
Date: Thu, 21 Aug 2025 16:23:45 -0500
Subject: [PATCH 3/3] docs(ref): Specify frontmatter

---
 src/SUMMARY.md       |  1 +
 src/frontmatter.md   | 53 ++++++++++++++++++++++++++++++++++++++++++++
 src/input-format.md  |  6 +++++
 src/items/modules.md |  2 +-
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 src/frontmatter.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index c3786707fa..c7e16ff2e3 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -6,6 +6,7 @@
 
 - [Lexical structure](lexical-structure.md)
     - [Input format](input-format.md)
+    - [Frontmatter](frontmatter.md)
     - [Keywords](keywords.md)
     - [Identifiers](identifiers.md)
     - [Comments](comments.md)
diff --git a/src/frontmatter.md b/src/frontmatter.md
new file mode 100644
index 0000000000..90a72cc6b4
--- /dev/null
+++ b/src/frontmatter.md
@@ -0,0 +1,53 @@
+r[frontmatter]
+# Frontmatter
+
+r[frontmatter.syntax]
+```grammar,lexer
+@root FRONTMATTER ->
+      FRONTMATTER_FENCE HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE* LF
+      (FRONTMATTER_LINE LF )*
+      FRONTMATTER_FENCE[^matched-fence] HORIZONTAL_WHITESPACE* LF
+
+FRONTMATTER_FENCE -> `---` `-`{..=255}
+
+INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
+
+FRONTMATTER_LINE -> (~INVALID_FRONTMATTER_LINE_START (~INVALID_FRONTMATTER_LINE_CONTINUE)*)?
+
+INVALID_FRONTMATTER_LINE_START -> (FRONTMATTER_FENCE[^escaped-fence] | LF)
+
+INVALID_FRONTMATTER_LINE_CONTINUE -> LF
+```
+
+[^matched-fence]: The closing fence must have the same number of `-` as the opening fence
+[^escaped-fence]: A `FRONTMATTER_FENCE` at the beginning of a `FRONTMATTER_LINE` is only invalid if it has the same or more `-` as the `FRONTMATTER_FENCE`
+
+r[frontmatter.intro]
+Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.
+
+```rust
+#!/usr/bin/env cargo
+---
+[dependencies]
+fastrand = "2"
+---
+
+fn main() {
+    let num = fastrand::i32(..);
+    println!("{num}");
+}
+```
+
+r[frontmatter.document]
+Frontmatter may only be preceded by a [shebang] and whitespace.
+
+r[frontmatter.fence]
+The delimiters are referred to as a *fence*. The opening and closing fences must be at the start of a line. They must be a matching pair of three or more hyphens (`-`). A fence may be followed by horizontal whitespace.
+
+r[frontmatter.infostring]
+Following the opening fence may be an infostring for identifying the intention of the contained content. An infostring may be followed by horizontal whitespace.
+
+r[frontmatter.body]
+The body of the frontmatter may contain any content except for a line starting with as many or more hyphens (`-`) than in the fences.
+
+[shebang]: input-format.md#shebang-removal
diff --git a/src/input-format.md b/src/input-format.md
index cf35b2959d..9d70088685 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -59,6 +59,11 @@ This prevents an [inner attribute] at the start of a source file being removed.
 > [!NOTE]
 > The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
 
+r[input.frontmatter]
+## Frontmatter removal
+
+After some whitespace, [frontmatter] may next appear in the input.
+
 r[input.tokenization]
 ## Tokenization
 
@@ -69,4 +74,5 @@ The resulting sequence of characters is then converted into tokens as described
 [comments]: comments.md
 [Crates and source files]: crates-and-source-files.md
 [_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
+[frontmatter]: frontmatter.md
 [whitespace]: whitespace.md
diff --git a/src/items/modules.md b/src/items/modules.md
index 46fb9957fa..76e1408684 100644
--- a/src/items/modules.md
+++ b/src/items/modules.md
@@ -152,7 +152,7 @@ r[items.mod.attributes]
 r[items.mod.attributes.intro]
 Modules, like all items, accept outer attributes. They also accept inner
 attributes: either after `{` for a module with a body, or at the beginning of the
-source file, after the optional BOM and shebang.
+source file, after the optional BOM, shebang, and frontmatter.
 
 r[items.mod.attributes.supported]
 The built-in attributes that have meaning on a module are [`cfg`],