diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index c3786707fa..c7e16ff2e3 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -6,6 +6,7 @@
 
 - [Lexical structure](lexical-structure.md)
     - [Input format](input-format.md)
+    - [Frontmatter](frontmatter.md)
     - [Keywords](keywords.md)
     - [Identifiers](identifiers.md)
     - [Comments](comments.md)
diff --git a/src/frontmatter.md b/src/frontmatter.md
new file mode 100644
index 0000000000..90a72cc6b4
--- /dev/null
+++ b/src/frontmatter.md
@@ -0,0 +1,53 @@
+r[frontmatter]
+# Frontmatter
+
+r[frontmatter.syntax]
+```grammar,lexer
+@root FRONTMATTER ->
+    FRONTMATTER_FENCE HORIZONTAL_WHITESPACE* INFOSTRING? HORIZONTAL_WHITESPACE* LF
+    (FRONTMATTER_LINE LF )*
+    FRONTMATTER_FENCE[^matched-fence] HORIZONTAL_WHITESPACE* LF
+
+FRONTMATTER_FENCE -> `---` `-`{..=255}
+
+INFOSTRING -> (XID_Start | `_`) ( XID_Continue | `-` | `.` )*
+
+FRONTMATTER_LINE -> (~INVALID_FRONTMATTER_LINE_START (~INVALID_FRONTMATTER_LINE_CONTINUE)*)?
+
+INVALID_FRONTMATTER_LINE_START -> (FRONTMATTER_FENCE[^escaped-fence] | LF)
+
+INVALID_FRONTMATTER_LINE_CONTINUE -> LF
+```
+
+[^matched-fence]: The closing fence must have the same number of `-` as the opening fence
+[^escaped-fence]: A `FRONTMATTER_FENCE` at the beginning of a `FRONTMATTER_LINE` is only invalid if it has the same or more `-` as the `FRONTMATTER_FENCE`
+
+r[frontmatter.intro]
+Frontmatter is an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar.
+
+```rust
+#!/usr/bin/env cargo
+---
+[dependencies]
+fastrand = "2"
+---
+
+fn main() {
+    let num = fastrand::i32(..);
+    println!("{num}");
+}
+```
+
+r[frontmatter.document]
+Frontmatter may only be preceded by a [shebang] and whitespace.
+
+r[frontmatter.fence]
+The delimiters are referred to as a *fence*. The opening and closing fences must be at the start of a line. They must be a matching pair of three or more hyphens (`-`). A fence may be followed by horizontal whitespace.
+
+r[frontmatter.infostring]
+Following the opening fence may be an infostring for identifying the intention of the contained content. An infostring may be followed by horizontal whitespace.
+
+r[frontmatter.body]
+The body of the frontmatter may contain any content except for a line starting with as many or more hyphens (`-`) than in the fences.
+
+[shebang]: input-format.md#shebang-removal
diff --git a/src/input-format.md b/src/input-format.md
index cf35b2959d..9d70088685 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -59,6 +59,11 @@ This prevents an [inner attribute] at the start of a source file being removed.
 > [!NOTE]
 > The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.
 
+r[input.frontmatter]
+## Frontmatter removal
+
+After some whitespace, [frontmatter] may next appear in the input.
+
 r[input.tokenization]
 ## Tokenization
 
@@ -69,4 +74,5 @@ The resulting sequence of characters is then converted into tokens as described
 [comments]: comments.md
 [Crates and source files]: crates-and-source-files.md
 [_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
+[frontmatter]: frontmatter.md
 [whitespace]: whitespace.md
diff --git a/src/items/modules.md b/src/items/modules.md
index 46fb9957fa..76e1408684 100644
--- a/src/items/modules.md
+++ b/src/items/modules.md
@@ -152,7 +152,7 @@ r[items.mod.attributes]
 r[items.mod.attributes.intro]
 Modules, like all items, accept outer attributes. They also accept inner
 attributes: either after `{` for a module with a body, or at the beginning of the
-source file, after the optional BOM and shebang.
+source file, after the optional BOM, shebang, and frontmatter.
 
 r[items.mod.attributes.supported]
 The built-in attributes that have meaning on a module are [`cfg`],
diff --git a/src/whitespace.md b/src/whitespace.md
index b398d0c958..b274d611d5 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -4,23 +4,32 @@ r[lex.whitespace]
 r[whitespace.syntax]
 ```grammar,lexer
 @root WHITESPACE ->
-      U+0009 // Horizontal tab, `'\t'`
-    | U+000A // Line feed, `'\n'`
-    | U+000B // Vertical tab
-    | U+000C // Form feed
-    | U+000D // Carriage return, `'\r'`
-    | U+0020 // Space, `' '`
-    | U+0085 // Next line
-    | U+200E // Left-to-right mark
-    | U+200F // Right-to-left mark
-    | U+2028 // Line separator
-    | U+2029 // Paragraph separator
-
-TAB -> U+0009 // Horizontal tab, `'\t'`
-
-LF -> U+000A // Line feed, `'\n'`
-
-CR -> U+000D // Carriage return, `'\r'`
+      END_OF_LINE
+    | IGNORABLE_CODE_POINT
+    | HORIZONTAL_WHITESPACE
+
+END_OF_LINE ->
+      U+000A // line feed, `'\n'`
+    | U+000B // vertical tabulation
+    | U+000C // form feed
+    | U+000D // carriage return, `'\r'`
+    | U+0085 // next line
+    | U+2028 // LINE SEPARATOR
+    | U+2029 // PARAGRAPH SEPARATOR
+
+IGNORABLE_CODE_POINT ->
+      U+200E // LEFT-TO-RIGHT MARK
+    | U+200F // RIGHT-TO-LEFT MARK
+
+HORIZONTAL_WHITESPACE ->
+      U+0009 // horizontal tab, `'\t'`
+    | U+0020 // space, `' '`
+
+TAB -> U+0009 // horizontal tab, `'\t'`
+
+LF -> U+000A // line feed, `'\n'`
+
+CR -> U+000D // carriage return, `'\r'`
 ```
 
 r[lex.whitespace.intro]
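
Review note: the fence rules added in `src/frontmatter.md` (`frontmatter.fence`, `frontmatter.body`, and the two grammar footnotes) can be checked against a small sketch. The code below is only an illustration under stated assumptions, not how rustc or Cargo implement it: the names `Frontmatter`, `leading_hyphens`, `is_horizontal_ws`, and `split_frontmatter` are hypothetical, the input is assumed to start directly at the opening fence (BOM, shebang, and leading whitespace already removed, line endings already normalized to LF), and the infostring is returned as-is without validating it against `INFOSTRING`.

```rust
/// Hypothetical result type for this sketch; not part of any real API.
#[derive(Debug, PartialEq)]
struct Frontmatter<'a> {
    /// Text after the opening fence, trimmed of horizontal whitespace (not validated).
    infostring: &'a str,
    /// Everything between the fence lines, including each line's trailing LF.
    body: &'a str,
    /// The source text that follows the closing fence line.
    rest: &'a str,
}

/// Count the `-` characters at the start of a line.
fn leading_hyphens(line: &str) -> usize {
    line.bytes().take_while(|&b| b == b'-').count()
}

/// HORIZONTAL_WHITESPACE from the grammar: tab or space.
fn is_horizontal_ws(c: char) -> bool {
    c == '\t' || c == ' '
}

/// Split `src` into frontmatter and the remaining source.
/// Assumes `src` begins at the opening fence; returns `None` otherwise.
fn split_frontmatter(src: &str) -> Option<Frontmatter<'_>> {
    let (open_line, mut remaining) = src.split_once('\n')?;
    let fence_len = leading_hyphens(open_line);
    // FRONTMATTER_FENCE is `---` followed by at most 255 more hyphens.
    if !(3..=258).contains(&fence_len) {
        return None;
    }
    let infostring = open_line[fence_len..].trim_matches(is_horizontal_ws);
    let body_start = src.len() - remaining.len();

    loop {
        // Every body line and the closing fence must end with LF.
        let (line, after) = remaining.split_once('\n')?;
        let hyphens = leading_hyphens(line);
        if hyphens >= fence_len {
            // A line starting with as many or more hyphens is never a body line.
            // It closes the frontmatter only if the count matches exactly and
            // nothing but horizontal whitespace follows.
            let tail = &line[hyphens..];
            if hyphens == fence_len && tail.chars().all(is_horizontal_ws) {
                let body_end = src.len() - remaining.len();
                return Some(Frontmatter {
                    infostring,
                    body: &src[body_start..body_end],
                    rest: after,
                });
            }
            return None;
        }
        remaining = after;
    }
}
```

Under these assumptions, a `----` fence lets the body contain a literal `---` line, which is the escaping behavior the `escaped-fence` footnote describes, while a body line with more hyphens than the fence is rejected rather than treated as content.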