40 commits
821d9b2
tests: add whitespace tests for vertical tab behavior
Brace1000 Apr 9, 2026
1609baf
tests: add ignore-tidy-tab directive to whitespace tests
Brace1000 Apr 9, 2026
7c00dc2
tests: expand vertical tab lexer test to cover all Pattern_White_Spac…
Brace1000 Apr 11, 2026
1546b1d
tests: add whitespace/ README entry explaining lexer vs stdlib mismatch
Brace1000 Apr 11, 2026
5b487fc
Update ascii_whitespace_excludes_vertical_tab.rs
Brace1000 Apr 9, 2026
91fd464
Update ascii_whitespace_excludes_vertical_tab.rs
Brace1000 Apr 9, 2026
874a618
Update ascii_whitespace_excludes_vertical_tab.rs
Brace1000 Apr 9, 2026
b727680
fix tidy: add whitespace README entry
Brace1000 Apr 9, 2026
52f1d1e
Update README.md with missing full stop
Brace1000 Apr 9, 2026
c5aea24
Update ascii_whitespace_excludes_vertical_tab.rs
Brace1000 Apr 9, 2026
f219e91
fix tidy: use full path format for whitespace README entry
Brace1000 Apr 11, 2026
87fcb28
fix tidy: README order, trailing newlines in whitespace tests
Brace1000 Apr 11, 2026
47fb045
fix: add run-pass directive and restore embedded whitespace bytes
Brace1000 Apr 11, 2026
f06914b
fix tidy: remove duplicate whitespace README entry
Brace1000 Apr 11, 2026
7027a64
Brace1000 Apr 14, 2026
4d8a428
git add tests/ui/whitespace/invalid_whitespace.rs
Brace1000 Apr 14, 2026
93b13d3
Fix tidy: add trailing newline
Brace1000 Apr 14, 2026
7e47ea6
Update invalid_whitespace.rs
Brace1000 Apr 14, 2026
00a37bb
Update invalid_whitespace.rs
Brace1000 Apr 14, 2026
a2e128a
Clean up whitespace in invalid_whitespace.rs
Brace1000 Apr 14, 2026
233f744
Update invalid_whitespace.rs
Brace1000 Apr 14, 2026
b32995b
Clarify ZERO WIDTH SPACE usage in test
Brace1000 Apr 14, 2026
ead2b71
Improve error messages for invalid whitespace
Brace1000 Apr 14, 2026
d0bc9e4
Modify invalid_whitespace test for clarity
Brace1000 Apr 14, 2026
2f981ce
Resolve unknown token error in invalid_whitespace.rs
Brace1000 Apr 14, 2026
3d1ad29
Remove invisible character from variable assignment
Brace1000 Apr 14, 2026
2506ce4
Improve error message for invalid whitespace
Brace1000 Apr 14, 2026
f1eb5e7
Improve error handling for invisible characters
Brace1000 Apr 14, 2026
16b2655
Document error for unknown token due to whitespace
Brace1000 Apr 14, 2026
701bc97
Update error message for invalid whitespace handling
Brace1000 Apr 14, 2026
ece7316
Modify invalid_whitespace.rs for whitespace checks
Brace1000 Apr 14, 2026
a1eb231
Correct whitespace in variable declaration
Brace1000 Apr 14, 2026
6e459b9
Update error message for invalid whitespace
Brace1000 Apr 14, 2026
dc0d44a
Update invalid_whitespace.stderr
Brace1000 Apr 14, 2026
5661524
Refine error handling for invalid whitespace test
Brace1000 Apr 14, 2026
523f70a
Update invalid_whitespace.rs
Brace1000 Apr 14, 2026
1db9763
Fix whitespace issues in invalid_whitespace.rs
Brace1000 Apr 14, 2026
52225e6
Update invalid_whitespace.stderr file
Brace1000 Apr 15, 2026
185a582
Clean up whitespace in invalid_whitespace.rs
Brace1000 Apr 15, 2026
43f045c
Update invalid_whitespace.stderr
Brace1000 Apr 15, 2026
15 changes: 15 additions & 0 deletions tests/ui/README.md
@@ -1582,6 +1582,21 @@ Tests on various well-formedness checks, e.g. [Type-checking normal functions](h

Tests on `where` clauses. See [Where clauses | Reference](https://doc.rust-lang.org/reference/items/generics.html#where-clauses).

## `tests/ui/whitespace/`

Tests for whitespace handling in the Rust lexer. The Rust language
defines whitespace as Unicode Pattern_White_Space, which is not the
same as what the standard library gives you:

- `is_ascii_whitespace` follows the WhatWG Infra Standard and excludes
  vertical tab (`\x0B`)
- `is_whitespace` matches Unicode White_Space, a larger set that still
  lacks the two direction marks `\u{200E}` and `\u{200F}`

These tests make that gap visible. They check that the lexer accepts the
Pattern_White_Space characters and rejects invisible non-whitespace
characters such as ZERO WIDTH SPACE (`\u{200B}`).

See: https://github.com/rustfoundation/interop-initiative/issues/53
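A standalone sketch (not part of this PR) that prints how the two std predicates classify each Pattern_White_Space character, plus NO-BREAK SPACE and ZERO WIDTH SPACE for contrast. The `PATTERN_WHITE_SPACE` table and the `is_pattern_white_space` helper are hypothetical, hard-coded from the 11 code points listed in `vertical_tab_lexer.rs`; std has no such function.

```rust
/// Hypothetical table: the 11 Unicode Pattern_White_Space code points that
/// the Rust reference treats as whitespace in source text.
const PATTERN_WHITE_SPACE: [char; 11] = [
    '\u{09}', '\u{0A}', '\u{0B}', '\u{0C}', '\u{0D}', '\u{20}',
    '\u{85}', '\u{200E}', '\u{200F}', '\u{2028}', '\u{2029}',
];

/// Hypothetical helper (not a std API): is `c` Pattern_White_Space?
fn is_pattern_white_space(c: char) -> bool {
    PATTERN_WHITE_SPACE.contains(&c)
}

fn main() {
    // The 11 lexer-whitespace characters, plus NO-BREAK SPACE (White_Space
    // only) and ZERO WIDTH SPACE (neither) for contrast.
    let candidates = PATTERN_WHITE_SPACE
        .iter()
        .copied()
        .chain(['\u{A0}', '\u{200B}']);

    for c in candidates {
        println!(
            "U+{:04X}  pattern: {:5}  ascii: {:5}  unicode: {}",
            c as u32,
            is_pattern_white_space(c),
            c.is_ascii_whitespace(),
            c.is_whitespace(),
        );
    }
}
```

Vertical tab shows up as `ascii: false`, while `\u{200E}` and `\u{200F}` show up as `unicode: false`, which is exactly the two-sided mismatch described above.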

## `tests/ui/windows-subsystem/`: `#![windows_subsystem = ""]`

See [the `windows_subsystem` attribute](https://doc.rust-lang.org/reference/runtime.html#the-windows_subsystem-attribute).
22 changes: 22 additions & 0 deletions tests/ui/whitespace/ascii_whitespace_excludes_vertical_tab.rs
@@ -0,0 +1,22 @@
//@ run-pass
// This test checks that split_ascii_whitespace does NOT split on

Contributor:
I'm not sure if this test is relevant to the compiler?

Author:
Fair point. The test documents the gap between what the lexer accepts and what the stdlib gives you. Happy to remove it if you think it doesn't belong here.

// vertical tab (\x0B), because the standard library uses the WhatWG
// Infra Standard definition of ASCII whitespace, which excludes
// vertical tab.
//
// See: https://github.com/rust-lang/rust-project-goals/issues/53

fn main() {
    let s = "a\x0Bb";

    let parts: Vec<&str> = s.split_ascii_whitespace().collect();

    assert_eq!(parts.len(), 1,
        "vertical tab should not be treated as ASCII whitespace");

    let s2 = "a b";
    let parts2: Vec<&str> = s2.split_ascii_whitespace().collect();
    assert_eq!(parts2.len(), 2,
        "regular space should split correctly");

}
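For contrast, a standalone sketch (not part of this PR) showing that the Unicode-aware `str::split_whitespace` does split on vertical tab, because `\x0B` is in Unicode White_Space even though it is not in WhatWG ASCII whitespace. The `split_both` helper is hypothetical, introduced only to compare the two std methods side by side.

```rust
/// Hypothetical helper: split `s` with both std methods so the results
/// can be compared directly.
fn split_both(s: &str) -> (Vec<&str>, Vec<&str>) {
    let ascii: Vec<&str> = s.split_ascii_whitespace().collect();
    let unicode: Vec<&str> = s.split_whitespace().collect();
    (ascii, unicode)
}

fn main() {
    let (ascii, unicode) = split_both("a\x0Bb");

    // WhatWG-style ASCII whitespace leaves the vertical tab inside the token...
    assert_eq!(ascii, ["a\x0Bb"]);
    // ...while Unicode White_Space treats it as a separator.
    assert_eq!(unicode, ["a", "b"]);

    println!("ascii: {:?}, unicode: {:?}", ascii, unicode);
}
```

For a plain space both methods agree, so the difference only shows up for characters that sit in one set but not the other.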
13 changes: 13 additions & 0 deletions tests/ui/whitespace/invalid_whitespace.rs
@@ -0,0 +1,13 @@
// This test ensures that the Rust lexer rejects invisible characters that
// are not whitespace, such as ZERO WIDTH SPACE (U+200B).

//@ check-fail

fn main() {
let x = 5;
let y = 10;

let a=​x + y;
//~^ ERROR unknown start of token
//~| HELP invisible characters like
}
Brace1000 marked this conversation as resolved.
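When a test like this fails, the hard part is often finding the invisible character. A standalone sketch (not part of this PR or of rustc) of a hypothetical `find_invisible` helper that reports the 1-based line and column, counted in `char`s, of a few invisible non-whitespace characters. The four-character `INVISIBLE` list is an illustrative assumption, not an exhaustive set.

```rust
/// Illustrative (not exhaustive) list of invisible characters that are NOT
/// Pattern_White_Space: ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER,
/// ZERO WIDTH JOINER and WORD JOINER.
const INVISIBLE: [char; 4] = ['\u{200B}', '\u{200C}', '\u{200D}', '\u{2060}'];

/// Hypothetical helper: return (line, column, char) for every invisible
/// character in `src`, with 1-based line and column counted in chars.
fn find_invisible(src: &str) -> Vec<(usize, usize, char)> {
    let mut hits = Vec::new();
    for (line_idx, line) in src.lines().enumerate() {
        for (col_idx, c) in line.chars().enumerate() {
            if INVISIBLE.contains(&c) {
                hits.push((line_idx + 1, col_idx + 1, c));
            }
        }
    }
    hits
}

fn main() {
    // The offending statement from the test, with four spaces of indentation.
    let src = "fn main() {\n    let a=\u{200B}x + y;\n}\n";
    for (line, col, c) in find_invisible(src) {
        println!("{}:{}: invisible character U+{:04X}", line, col, c as u32);
    }
}
```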
10 changes: 10 additions & 0 deletions tests/ui/whitespace/invalid_whitespace.stderr
@@ -0,0 +1,10 @@
error: unknown start of token: \u{200b}
--> $DIR/invalid_whitespace.rs:10:11
|
LL | let a=​x + y;
| ^
|
= help: invisible characters like '\u{200b}' are not usually visible in text editors

error: aborting due to 1 previous error

Contributor:
You need to update this file so it matches the test output exactly.

The easiest way to do this is ./x test ui/whitespace --bless

Author:
Got it, thanks! I ran "./x test ui/whitespace --bless" and updated the file to match the expected output. Everything is passing on my end now.


58 changes: 58 additions & 0 deletions tests/ui/whitespace/vertical_tab_lexer.rs
@@ -0,0 +1,58 @@
//@ run-pass
// ignore-tidy-tab
//
// Tests that the Rust lexer accepts Unicode Pattern_White_Space characters.
//
// Worth noting: the Rust reference defines whitespace as Pattern_White_Space,
// which is not the same as what is_ascii_whitespace or is_whitespace give you.
//
// is_ascii_whitespace follows WhatWG and skips vertical tab (\x0B).
// is_whitespace uses Unicode White_Space: a larger set that still lacks \u{200E} and \u{200F}.
//
// The 11 characters that actually count as whitespace in Rust source:
// \x09 \x0A \x0B \x0C \x0D \x20 \u{85} \u{200E} \u{200F} \u{2028} \u{2029}
//
// Ref: https://github.com/rustfoundation/interop-initiative/issues/53

#[rustfmt::skip]
fn main() {
// tab (\x09) between let and the name
let _ws1 = 1_i32;

// vertical tab (\x0B) between let and the name
// this is the one is_ascii_whitespace excludes
let _ws2 = 2_i32;

// form feed (\x0C) between let and the name
let _ws3 = 3_i32;

// plain space (\x20), here just so every character is represented
let _ws4 = 4_i32;

// NEL (\u{85}) between let and the name
let…_ws5 = 5_i32;

// left-to-right mark (\u{200E}) between let and the name
let‎_ws6 = 6_i32;

// right-to-left mark (\u{200F}) between let and the name
let‏_ws7 = 7_i32;

// \x0A, \x0D, \u{2028} and \u{2029} are also Pattern_White_Space, but they
// are line-break characters, so this test does not place them between `let`
// and the name the way it does for the characters above.

// These are Unicode White_Space but NOT Pattern_White_Space:
// \u{A0} no-break space \u{1680} ogham space mark
// \u{2000} en quad \u{2001} em quad
// \u{2002} en space \u{2003} em space
// \u{2004} three-per-em space \u{2005} four-per-em space
// \u{2006} six-per-em space \u{2007} figure space
// \u{2008} punctuation space \u{2009} thin space
// \u{200A} hair space \u{202F} narrow no-break space
// \u{205F} medium math space \u{3000} ideographic space

// add them up and print the sum so every binding is actually used
let _sum = _ws1 + _ws2 + _ws3 + _ws4 + _ws5 + _ws6 + _ws7;
println!("{}", _sum);
}
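A standalone sketch (not part of this PR) that checks the comment block above: every listed character satisfies `char::is_whitespace`, yet none of them is Pattern_White_Space, so rustc's lexer reports each one as an unknown start of token rather than skipping it. Both tables are copied by hand from the test's comments; they are assumptions of this sketch, not std constants.

```rust
/// White_Space characters listed in the test as NOT Pattern_White_Space.
const WHITE_SPACE_ONLY: [char; 16] = [
    '\u{A0}', '\u{1680}', '\u{2000}', '\u{2001}', '\u{2002}', '\u{2003}',
    '\u{2004}', '\u{2005}', '\u{2006}', '\u{2007}', '\u{2008}', '\u{2009}',
    '\u{200A}', '\u{202F}', '\u{205F}', '\u{3000}',
];

/// The 11 Pattern_White_Space characters listed at the top of the test.
const PATTERN_WHITE_SPACE: [char; 11] = [
    '\u{09}', '\u{0A}', '\u{0B}', '\u{0C}', '\u{0D}', '\u{20}',
    '\u{85}', '\u{200E}', '\u{200F}', '\u{2028}', '\u{2029}',
];

fn main() {
    for c in WHITE_SPACE_ONLY {
        // std's Unicode predicate accepts every one of them...
        assert!(c.is_whitespace(), "U+{:04X} should be White_Space", c as u32);
        // ...but none is in the set the lexer treats as whitespace.
        assert!(!PATTERN_WHITE_SPACE.contains(&c));
    }
    println!(
        "all {} characters are White_Space but not Pattern_White_Space",
        WHITE_SPACE_ONLY.len()
    );
}
```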