[Version 11.0] Feature support for UTF-8 string literals #1610
base: draft-v11
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -920,14 +920,16 @@ | |
|
|
||
| In a verbatim string literal, the characters between the delimiters are interpreted verbatim, with the only exception being a *Quote_Escape_Sequence*, which represents one double-quote character. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines. | ||
|
|
||
| All string literal forms may optionally have a trailing *Utf8_Suffix*. The representation of each form is discussed below. | ||
|
|
||
| ```ANTLR | ||
| String_Literal | ||
| : Regular_String_Literal | ||
| | Verbatim_String_Literal | ||
| ; | ||
|
|
||
| fragment Regular_String_Literal | ||
| : '"' Regular_String_Literal_Character* '"' | ||
| : '"' Regular_String_Literal_Character* '"' Utf8_Suffix? | ||
| ; | ||
|
|
||
| fragment Regular_String_Literal_Character | ||
|
|
@@ -943,7 +945,7 @@ | |
| ; | ||
|
|
||
| fragment Verbatim_String_Literal | ||
| : '@"' Verbatim_String_Literal_Character* '"' | ||
| : '@"' Verbatim_String_Literal_Character* '"' Utf8_Suffix? | ||
| ; | ||
|
|
||
| fragment Verbatim_String_Literal_Character | ||
|
|
@@ -958,6 +960,10 @@ | |
| fragment Quote_Escape_Sequence | ||
| : '""' | ||
| ; | ||
|
|
||
| fragment Utf8_Suffix | ||
| : 'u8' | 'U8' | ||
| ; | ||
| ``` | ||
|
|
||
| > *Example*: The example | ||
|
|
@@ -990,7 +996,26 @@ | |
| <!-- markdownlint-enable MD028 --> | ||
| > *Note*: Since a hexadecimal escape sequence can have a variable number of hex digits, the string literal `"\x123"` contains a single character with hex value `123`. To create a string containing the character with hex value `12` followed by the character `3`, one could write `"\x00123"` or `"\x12"` + `"3"` instead. *end note* | ||
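The note on variable-length hexadecimal escapes can be checked with a short sketch. The class and method names below are hypothetical and exist only for illustration; the literals are the ones from the note.

```csharp
using System;

// Hypothetical helper class, not part of the specification text.
public static class HexEscapeDemo
{
    // "\x123" consumes all three hex digits: a single character, U+0123.
    public static string Single() => "\x123";

    // "\x0012" consumes four hex digits (the maximum), so '3' is a separate character.
    public static string Padded() => "\x00123";

    // Splitting the literal ends the escape sequence after "12".
    public static string Split() => "\x12" + "3";
}
```

Under these definitions, `HexEscapeDemo.Single().Length` is 1, while `HexEscapeDemo.Padded()` and `HexEscapeDemo.Split()` both yield the two-character string U+0012 followed by `3`.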
|
|
||
| The type of a *String_Literal* is `string`. | ||
| A *String_Literal* that does not contain a *Utf8_Suffix* is a ***UTF-16 string literal***, whose type is `string`. | ||
|
|
||
| A *String_Literal* that contains a *Utf8_Suffix* is a ***UTF-8 string literal***, whose type is `System.ReadOnlySpan<byte>` (an indexable collection type), and whose value contains a UTF-8 byte representation of the string. A null terminator (a byte with value zero) is placed beyond the last byte in memory (and outside the length of the `ReadOnlySpan<byte>`) in order to support scenarios that expect null-terminated byte strings. A UTF-8 string literal is not a constant. A UTF-8 string literal without its *Utf8_Suffix* shall be valid UTF-16. (For example, `"\uDC00\uDD00"u8` is ill-formed as one low surrogate cannot be followed by another.) | ||
|
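As a minimal sketch of the paragraph above, assuming a compiler that implements the proposed C# 11 `u8` suffix, the following shows that a UTF-8 string literal has type `ReadOnlySpan<byte>` and that its length counts only the encoded bytes, not the trailing null terminator. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class, not part of the specification text.
public static class Utf8LiteralDemo
{
    // The literal's type is ReadOnlySpan<byte>; Length excludes the null terminator.
    public static int AsciiLength()
    {
        ReadOnlySpan<byte> bytes = "Hello"u8;
        return bytes.Length;
    }

    // ASCII characters encode to one byte each: 48 65 6C 6C 6F.
    public static byte[] Ascii() => "Hello"u8.ToArray();

    // U+00E9 (é) encodes to two bytes in UTF-8: C3 A9.
    public static byte[] NonAscii() => "\u00E9"u8.ToArray();
}
```

`ToArray()` is used here only to copy the span into a `byte[]` so the result can be returned and inspected; the literal itself does not allocate an array.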
Contributor
Should add a new subclause within 16.5.15 Safe context constraint, saying that when a UTF-8 string literal has a
using System;
public class C {
public ReadOnlySpan<byte> M() {
return "xyz"u8;
}
}
Concatenation of UTF-8 string literals would then follow the safe-context rule in 16.5.15.5 Operators. As the safe-context of both operands would be caller-context, the safe-context of the result would be the same, without having to be specified separately for this case.
Contributor
The following would then likewise be valid:
public class C {
public ref readonly byte M() {
return ref "xyz"u8[0];
}
} |
||
|
|
||
| > *Note*: While every UTF-8 string literal is a `ReadOnlySpan<byte>`, not every `ReadOnlySpan<byte>` represents a UTF-8 string literal. See the description of UTF-8 string concatenation in [§12.13.5](expressions.md#12135-addition-operator). *end note* | ||
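The distinction drawn in this note can be illustrated with a sketch. It assumes, as the reference to §12.13.5 suggests, that `+` is permitted between two UTF-8 string literals and yields a `ReadOnlySpan<byte>`; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class, not part of the specification text.
public static class Utf8ConcatDemo
{
    // Concatenating two UTF-8 string literals (assumed to be allowed by §12.13.5).
    // ToArray() copies the resulting span so it can be returned as a byte[].
    public static byte[] Joined() => ("Hello, "u8 + "World"u8).ToArray();

    // A ReadOnlySpan<byte> that does not come from a UTF-8 string literal:
    // the bytes FF FE are not a valid UTF-8 sequence.
    public static ReadOnlySpan<byte> ArbitraryBytes() => new byte[] { 0xFF, 0xFE };
}
```

Both methods produce byte sequences of the same span type, but only the first has the UTF-8 guarantee that a literal provides.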
| <!-- markdownlint-disable MD028 --> | ||
|
|
||
| <!-- markdownlint-enable MD028 --> | ||
| > *Note*: As `ReadOnlySpan<byte>` is a ref struct type, a UTF-8 string literal cannot be converted to `object` or used as a type parameter ([§16.2.3](structs.md#1623-ref-modifier)). *end note* | ||
|
Contributor
But the wording is imprecise. A string literal is syntax rather than a type so it couldn't be used as a type parameter (or rather a type argument) anyway. |
||
| <!-- markdownlint-disable MD028 --> | ||
|
|
||
| <!-- markdownlint-enable MD028 --> | ||
| > *Example*: Here are examples of each form of string literal: | ||
| > | ||
| > | **Encoding** | **Type** | **Regular String Literal** | **Verbatim String Literal** | **Raw String Literal** | | ||
| > |--------------|----------------------|---------------------|--------------------|--------------------| | ||
| > | UTF-16 | `string` | `"Hello"` | `@"Hello"` | `"""Hello"""` | | ||
| > | UTF-8 | `ReadOnlySpan<byte>` | `"Hello"u8` | `@"Hello"u8` | `"""Hello"""u8` | | ||
| > | ||
| > *end example* | ||
|
|
||
| Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator ([§12.15.8](expressions.md#12158-string-equality-operators)), appear in the same assembly, these string literals refer to the same string instance. | ||
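The sharing of string instances described above can be observed with reference equality. This is a minimal sketch with hypothetical class and method names; the second method contrasts a literal with a string built at run time, which is equal by value but is a distinct instance.

```csharp
using System;

// Hypothetical helper class, not part of the specification text.
public static class StringLiteralIdentityDemo
{
    // Two equivalent literals in the same assembly refer to the same instance.
    public static bool LiteralsShareInstance()
    {
        string a = "hello";
        string b = "hello";
        return object.ReferenceEquals(a, b);
    }

    // A string constructed at run time is equal by value to the literal,
    // but is a separate instance.
    public static bool RuntimeStringIsDistinct()
    {
        string literal = "hello";
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }
}
```

Both methods return `true`.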
|
|
||
|
|
@@ -1521,7 +1546,7 @@ | |
| : Decimal_Digit+ PP_Whitespace PP_Compilation_Unit_Name | ||
| | Decimal_Digit+ | ||
| | DEFAULT | ||
| | 'hidden' | ||
| | PP_Start_Line_Character PP_Whitespace? '-' PP_Whitespace? PP_End_Line_Character | ||
| PP_Whitespace (PP_Character_Offset PP_Whitespace)? PP_Compilation_Unit_Name | ||
| ; | ||
|
|
||
From this normative text, it is not clear whether redundant parentheses are allowed.