@@ -84,16 +84,39 @@ functions share a single identifier space, so a given name cannot
84 84 simultaneously denote both. A user-defined function name must not conflict
85 85 with the name of any built-in operator or function (see Section 4.1).
86 86
87- Identifier character set (clarification): The first character of an
88- identifier MUST be an ASCII letter (A–Z or a–z) or underscore ('_').
89- Subsequent characters MAY be ASCII letters, underscore, or the binary
90- digits '0' and '1' only. Decimal digits other than '0' and '1' (that is,
91- '2'..'9') and the dot character '.' are NOT permitted inside identifiers.
92- Identifiers therefore cannot contain '.' (dot) and may not include any
93- non-binary digits; this preserves an unambiguous lexical distinction
94- between binary integer literals and identifiers, and prevents confusion
95- with module-qualified names (which are formed by a separate dotted name
96- syntax documented elsewhere in this specification).
87+ Identifier character set (clarification): The (non-empty) sequence of
88+ characters that forms an identifier is determined by the following rules,
89+ which match the reference lexer implementation:
90+
91+ - The first character MUST NOT be '0' or '1'. Non-ASCII characters
92+ are also disallowed. Otherwise, the lexer permits a broad set of
93+ ASCII punctuation and symbol characters in addition to letters and
94+ digits. In particular, the following characters are valid as the first
95+ character of an identifier:
96+
97+ - Lowercase letters 'a'–'z'
98+ - Uppercase letters 'A'–'Z'
99+ - Decimal digits '2'–'9'
100+ - The punctuation and symbol characters
101+ `; / ! @ $ % & ~ _ + | : < > ?`
102+
103+ - Subsequent characters in an identifier may be any of the following:
104+
105+ - Lowercase letters 'a'–'z'
106+ - Uppercase letters 'A'–'Z'
107+ - Decimal digits '0'–'9'
108+ - The punctuation and symbol characters
109+ `; . / ! @ $ % & ~ _ + | : < > ?`
110+
111+ As noted above, non-ASCII characters remain disallowed, and the
112+ delimiter characters '{', '}', '[', ']', '(', ')', '=', ',', and '#'
113+ are never permitted inside identifiers.
114+
115+ This deliberately permissive identifier character set preserves an
116+ unambiguous lexical distinction between binary integer literals (which
117+ must begin with '0' or '1') and identifiers, while allowing module-
118+ qualified names and other symbolic conventions to be expressed directly
119+ as plain identifiers in source code.
97 120
98 121
99 122 3. Data Model
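The identifier rules added in this hunk can be checked mechanically. The following is a minimal sketch, not the reference lexer: the helper name `is_identifier` and the constant names are hypothetical, and it assumes the two character lists in the added text are exhaustive. It does not check the separate rule that a user-defined name must not conflict with a built-in operator or function.

```python
import string

# Assumption: these sets are copied from the two lists in the added text
# and are treated as exhaustive. All names here are hypothetical.
FIRST_PUNCT = set(";/!@$%&~_+|:<>?")
REST_PUNCT = FIRST_PUNCT | {"."}  # '.' is allowed only after the first character

# First character: letters, decimal digits '2'-'9', and the first-position symbols.
FIRST_CHARS = set(string.ascii_letters) | set("23456789") | FIRST_PUNCT
# Subsequent characters: letters, all decimal digits, and the symbols plus '.'.
REST_CHARS = set(string.ascii_letters) | set(string.digits) | REST_PUNCT


def is_identifier(text: str) -> bool:
    """Return True if text satisfies the identifier character rules."""
    if not text:
        # Identifiers are non-empty sequences of characters.
        return False
    if text[0] not in FIRST_CHARS:
        # Rejects '0', '1', '.', delimiters, and non-ASCII first characters.
        return False
    # Delimiters such as '(' or '=' and non-ASCII characters are in neither set.
    return all(ch in REST_CHARS for ch in text[1:])
```

Under these assumptions, `is_identifier("math.sqrt")` and `is_identifier("2fast")` are accepted, while `is_identifier("0abc")`, `is_identifier(".hidden")`, and `is_identifier("a=b")` are rejected.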