Commit 4d9923b

Alter lexing rules, update icon for clarity at small sizes
1 parent 1d26232 commit 4d9923b

7 files changed: 50 additions, 13 deletions

-61.2 KB
Binary file not shown.

__pycache__/lexer.cpython-314.pyc

-11.1 KB
Binary file not shown.

__pycache__/parser.cpython-314.pyc

-20.1 KB
Binary file not shown.

asmln.exe

-952 Bytes
Binary file not shown.

icon.png

-549 Bytes

lexer.py

Lines changed: 17 additions & 3 deletions
@@ -19,7 +19,20 @@ class Token:
     column: int
 
 
-KEYWORDS = {"IF", "ELSIF", "ELSE", "WHILE", "FOR", "FUNC", "RETURN", "BREAK", "GOTO", "GOTOPOINT", "CONTINUE"}
+KEYWORDS = {
+    "IF",
+    "ELSIF",
+    "ELSE",
+    "WHILE",
+    "FOR",
+    "FUNC",
+    "RETURN",
+    "BREAK",
+    "CONTINUE",
+    "GOTO",
+    "GOTOPOINT"
+}
+
 SYMBOLS = {
     "(": "LPAREN",
     ")": "RPAREN",
@@ -138,10 +151,11 @@ def _consume_identifier(self) -> Token:
         return Token(token_type, value, line, col)
 
     def _is_identifier_start(self, ch: str) -> bool:
-        return (ch == "_") or ("A" <= ch <= "Z") or ("a" <= ch <= "z")
+        return (ch in "abcdefghijklmnopqrstuvwxyz23456789;/ABCDEFGHIFJKLMNOPQRSTUVWXYZ!@$%&~_+|:<>?")
 
     def _is_identifier_part(self, ch: str) -> bool:
-        return (ch == "_") or ("A" <= ch <= "Z") or ("a" <= ch <= "z") or (ch in "01")
+        return (ch in "abcdefghijklmnopqrstuvwxyz1234567890;./ABCDEFGHIFJKLMNOPQRSTUVWXYZ!@$%&~_+|:<>?")
+        # "." is not actually a valid character in namespace symbols, but is allowed since it is used to separate module names from namespace symbols.
 
     def _consume_line_continuation(self) -> None:
         if self.index + 1 >= len(self.text):
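
The new `_is_identifier_start` and `_is_identifier_part` test membership with a substring check (`ch in "..."`). As a minimal sketch, assuming the lexer feeds these predicates one character at a time, the same rules can be written with frozensets, which only ever match a single character. A substring check also returns True for the empty string (`"" in "abc"` is `True`), so if the lexer ever passes `""` (for example, from a peek past the end of input), the new predicates would accept it. `scan_identifier`, `SYMBOL_CHARS`, and the token-type names `"KEYWORD"` and `"IDENT"` are hypothetical and do not come from lexer.py.

```python
# Hypothetical standalone versions of the lexer's identifier predicates,
# using the character sets introduced in this commit.
LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SYMBOL_CHARS = ";/!@$%&~_+|:<>?"

START_CHARS = frozenset(LETTERS + "23456789" + SYMBOL_CHARS)
PART_CHARS = frozenset(LETTERS + "0123456789" + "." + SYMBOL_CHARS)

# Same keywords as the KEYWORDS set in lexer.py.
KEYWORDS = {
    "IF", "ELSIF", "ELSE", "WHILE", "FOR", "FUNC",
    "RETURN", "BREAK", "CONTINUE", "GOTO", "GOTOPOINT",
}


def is_identifier_start(ch: str) -> bool:
    # Set membership matches exactly one character, so "" and multi-character
    # strings are rejected (a substring test such as `"" in "abc"` is True).
    return ch in START_CHARS


def is_identifier_part(ch: str) -> bool:
    return ch in PART_CHARS


def scan_identifier(text: str, start: int) -> tuple[str, str, int]:
    """Scan the identifier beginning at text[start].

    Returns (token_type, value, end), where end is the index just past the
    identifier. "KEYWORD" and "IDENT" are placeholder token-type names.
    """
    if start >= len(text) or not is_identifier_start(text[start]):
        raise ValueError(f"no identifier starts at index {start}")
    end = start + 1
    while end < len(text) and is_identifier_part(text[end]):
        end += 1
    value = text[start:end]
    token_type = "KEYWORD" if value in KEYWORDS else "IDENT"
    return token_type, value, end
```

Under these character sets, `scan_identifier("math.sqrt(x)", 0)` returns `("IDENT", "math.sqrt", 9)`. Because '<' is an identifier character and '=' is a delimiter, `scan_identifier("a<=b", 0)` returns `("IDENT", "a<", 2)`: a name directly followed by a symbol character such as '<' absorbs it into the identifier.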

spec.txt

Lines changed: 33 additions & 10 deletions
@@ -84,16 +84,39 @@ functions share a single identifier space, so a given name cannot
 simultaneously denote both. A user-defined function name must not conflict
 with the name of any built-in operator or function (see Section 4.1).
 
-Identifier character set (clarification): The first character of an
-identifier MUST be an ASCII letter (A–Z or a–z) or underscore ('_').
-Subsequent characters MAY be ASCII letters, underscore, or the binary
-digits '0' and '1' only. Decimal digits other than '0' and '1' (that is,
-'2'..'9') and the dot character '.' are NOT permitted inside identifiers.
-Identifiers therefore cannot contain '.' (dot) and may not include any
-non-binary digits; this preserves an unambiguous lexical distinction
-between binary integer literals and identifiers, and prevents confusion
-with module-qualified names (which are formed by a separate dotted name
-syntax documented elsewhere in this specification).
+Identifier character set (clarification): The (non-empty) sequence of
+characters that forms an identifier is determined by the following rules,
+which match the reference lexer implementation:
+
+- The first character MUST NOT be '0' or '1'. Non-ASCII characters are
+  also disallowed, but otherwise the lexer permits a broad set of
+  ASCII punctuation and symbol characters in addition to letters and
+  digits. In particular, the following characters are valid as the first
+  character of an identifier:
+
+  - Lowercase letters 'a'–'z'
+  - Uppercase letters 'A'–'Z'
+  - Decimal digits '2'–'9'
+  - The punctuation and symbol characters
+    `; / ! @ $ % & ~ _ + | : < > ?`
+
+- Subsequent characters in an identifier may be any of the following:
+
+  - Lowercase letters 'a'–'z'
+  - Uppercase letters 'A'–'Z'
+  - Decimal digits '0'–'9'
+  - The punctuation and symbol characters
+    `; . / ! @ $ % & ~ _ + | : < > ?`
+
+As noted above, non-ASCII characters remain disallowed, and the
+delimiter characters '{', '}', '[', ']', '(', ')', '=', ',', and '#'
+are never permitted inside identifiers.
+
+This deliberately permissive identifier character set preserves an
+unambiguous lexical distinction between binary integer literals (which
+must begin with '0' or '1') and identifiers, while allowing
+module-qualified names and other symbolic conventions to be expressed
+directly as plain identifiers in source code.
 
 
 3. Data Model
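
As a compact cross-check of the identifier rules above, the two character sets can be expressed as one regular expression. This is a sketch only: `SYMBOL_CHARS`, `IDENTIFIER_RE`, and `is_identifier` are illustrative names, not part of the reference lexer, and the pattern validates a whole candidate string with `re.fullmatch` rather than scanning source text.

```python
import re

# Punctuation and symbol characters that the spec allows both as the first
# character and in later positions of an identifier.
SYMBOL_CHARS = ";/!@$%&~_+|:<>?"

# First character: ASCII letters, decimal digits 2-9, or a symbol character.
START_CLASS = "a-zA-Z2-9" + re.escape(SYMBOL_CHARS)
# Later characters: ASCII letters, all decimal digits, '.', or a symbol character.
PART_CLASS = "a-zA-Z0-9." + re.escape(SYMBOL_CHARS)

IDENTIFIER_RE = re.compile(f"[{START_CLASS}][{PART_CLASS}]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole of `text` is one identifier under the spec rules."""
    return IDENTIFIER_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ["count", "math.sqrt", "2fast", "<>?", "0101", "1abc", ".x", "a{b", "café", ""]:
        print(f"{word!r:12} -> {is_identifier(word)}")
```

Running it prints `True` for `count`, `math.sqrt`, `2fast`, and `<>?`, and `False` for the rest. In particular, `0101` and `1abc` are rejected because an identifier cannot begin with '0' or '1', which is what keeps binary integer literals unambiguous.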

0 commit comments

Comments
 (0)