
Commit 5709cdb

Lots of large, compatibility-breaking changes, though they were strictly required.
Python 3 support has been implemented and Python 2 support has been dropped.

Migrated from plain string manipulation to `ast`, which has greatly improved the structure of the code and enabled further improvements. The cost is that comments and whitespace are not preserved, since the AST has no nodes for them. ASTs are turned back into source strings with the `astor` library and, optionally (if it is installed), the result is postprocessed by `black`.

Added type annotations and reformatted the sources. Indentation now uses tabs (a tab is considered to be 4 columns wide). Enable `editorconfig` support in your editor to display the sources correctly.

Implemented Python code detection. Instead of the custom and incomplete parsing used earlier, the Python parser itself is now used: the first syntax error in the Python code is taken to be the end of the Python code and the continuation of the CoCo/R grammar.

Implemented optional typing for parameters. For compatibility, types use the Java/C++/C# syntax (`type name`), not the Python one (`name: type`).

Moved the components used by every parser into the `CoCoRuntime` module. It provides abstract base classes with some properties that must be populated in the generated code.

Started eliminating global state: data previously stored in classes is now stored in instances.

Dropped support for frame files; use object-oriented programming instead. `Copyright.frame` is no longer used (in fact, it has never been used in CoCoPy, although the CoCo/R manual mentions it). Instead, put the copyright header into the `__copyright__` variable, and it will be inserted into both `Scanner` and `Parser`.

Dropped generation of drivers. They were not really generated, just copied.

(Non)terminal enum values are now stored in a single shared `IntEnum` rather than in the parser and lexer classes themselves. Added a mode that uses lookups from the enum instead of raw numbers in the generated code; this should improve readability and simplify debugging and diffing of the generated code.

`setup.cfg` and `pyproject.toml` are now used to fetch versions from git.
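The Python code detection described above ("the first syntax error ends the Python code") can be illustrated with a minimal, standard-library-only sketch. It is not CoCoPy's implementation: the helper name `split_python_prefix` is hypothetical. It hands the text to the Python parser, cuts it just before the line of the reported syntax error, and retries until the remaining prefix parses.

```python
import ast
from typing import Tuple


def split_python_prefix(text: str) -> Tuple[str, str]:
    """Hypothetical helper: split `text` into (python_code, rest).

    The whole text is handed to the Python parser; on a SyntaxError it is cut
    just before the reported line and parsing is retried on the shorter prefix.
    """
    lines = text.splitlines(keepends=True)
    end = len(lines)
    while end > 0:
        candidate = "".join(lines[:end])
        try:
            ast.parse(candidate)
        except SyntaxError as err:
            # Drop the offending line and everything after it; always shrink by
            # at least one line so the loop is guaranteed to terminate.
            bad_line = err.lineno if err.lineno else end
            end = max(0, min(end - 1, bad_line - 1))
            continue
        return candidate, "".join(lines[end:])
    return "", text


py, rest = split_python_prefix("import re\nx = 1\nTOKENS ident = letter { letter } .\n")
print(repr(py))
print(repr(rest))
```

The heuristic only finds where Python stops parsing, not where the grammar starts: a grammar line that happens to be valid Python (for example a lone identifier) would be kept in the Python part.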
Installing from sources other than a git checkout is deprecated.

Refactored keyword lookup: keywords are no longer matched with `elif` ladders; instead they are stored in a map.

Optimized the storage of error messages. Since their numbers are sequential, they are now stored in a `list` rather than a `dict`.

Dropped the numeric values (which originally were not even in an enum, just an untidy mess) that mixed node types, symbol types and state types. Each of these now has a dedicated class.

Keyword lookup can optionally be optimized with prefix trees via the `datrie` library; this is enabled automatically if `datrie` is installed.
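A minimal sketch of two of the ideas in this commit: one shared `IntEnum` for token kinds, and keyword lookup through a map instead of an `elif` ladder, plus error messages in a plain list. The enum `Sym`, its members, the keyword table and the messages are hypothetical (only `EOF_SYM` and `maxT` mirror names that `Parser` reads from its `ENUM`); real generated code will differ, and the optional `datrie` prefix-tree variant is not shown.

```python
from enum import IntEnum
from typing import Dict, List


class Sym(IntEnum):
    """Hypothetical shared enum of token kinds, usable by scanner and parser alike."""

    EOF_SYM = 0
    ident = 1
    number = 2
    WHILE = 3
    IF = 4
    maxT = 4  # alias of the highest terminal kind; Parser.Get() treats larger kinds as pragmas


# Keyword literal -> token kind; one dict lookup replaces an `elif` ladder.
KEYWORDS: Dict[str, Sym] = {
    "while": Sym.WHILE,
    "if": Sym.IF,
}


def check_literal(lexeme: str, default: Sym = Sym.ident) -> Sym:
    """Return the keyword kind of `lexeme`, or `default` if it is not a keyword."""
    return KEYWORDS.get(lexeme, default)


# Error messages are numbered 0, 1, 2, ... so a list indexed by number suffices.
ERROR_MESSAGES: List[str] = [
    "EOF expected",
    "ident expected",
    "number expected",
]

print(check_literal("while"), check_literal("counter"), ERROR_MESSAGES[Sym.ident])
```

Because `IntEnum` members compare equal to plain ints and can index lists, generated code could test `self.la.kind == Sym.WHILE` instead of `self.la.kind == 3` with the same behaviour, which is what makes a more readable enum-lookup mode possible without changing semantics.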
1 parent b213168 commit 5709cdb

99 files changed

Lines changed: 10499 additions & 17020 deletions


.editorconfig

Lines changed: 12 additions & 7 deletions
@@ -1,7 +1,12 @@
-root = true
-[*]
-charset = utf-8
-indent_style = tab
-indent_size = 4
-insert_final_newline = true
-newline = lf
+root = true
+
+[*]
+charset = utf-8
+indent_style = tab
+indent_size = 4
+insert_final_newline = true
+end_of_line = lf
+
+[*.{yml,yaml}]
+indent_style = space
+indent_size = 2

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.pdf filter=lfs diff=lfs merge=lfs -text

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -1,8 +1,13 @@
 __pycache__
 *.pyc
 *.pyo
+/monkeytype.sqlite3
 /CocoPy.egg-info
 /build
 /dist
 /.eggs
 /testSuite/tmp
+/*.srctrlbm
+/*.srctrldb
+/.pytest_cache
+/.hypothesis

CoCoPy.srctrlprj

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<config>
+	<source_groups>
+		<source_group_ffffffff-ffff-ffff-ffff-ffffffffffff>
+			<name>Python Source Group</name>
+			<source_extensions>
+				<source_extension>.py</source_extension>
+			</source_extensions>
+			<source_paths>
+				<source_path>./Coco</source_path>
+				<source_path>./CoCoRuntime</source_path>
+			</source_paths>
+			<status>enabled</status>
+			<type>Python Source Group</type>
+		</source_group_ffffffff-ffff-ffff-ffff-ffffffffffff>
+	</source_groups>
+	<version>8</version>
+</config>

CoCoRuntime/errors.py

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+"""self.py -- Error handling routines"""
+
+__copyright__ = """
+Compiler Generator Coco/R,
+Copyright (c) 1990, 2004 Hanspeter Moessenboeck, University of Linz
+extended by M. Loeberbauer & A. Woess, Univ. of Linz
+ported from Java to Python by Ronald Longo
+improved and refactored by KOLANICH
+
+This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+As an exception, it is allowed to write an extension of Coco/R that is used as a plugin in non-free software.
+
+If not otherwise stated, any source code generated by Coco/R (other than Coco/R itself) does not fall under the GNU General Public License.
+""" # pylint: disable=duplicate-code
+
+import typing
+import sys
+from pathlib import Path
+from io import StringIO
+
+from .scanner import Buffer
+
+
+class ErrorRec:
+	__slots__ = ("line", "col", "num", "str")
+
+	def __init__(self, line: int, col: int, s: str) -> None:
+		assert isinstance(line, int), line
+		assert isinstance(col, int), col
+		assert isinstance(s, str), repr(s)
+
+		self.line = line
+		self.col = col
+		self.num = 0
+		self.str = s
+
+
+class Errors:
+	errMsgFormat = "file %(file)s : (%(line)d, %(col)d) %(text)s\n"
+	minErrDist = 2
+	errDist = minErrDist
+	# A function with prototype: f( errorNum=None ) where errorNum is a
+	# predefined error number. f returns a tuple, ( line, column, message )
+	# such that line and column refer to the location in the
+	# source file most recently parsed. message is the error
+	# message corresponging to errorNum.
+
+	def __init__(self) -> None:
+		self.errors = []
+		self.warnings = []
+		self.mergedList = None # PrintWriter
+		self.listName = ""
+
+	def storeError(self, line: int, col: int, s: str) -> None:
+		self.errors.append(ErrorRec(line, col, s))
+
+	def storeWarning(self, line: int, col: int, s: str) -> None:
+		self.warnings.append(ErrorRec(line, col, s))
+
+	@property
+	def count(self) -> int:
+		return len(self.errors)
+
+	def Warn(self, line: int, col: int, errMsg: str) -> None:
+		self.storeWarning(line, col, errMsg)
+
+	@staticmethod
+	def Exception(errMsg):
+		print(errMsg)
+		sys.exit(1)
+
+	@staticmethod
+	def printMsg(fileName: str, line: int, column: int, msg: str) -> None:
+		vals = {"file": fileName, "line": line, "col": column, "text": msg}
+		sys.stdout.write(__class__.errMsgFormat % vals)
+
+	@staticmethod
+	def display(s, e, resBuff):
+		resBuff.write("**** ")
+		for c in range(1, e.col):
+			if s[c - 1] == "\t":
+				resBuff.write("\t")
+			else:
+				resBuff.write(" ")
+		resBuff.write("^ " + e.str + "\n")
+
+	def Summarize(self, sourceBuffer: Buffer) -> str:
+		# Initialize the line iterator
+		srcLineIter = iter(sourceBuffer)
+		srcLineStr = next(srcLineIter, None)
+		srcLineNum = 1
+
+		resBuff = StringIO()
+
+		# Initialize the error iterator
+		errIter = iter(sorted(self.errors, key=lambda v: (v.line, v.col)))
+		warnIter = iter(sorted(self.warnings, key=lambda v: (v.line, v.col)))
+
+		errRec = next(errIter, None)
+		warnRec = next(warnIter, None)
+
+		while errRec and errRec.line < 0:
+			print(errRec.str, file=resBuff)
+			errRec = next(errIter, None)
+
+		while warnRec and warnRec.line < 0:
+			print(warnRec.str, file=resBuff)
+			warnRec = next(warnIter, None)
+
+		while errRec or warnRec:
+			# Advance to the source line of the next error
+			while srcLineStr is not None and (errRec is None or srcLineNum < errRec.line) and (warnRec is None or srcLineNum < warnRec.line):
+				srcLineStr = next(srcLineIter, None)
+				srcLineNum += 1
+
+			resBuff.write("%4d %s\n" % (srcLineNum, srcLineStr))
+			# Write out all errors for the current source line
+			while errRec and errRec.line == srcLineNum:
+				self.display(srcLineStr, errRec, resBuff)
+				errRec = next(errIter, None)
+
+			while warnRec and warnRec.line == srcLineNum:
+				self.display(srcLineStr, warnRec, resBuff)
+				warnRec = next(warnIter, None)
+
+			errRec = next(errIter, None)
+			warnRec = next(warnIter, None)
+
+		resBuff.write("%d errors and %d warnings detected\n" % (self.count, len(self.warnings)))
+		if self.count > 0:
+			resBuff.write("see " + self.listName + "\n")
+		return resBuff.getvalue()

CoCoRuntime/parser.py

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+"""Parser.py -- ATG parser runtime"""
+
+__copyright__ = """
+Compiler Generator Coco/R,
+Copyright (c) 1990, 2004 Hanspeter Moessenboeck, University of Linz
+extended by M. Loeberbauer & A. Woess, Univ. of Linz
+ported from Java to Python by Ronald Longo
+improved and refactored by KOLANICH
+
+This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+As an exception, it is allowed to write an extension of Coco/R that is used as a plugin in non-free software.
+
+If not otherwise stated, any source code generated by Coco/R (other than Coco/R itself) does not fall under the GNU General Public License.
+""" # pylint: disable=duplicate-code
+
+import typing
+from abc import ABC, abstractmethod
+
+from .errors import Errors
+from .scanner import Scanner, Token
+
+
+class Parser(ABC):
+	ENUM = None
+	__main_production_name__ = None # type: str
+	__EOF_sym__ = None # type: typing.Type["ScannerEnum"]
+
+	@abstractmethod
+	def pragmas(self):
+		raise NotImplementedError
+
+	set = None
+	errorMessages = None
+	minErrDist = 2
+
+	# -->declarations
+
+	def __init__(self) -> None:
+		self.scanner = None
+		self.token = None # last recognized token
+		self.la = None # lookahead token
+		self.genScanner = False
+		self.tokenString = "" # used in declarations of literal tokens
+		self.noString = "-none-" # used in declarations of literal tokens
+		self.errDist = self.__class__.minErrDist
+		self.errors = Errors()
+
+	def getParsingPos(self) -> typing.Tuple[int, int]:
+		return self.la.line, self.la.col
+
+	def SynErr(self, errNum: int) -> None:
+		if self.errDist >= self.__class__.minErrDist:
+			line, col = self.getParsingPos()
+			self.errors.storeError(line, col, self.__class__.errorMessages[errNum])
+
+		self.errDist = 0
+
+	def SemErr(self, msg: str) -> None:
+		if self.errDist >= self.__class__.minErrDist:
+			line, col = self.getParsingPos()
+			self.errors.storeError(line, col, msg)
+
+		self.errDist = 0
+
+	def Warning(self, msg):
+		if self.errDist >= self.__class__.minErrDist:
+			self.errors.Warn(msg)
+
+		self.errDist = 0
+
+	@staticmethod
+	def Successful():
+		return self.errors.count == 0
+
+	def LexString(self):
+		return self.token.val
+
+	def LookAheadString(self):
+		return self.la.val
+
+	def Get(self) -> None:
+		while True:
+			self.token = self.la
+			self.la = self.scanner.Scan()
+			if self.la.kind <= self.__class__.ENUM.maxT:
+				self.errDist += 1
+				break
+			self.pragmas()
+			self.la = self.token
+
+	def Expect(self, n: "ScannerEnum") -> None:
+		if self.la.kind == n:
+			self.Get()
+		else:
+			self.SynErr(n)
+
+	def StartOf(self, s: int) -> bool:
+		return self._StartOf(s, self.la.kind)
+
+	@classmethod
+	def _StartOf(cls, s: int, kind: "ScannerEnum") -> bool:
+		if kind >= 0:
+			return cls.set[s][kind]
+		return False
+
+	def ExpectWeak(self, n: "ScannerEnum", follow: int) -> None:
+		if self.la.kind == n:
+			self.Get()
+		else:
+			self.SynErr(n)
+			while not self.StartOf(follow):
+				self.Get()
+
+	def WeakSeparator(self, n: "ScannerEnum", syFol: int, repFol: int) -> bool:
+		s = [False for i in range(self.__class__.ENUM.maxT + 1)]
+		if self.la.kind == n:
+			self.Get()
+			return True
+
+		if self.StartOf(repFol):
+			return False
+
+		for i in range(self.__class__.ENUM.maxT):
+			s[i] = self._StartOf(syFol, i) or self._StartOf(repFol, i) or self._StartOf(0, i)
+		self.SynErr(n)
+		while not s[self.la.kind]:
+			self.Get()
+		return self.StartOf(syFol)
+
+	@property
+	def __main_production__(self) -> typing.Callable[[], None]:
+		return getattr(self, self.__class__.__main_production_name__)
+
+	def Parse(self, scanner: Scanner) -> None:
+		self.scanner = scanner
+		self.la = Token()
+		self.la.val = ""
+		self.Get()
+		self.__main_production__()
+		self.Expect(self.__class__.ENUM.EOF_SYM)
