My bank emits OFX files as XML files without any line-breaks, e.g. (after anonymizing the contents):
Parsing fails with a traceback:
>>> python3 -c "import ofxtools; parser = ofxtools.Parser.OFXTree(); parser.parse('test.ofx.txt')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
import ofxtools; parser = ofxtools.Parser.OFXTree(); parser.parse('test.ofx.txt')
~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "C:\Users\...\site-packages\ofxtools\Parser.py", line 82, in parse
self.header, message = self._read(source)
~~~~~~~~~~^^^^^^^^
File "C:\Users\...\site-packages\ofxtools\Parser.py", line 115, in _read
header, message = parse_header(source)
~~~~~~~~~~~~^^^^^^^^
File "C:\Users\...\site-packages\ofxtools\header.py", line 268, in parse_header
line = source.readline().decode("ascii")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 920: ordinal not in range(128)
Note that the decoding is actually unnecessary here, because this line sits in the loop that skips any empty lines preceding the OFX header.
The bug could be solved by one of these:
- Clean: Simply do not decode. Modern Python 3 versions also support text operations like `.strip()` on `bytes`. However, I don't know if this would raise the minimum required Python version.
- Safe: Decode to an 8-bit encoding like `"latin1"` instead of `"ascii"`. Unlike `"ascii"`, 8-bit encodings can decode any byte stream. I'm not sure what would happen on a null character though.
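To make the two options concrete, here is a minimal standalone sketch of a blank-line-skipping loop written both ways. It is not the actual code from `ofxtools/header.py`: the function names `first_nonblank_bytes` and `first_nonblank_latin1` and the sample data are hypothetical, and the only thing taken from the traceback is that the source is a binary stream read with `readline()`.

```python
import io


def first_nonblank_bytes(source):
    """'Clean' variant: test for blank lines directly on bytes, no decoding."""
    while True:
        line = source.readline()
        if not line:        # EOF reached without finding content
            return b""
        if line.strip():    # bytes.strip() removes ASCII whitespace
            return line


def first_nonblank_latin1(source):
    """'Safe' variant: decode with latin1, which maps every byte 0-255."""
    while True:
        raw = source.readline()
        if not raw:         # EOF reached without finding content
            return ""
        line = raw.decode("latin1")
        if line.strip():
            return line


# Hypothetical sample: two blank lines, then a single long line containing
# the UTF-8 bytes 0xc3 0xa9 (the same lead byte as in the traceback).
sample = b"\r\n\n<OFX><NAME>Caf\xc3\xa9</NAME></OFX>"

as_bytes = first_nonblank_bytes(io.BytesIO(sample))
as_text = first_nonblank_latin1(io.BytesIO(sample))

# A null byte decodes without error under latin1.
null_decoded = b"\x00".decode("latin1")
```

In this sketch both variants return the `<OFX>` line without raising, whereas `sample.decode("ascii")` fails with the `UnicodeDecodeError` from the report. One difference to be aware of: `str.strip()` treats a few non-ASCII characters such as `"\xa0"` as whitespace, while `bytes.strip()` only strips ASCII whitespace, so the two variants are not strictly equivalent on unusual input.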