As CSV files downloaded from banks are notoriously messy, one might need to pre-process them beyond overriding the `header` and `footer` members of `csvbase.CSVReader`. For example, there might be a variable number of header lines. The common solution is to read in the file as a string, do the preprocessing (e.g. skip all lines up to a keyword such as the first column header), and then pass the updated string on to the usual processing.
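A minimal sketch of the kind of preprocessing meant here, assuming the preamble ends at the first line that starts with a known column header. `skip_preamble` and the `Date` keyword are illustrative only and are not part of beangulp:

```python
def skip_preamble(content: str, keyword: str) -> str:
    """Hypothetical helper: drop every line before the first one starting with `keyword`."""
    lines = content.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(keyword):
            return ''.join(lines[i:])
    # Fail loudly rather than silently passing an unparseable file on.
    raise ValueError(f'header keyword {keyword!r} not found')
```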
But `csvbase.CSVReader.read()` expects a file name as its argument, not a file-like object or a content string. Therefore, one must do something like this:
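```python
import tempfile
from beangulp.importers import csvbase

class MyMessyBank(csvbase.Importer):
    def read(self, filepath):
        with open(filepath, 'r') as f:
            content = f.read()
        content = my_preprocessing(content)  # your own clean-up step
        # Open in text mode and flush so CSVReader.read() sees the data by name.
        with tempfile.NamedTemporaryFile('w', suffix='.csv') as t:
            t.write(content)
            t.flush()
            yield from super().read(t.name)
```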
It would be easier if one could simply use `StringIO()` or similar to provide modified data to `CSVReader.read()`.
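For comparison, the standard library's `csv` module already reads from any iterable of lines, so preprocessed text wrapped in `io.StringIO` can be parsed without touching the disk. The sample data below is made up:

```python
import csv
import io

# Pretend this string is the output of a preprocessing step.
cleaned = "Date,Amount\n2024-01-01,-12.50\n2024-01-02,100.00\n"

# csv.DictReader accepts any file-like object, so no temporary file is needed.
rows = list(csv.DictReader(io.StringIO(cleaned, newline='')))
print(rows[0])  # {'Date': '2024-01-01', 'Amount': '-12.50'}
```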
Several options come to mind:

- Allow `read(str, ...)` as well as `read(file, ...)`
- Introduce a `read()` usage like `read(None, ..., content=<file>)`
- Add an `open(filename) -> file` function to `importers.Importer` that can be overridden by subclasses
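A rough sketch of how the first and third options could fit together, written against a toy class rather than beangulp itself. `HypotheticalCSVReader`, `MessyBankReader`, and the `Date` header keyword are all assumptions for illustration, not existing beangulp names:

```python
import builtins
import csv
import io
from contextlib import contextmanager
from itertools import dropwhile


class HypotheticalCSVReader:
    """Toy stand-in for csvbase.CSVReader; not beangulp's actual API."""

    @contextmanager
    def open(self, source):
        # Option 1: accept either a path or an already-open file-like object.
        if isinstance(source, str):
            with builtins.open(source, newline='') as f:
                yield f
        else:
            yield source

    def read(self, source):
        # Option 3: all file access goes through the overridable open() hook.
        with self.open(source) as f:
            yield from csv.DictReader(f)


class MessyBankReader(HypotheticalCSVReader):
    @contextmanager
    def open(self, source):
        # Read the raw export through the default hook, drop everything before
        # the (assumed) "Date" header line, and hand back an in-memory file.
        with super().open(source) as f:
            lines = f.read().splitlines(keepends=True)
        content = ''.join(dropwhile(lambda line: not line.startswith('Date'), lines))
        yield io.StringIO(content, newline='')
```

Having `open()` return a context manager keeps file closing in the base class, so a subclass that yields an in-memory `StringIO` and one that opens a real file are handled the same way, and the temporary file in the workaround above is no longer needed.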