-
Notifications
You must be signed in to change notification settings - Fork 55
Allow to customize raw CSV file reading #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This allows subclasses to customize raw file reading.
|
I second the proposition. It would also help with #197 Instead of strictly requiring a file-like object, the importer could allow more general |
| This method uses the class members header, footer, names, dialect, | ||
| comments, and order. Overriding this method causes those members | ||
| to be ignored unless the overriding method explicitly uses them. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not belong to this commit or this PR. I also find it unnecessary.
| """Open the CSV file for reading. | ||
| This method can be overridden in subclasses to customize raw file reading, | ||
| for example to skip lines before processing or to handle special file formats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "special file formats" mean here? This class supports a declarative way to define an importer for CSV files. Why would you want to use it for something else?
| Returns: | ||
| A file object opened for reading. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @johannesjh: this interface is not optimal. Having the open() method return an Iterable[str] would be better. A
def iterlines(self, fd):
yield from fdand use it where appropriate, would be another, maybe even better, possibility.
| c, d | ||
| """ | ||
|
|
||
| class CustomReader(CSVReader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class CustomReader(CSVReader): | |
| class Reader(CSVReader): |
Please keep the classes in the tests named uniformly.
| class CustomReader(CSVReader): | ||
| first = Column("First") | ||
| second = Column("Second") | ||
|
|
||
| def open(self, filepath): | ||
| """Skip lines until we find the column headers.""" | ||
| fd = super().open(filepath) | ||
| # Read lines until we find one containing "First" | ||
| for line in fd: | ||
| if "First" in line: | ||
| # Create a new file-like object with the header line and remaining content | ||
| remaining = fd.read() | ||
| fd.close() | ||
| return io.StringIO(line + remaining) | ||
| return fd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the iterlines() idea exposed above, this would become simply:
from itertools import dropwhile
class Reader(CSVReader):
first = Column("First")
second = Column("Second")
def iterlines(self, fd):
return dropwhile(lambda line: "First" not in line, fd)which seems much better to me.
While CSVReader has numerous options to skip header or footer lines, or skip lines based on prefixes, it is cumbersome to read lines with varying lines before or after the actual statements. Users would need to override CSVReader.read(). Simply extending using
super().read(...)is possible but requires creating a temporary file because read() needs a physical file path. Simply adding anopen(str) -> file-likemethod to CSVReader allows to keep the useful functionality ofread()while still being able to preprocess the file, for example, skipping a variable number of lines based on a pattern.This would fix #196.