In production at IA, probably because of petabox downtime or a network error, I got the following exception and stack trace:
TypeError: sequence item 0: expected str instance, bytes found
File "extraction_ungrobided.py", line 272, in <module>
MRExtractUnGrobided.run()
File "mrjob/job.py", line 424, in run
mr_job.execute()
File "mrjob/job.py", line 433, in execute
self.run_mapper(self.options.step_num)
File "mrjob/job.py", line 517, in run_mapper
for out_key, out_value in mapper(key, value) or ():
File "extraction_ungrobided.py", line 228, in mapper
info, status = self.extract(info)
File "extraction_ungrobided.py", line 143, in extract
info['file:cdx']['c_size'])
File "extraction_ungrobided.py", line 126, in fetch_warc_content
gwb_record = rstore.load_resource(warc_uri, offset, c_size)
File "wayback/resourcestore.py", line 65, in load_resource
return create_resource(loader.load_block(bstart, blen))
File "wayback/resource.py", line 583, in create_resource
record, errors, offset = parser.parse(rs, 0, line)
File "hanzo/warctools/warc.py", line 223, in parse
% (",".join(self.KNOWN_VERSIONS)),
self.KNOWN_VERSIONS is defined as bytes at https://github.com/internetarchive/warctools/blob/master/hanzo/warctools/warc.py#L177, but is being joined with a string.
One fix, though I'm not sure it would work in Python 2.7, would be:
",".join([s.decode('utf-8') for s in self.KNOWN_VERSIONS])
There's probably a more idiomatic way, but I can submit a patch for that.
While we're at it, might want to make it a join on ", ", not ","?
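A minimal sketch of the failure and the proposed fix, outside of warctools. The `KNOWN_VERSIONS` tuple below is a hypothetical stand-in (the real values are defined in warc.py), and `format_known_versions` and `join_raises_type_error` are illustrative helper names, not part of the library:

```python
# Hypothetical stand-in for the parser's KNOWN_VERSIONS attribute.
# The real values are defined in hanzo/warctools/warc.py; these are assumed.
KNOWN_VERSIONS = (b"WARC/1.0", b"WARC/0.18", b"WARC/0.17")


def join_raises_type_error(versions):
    """Return True if joining the raw values with a str separator fails."""
    try:
        ",".join(versions)
    except TypeError:
        # On Python 3: "sequence item 0: expected str instance, bytes found"
        return True
    return False


def format_known_versions(versions, sep=", "):
    """Decode each bytes value to text before joining with a text separator."""
    return sep.join(v.decode("utf-8") for v in versions)


if __name__ == "__main__":
    print(join_raises_type_error(KNOWN_VERSIONS))
    print(format_known_versions(KNOWN_VERSIONS))
```

On Python 2.7, `bytes` is an alias for `str` and `str.decode` exists, so the decoding version should run there as well; it returns a `unicode` string rather than a `str`.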