
@nlevitt (Contributor) commented Mar 28, 2016

The old GeeZipFile class (https://github.com/internetarchive/warctools/blob/7f2b9a9/hanzo/warctools/stream.py#L183) depended on implementation details of gzip.py in the Python standard library, and it looks like Python 3.5 breaks it. Moreover, GeeZipFile doesn't handle certain gzip members correctly in any Python version: it can skip over some very small members, and there may be other issues. It turned out to be necessary to write a new class on top of zlib rather than try to extend gzip.GzipFile.
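For readers unfamiliar with the approach, here is a minimal sketch (assuming Python 3.3+, standard library only) of reading a multi-member gzip stream one member at a time with zlib. The function name `iter_gzip_members` and its `chunk_size` parameter are hypothetical; this is not the `MultiMemberGzipReader` class added in this PR, only an illustration of the underlying technique. A `zlib.decompressobj` created with `wbits=16 + zlib.MAX_WBITS` decodes exactly one gzip member and leaves any bytes that belong to the next member in `unused_data`, so a short member cannot bleed into the one after it.

```python
import gzip
import io
import zlib


# Hypothetical helper for illustration; not the API introduced by this PR.
def iter_gzip_members(fileobj, chunk_size=8192):
    """Yield the decompressed bytes of each gzip member in fileobj, in order."""
    pending = b""
    while True:
        if not pending:
            pending = fileobj.read(chunk_size)
            if not pending:
                return  # clean end of input, between members
        # 16 + MAX_WBITS: expect a gzip header/trailer around the deflate data.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out = []
        while True:
            out.append(d.decompress(pending))
            if d.eof:
                # This member is done; leftover bytes start the next member.
                pending = d.unused_data
                break
            pending = fileobj.read(chunk_size)
            if not pending:
                raise EOFError("truncated gzip member")
        yield b"".join(out)
```

Because each member's output is yielded separately, a tiny member (even one with an empty payload) comes back as its own item instead of being merged or skipped. By contrast, `gzip.decompress` on the same bytes returns the contents of all members concatenated together.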

This pull request makes the tests pass on Python 3.5; see
https://travis-ci.org/internetarchive/warctools/builds/157185857
vs
https://travis-ci.org/internetarchive/warctools/builds/157138379

These are changes I was working on as part of the painful project of making CDX-Writer work with the mainline surt and warctools libraries. That project isn't finished, but I think we might as well merge this change now.

nlevitt added 12 commits March 22, 2016 20:43
… filedesc header, and split line only on space character
…ing (fixes fundamental problem with old approach, that a read on a short gzip member would read into the following member)
…github.com/internetarchive/warctools/blob/cdx-writer/hanzo/warctools/arc.py
* origin/master:
  back to dev version number
  set version=4.10.0 for push to pypi
  allow failures for python 3.5 and nightly, since they fail now (to be investigated)
  bump version number so that current master is after latest release on pypi, and add some python versions to .travis.yml

Conflicts:
	.travis.yml
	setup.py
@nlevitt changed the title from "don't merge yet" to "custom class MultiMemberGzipReader and other tweaks" on Sep 2, 2016