Add read-only support for zipped Dirfiles #1
This PR adds read-only support for Dirfiles stored in uncompressed Zip files. The patch was motivated by a need to reduce the total file count for FLAC-encoded Dirfiles, which alleviates the backup and data-transfer overheads that come with a very large number of small files. CLASS has been using these changes for more than a year. The PR is identical to the patch attached to my 2020-02-28 post to the getdata-devel mailing list, except that it omits the documentation (since it isn't part of this Git repository). The original version of the patch dates back to 2018.
Documentation
Separate from the Dirfile encoding scheme, GetData will read Dirfiles contained in uncompressed Zip files. This functionality is meant for reading archival data, so writing to these Zip files is not supported. Using the Info-ZIP `zip` utility, a Zip file can be created by running `zip -r0 ../dirfile.zip *` from within the root of an existing Dirfile. All encoding schemes are supported by this functionality except for the two encoding schemes that already use Zip files, zzip and zzslim. The encoding scheme must be specified using the /ENCODING directive, even if the Dirfile is unencoded. For /INCLUDE directives and LINTERP field look up table files, only relative paths are supported and only without `./` and `../` syntax.

Although Zip files are most commonly created using Deflate compression, the Zip standard (ISO/IEC 21320-1) also supports Store compression, i.e., no compression at all. GetData's Zip file support requires Store compression for all data files, although either Store compression or Deflate compression can be used for any format files or any LINTERP field look up table files. With Store compression, a Zip file effectively concatenates a Dirfile's individual files together into a single file. Since a Zip file contains an offset table, unlike a tarball, random reads are supported without the need to load the entire file from disk.
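
To illustrate why Store compression allows random reads, here is a minimal Python sketch using only the standard library. It is not the code in this patch and does not use GetData. The helper names (`make_example_zip`, `stored_member_span`, `read_samples`), the field name `counter`, and the tiny format file are all hypothetical and chosen just for the example. The sketch looks up a member in the Zip offset table, skips its local header, and then seeks straight to the requested samples in the archive.

```python
import struct
import zipfile

# Fixed 30-byte Zip local file header: signature, five 16-bit fields,
# three 32-bit fields (CRC and sizes), then name length and extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def make_example_zip(zip_path, n=1000):
    """Write a toy Dirfile-like archive using Store (no compression).

    The format file and the 'counter' field are illustrative assumptions.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("format", "/ENCODING none\n/ENDIAN little\ncounter RAW UINT16 1\n")
        zf.writestr("counter", struct.pack(f"<{n}H", *range(n)))


def stored_member_span(zip_path, member):
    """Return (offset, size) of a stored member's raw bytes inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)  # entry from the central directory (offset table)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{member} is not Store-compressed")
    with open(zip_path, "rb") as f:
        f.seek(info.header_offset)
        fields = LOCAL_HEADER.unpack(f.read(LOCAL_HEADER.size))
    if fields[0] != b"PK\x03\x04":
        raise ValueError("bad local file header signature")
    name_len, extra_len = fields[9], fields[10]
    # The member's data starts right after the header, file name, and extra field.
    return info.header_offset + LOCAL_HEADER.size + name_len + extra_len, info.file_size


def read_samples(zip_path, member, first, count):
    """Read `count` little-endian uint16 samples starting at sample `first`."""
    offset, size = stored_member_span(zip_path, member)
    start, nbytes = 2 * first, 2 * count
    if start + nbytes > size:
        raise ValueError("requested samples extend past the end of the member")
    with open(zip_path, "rb") as f:
        f.seek(offset + start)  # seek directly; nothing before this point is read
        raw = f.read(nbytes)
    return list(struct.unpack(f"<{count}H", raw))
```

For example, after `make_example_zip("dirfile.zip")`, calling `read_samples("dirfile.zip", "counter", 500, 4)` touches only the central directory, one local header, and eight bytes of data. A Deflate-compressed member could not be read this way, since its bytes in the archive do not map one-to-one onto samples.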
Documentation patch