Summary
Implement content inspection within ZIP archives to detect compound document formats (DOCX, XLSX, HWP, etc.) that use ZIP as a container.
Context
5 tests in the compatibility corpus detect the outer ZIP format but miss the specific document type:
- `escapevel` - ZIP detected, expected "Zip data (MIME type ...)"
- `gpkg-1-zst` - TAR detected, expected "Gentoo GLEP 78 binary package"
- `HWP2016.hwpx.zip` - ZIP detected, expected "Hancom HWP file, HWPX"
- `issue311docx` - ZIP detected, expected "Microsoft Word 2007+"
- `issue359xlsx` - ZIP detected, expected "Microsoft Excel 2007+"
GNU file achieves this through magic rules that use indirect offsets and string matching within ZIP entries (checking for `[Content_Types].xml`, `word/`, `xl/` paths).
Acceptance Criteria
Depends On
References
Summary
Implement content inspection within ZIP archives to detect compound document formats (DOCX, XLSX, HWP, etc.) that use ZIP as a container.
Context
5 tests in the compatibility corpus detect the outer ZIP format but miss the specific document type:
GNU file achieves this through magic rules that use indirect offsets and string matching within ZIP entries (checking for `[Content_Types].xml`, `word/`, `xl/` paths).
Acceptance Criteria
Depends On
References