Support for iWork 2014+ #5
Open
pascalbros wants to merge 3 commits into obriensp:master from pascalbros:master
Binary file not shown.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| Sean Patrick O'Brien (http://obriensp.com) | ||
|
|
||
| Pasquale Ambrosini (http://pasqualeambrosini.net) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| # iWork '13 File Format | ||
|
|
||
| ## <a name="overview"/>Overview | ||
| The iWork '13 format is a [bundle](https://developer.apple.com/library/mac/documentation/corefoundation/conceptual/cfbundles/DocumentPackages/DocumentPackages.html)-based format built on top of several open source projects. This document describes the physical layout of files contained in these bundles and the algorithms involved, but does not attempt to describe the nature of the represented object graph. | ||
|
|
||
| ## <a name="bundle"/>Bundle | ||
|
|
||
| The organization of an iWork document bundle is fairly straightforward. Media such as images and videos are stored in the `Data` subdirectory, serialized objects are stored in [Index.zip](#index-zip), some light metadata is stored in the `Metadata` subdirectory, and a few preview images are stored in the top level of the bundle. | ||
|
|
||
| Photo Essay.key/ | ||
| Data/ | ||
| 143917994_2881x1992-small.jpg | ||
| 143918632_1620x1622-small.jpg | ||
| 154121867_2447x1632-small.jpg | ||
| 154146989_2880x1920-small.jpg | ||
| ... | ||
| Index.zip | ||
| Metadata/ | ||
| BuildVersionHistory.plist | ||
| DocumentIdentifier | ||
| Properties.plist | ||
| preview-micro.jpg | ||
| preview-web.jpg | ||
| preview.jpg | ||
|
|
||
| ## <a name="index-zip" />Index.zip | ||
| A document's objects are organized into groups called Components. Each Component is serialized into the [IWA](#iwa) format and stored in Index.zip. | ||
|
|
||
| Index/ | ||
| AnnotationAuthorStorage.iwa | ||
| CalculationEngine.iwa | ||
| Document.iwa | ||
| DocumentStylesheet.iwa | ||
| MasterSlide-1.iwa | ||
| MasterSlide-10.iwa | ||
| MasterSlide-11.iwa | ||
| ... | ||
|
|
||
| Curiously, the zip implementation iWork uses for this file is extremely limited. It does not support any form of compression or extensions like Zip64. Simply expanding Index.zip and then recreating it with a standard zip utility will result in a document that iWork refuses to open. | ||
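The "no compression" constraint described above can be checked before handing an archive to iWork. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `compressed_entries` and the sample entry name are illustrative assumptions, and passing this check does not guarantee that iWork will accept the archive.

```python
import io
import zipfile


def compressed_entries(zip_source):
    """Return the names of entries that are NOT stored uncompressed."""
    with zipfile.ZipFile(zip_source) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if info.compress_type != zipfile.ZIP_STORED
        ]


# Build two small in-memory archives to illustrate the difference.
stored = io.BytesIO()
with zipfile.ZipFile(stored, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("Index/Document.iwa", b"placeholder bytes")

deflated = io.BytesIO()
with zipfile.ZipFile(deflated, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("Index/Document.iwa", b"placeholder bytes" * 10)
```

An empty result from `compressed_entries` means every entry uses the "stored" method; any names returned point at entries a standard zip utility has deflated.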
|
|
||
| The iWork '13 applications contain a separate, more complete zip implementation used for reading and writing iWork '09 documents (which are bundles that have been zipped in their entirety), so I believe the choice to forgo compression for Index.zip is intentional. | ||
|
|
||
| One possibility is that Index.zip is used to prevent the synchronization issues that would occur if reading and writing a document involved accessing many small files. Saving a document might involve writing out several Components, so instead of coordinating writes to the various individual .iwa files, only the Index.zip must be locked. Since the .iwa files are inherently compressed (see [Snappy Compression](#snappy-compression)), the zip implementation used for Index.zip could be designed to be minimal and efficient. | ||
|
|
||
| ## <a name="iwa"/>IWA | ||
|
|
||
| Components are serialized into .iwa (iWork Archive) files, a custom format consisting of a [Protobuf](#protobuf) stream wrapped in a [Snappy](#snappy-compression) stream. | ||
|
|
||
| ### <a name="snappy-compression"/>Snappy Compression | ||
| [Snappy](https://code.google.com/p/snappy/) is a compression format created by Google aimed at providing decent compression ratios at high speeds. IWA files are stored in Snappy's [framing format](https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt), though they do not adhere rigorously to the spec. In particular, they do not include the required Stream Identifier chunk, and compressed chunks do not include a CRC-32C checksum. | ||
|
|
||
| The stream is composed of contiguous chunks prefixed by a 4 byte header. The first byte indicates the chunk type, which in practice is always 0 for iWork, indicating a Snappy compressed chunk. The next three bytes are interpreted as a 24-bit little-endian integer indicating the length of the chunk. The 4 byte header is not included in the chunk length. | ||
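The chunk layout described above (one type byte, a 24-bit little-endian length, then the chunk body) can be sketched in a few lines of Python. The function name `split_iwa_chunks` and the sample bytes are assumptions made for illustration; the sketch only splits the stream and leaves each body compressed, so a raw Snappy decompressor from a third-party library would still be needed afterwards.

```python
def split_iwa_chunks(data: bytes):
    """Yield (chunk_type, chunk_body) pairs from a raw .iwa byte string."""
    offset = 0
    while offset < len(data):
        header = data[offset:offset + 4]
        if len(header) < 4:
            raise ValueError("truncated chunk header")
        chunk_type = header[0]
        # The length covers the body only, not the 4-byte header.
        length = int.from_bytes(header[1:4], "little")
        body = data[offset + 4:offset + 4 + length]
        if len(body) < length:
            raise ValueError("truncated chunk body")
        yield chunk_type, body
        offset += 4 + length


# Two synthetic chunks of type 0 with 3-byte and 2-byte bodies.
sample = bytes([0, 3, 0, 0]) + b"abc" + bytes([0, 2, 0, 0]) + b"de"
chunks = list(split_iwa_chunks(sample))
```

Because iWork omits the Stream Identifier chunk and the per-chunk CRC-32C, a decoder that expects the standard framing format is likely to reject these files, which is why the header is parsed by hand here.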
|
|
||
| ### <a name="protobuf"/>Protobuf | ||
| The uncompressed IWA contains the Component's objects, serialized consecutively in a [Protobuf](https://code.google.com/p/protobuf/) stream. Each object begins with a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) representing the length of the [ArchiveInfo](#archiveinfo) message, followed by the `ArchiveInfo` message itself. The `ArchiveInfo` includes a variable number of [MessageInfo](#messageinfo) messages describing the encoded [Payloads](#payload) that follow, though in practice iWork files seem to only have one payload message per `ArchiveInfo`. | ||
|
|
||
| Object 0 varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| Object 1 varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| ... | ||
|
|
||
| Object n varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| ### <a name="archiveinfo" />ArchiveInfo | ||
|
|
||
| The `ArchiveInfo` message contains the object's `identifier` (unique across the document), as well as information about the encoded messages (see [MessageInfo](#messageinfo)). | ||
|
|
||
| message ArchiveInfo { | ||
| optional uint64 identifier = 1; | ||
| repeated MessageInfo message_infos = 2; | ||
| } | ||
|
|
||
| ### <a name="messageinfo" />MessageInfo | ||
|
|
||
| The `MessageInfo` message describes the encoded payload that follows the `ArchiveInfo`. The `type` field indicates how the payload should be decoded (see [TSPRegistry](#tspregistry)), the `version` field indicates what format version was used to encode (currently 1.0.5), and the `length` field specifies how much data follows. The `field_infos` field would allow for deep introspection into the format of the payload, but it is absent from all archives I have inspected. It's possible that it is meant for backwards compatibility when new fields are introduced. The `object_references` and `data_references` fields are for bookkeeping/cross-referencing. | ||
|
|
||
| message MessageInfo { | ||
| required uint32 type = 1; | ||
| repeated uint32 version = 2 [packed = true]; | ||
| required uint32 length = 3; | ||
| repeated FieldInfo field_infos = 4; | ||
| repeated uint64 object_references = 5 [packed = true]; | ||
| repeated uint64 data_references = 6 [packed = true]; | ||
| } | ||
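Putting the `ArchiveInfo` and `MessageInfo` definitions together, an uncompressed IWA stream can be walked without generated Protobuf classes. The sketch below is one possible approach, not the project's implementation: it decodes only the fields it needs (`identifier`, `type`, `length`) directly from the Protobuf wire format, and the names `read_varint`, `read_fields`, and `walk_objects` are hypothetical helpers. It assumes payloads follow their `ArchiveInfo` in the same order as the `message_infos` entries.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (bytes, sub-message, packed)
            size, pos = read_varint(buf, pos)
            value = buf[pos:pos + size]
            pos += size
        elif wire_type == 1:  # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, wire_type, value


def walk_objects(stream: bytes):
    """Yield (identifier, [(type, payload_bytes), ...]) for each object."""
    pos = 0
    while pos < len(stream):
        info_len, pos = read_varint(stream, pos)
        archive_info = stream[pos:pos + info_len]
        pos += info_len
        identifier = None
        payloads = []
        for field, wire_type, value in read_fields(archive_info):
            if field == 1 and wire_type == 0:  # ArchiveInfo.identifier
                identifier = value
            elif field == 2 and wire_type == 2:  # ArchiveInfo.message_infos
                msg_type = length = 0
                for f, w, v in read_fields(value):
                    if f == 1 and w == 0:  # MessageInfo.type
                        msg_type = v
                    elif f == 3 and w == 0:  # MessageInfo.length
                        length = v
                # Payloads are consumed in message_infos order.
                payloads.append((msg_type, stream[pos:pos + length]))
                pos += length
        yield identifier, payloads


# Synthetic stream: identifier 7, one MessageInfo (type 1, length 3), payload b"abc".
sample = bytes([0x08, 0x08, 0x07, 0x12, 0x04, 0x08, 0x01, 0x18, 0x03]) + b"abc"
objects = list(walk_objects(sample))
```

Each yielded payload is still an encoded Protobuf message; decoding it requires the message type that the `type` value maps to.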
|
|
||
| ### <a name="payload"/>Payload | ||
| The format of the payload is determined by the `type` field of the associated `MessageInfo` message. The iWork applications manually map these integer values to their respective Protobuf message types, and the mappings vary slightly between Keynote, Pages and Numbers. This information can be recovered by inspecting the [TSPRegistry](#tspregistry) class at runtime. | ||
|
|
||
| Because Protobuf is not a self-describing format, applications attempting to understand the payloads must know a great deal about the data types and hierarchy of the objects serialized by iWork. Fortunately, all of this information can be recovered from the iWork binaries using [proto-dump](https://github.com/obriensp/proto-dump). | ||
|
|
||
| A full dump of the Protobuf messages can be found [here](../iWorkFileInspector/iWorkFileInspector/Messages/Proto/). | ||
|
|
||
| ### <a name="tspregistry" />TSPRegistry | ||
| The mapping between an object's `MessageInfo.type` and its respective Protobuf message type must be extracted from the iWork applications at runtime. Attaching to Keynote via a debugger and inspecting `[TSPRegistry sharedRegistry]` shows: | ||
|
|
||
| <TSPRegistry 0x102daf560 | ||
| _messageTypeToPrototypeMap = { | ||
| 148 -> 0x102f24680 KN.ChartInfoGeometryCommandArchive | ||
| 147 -> 0x102f24650 KN.SlideCollectionCommandSelectionBehaviorArchive | ||
| 146 -> 0x102f24560 KN.CommandSlideReapplyMasterArchive | ||
| 145 -> 0x102f24420 KN.CommandMasterSetBodyStylesArchive | ||
| ... | ||
|
|
||
| A full list of the type mappings can be found [here](../iWorkFileInspector/iWorkFileInspector/Persistence/MessageTypes/). | ||
|
|
||
| ## <a name="encryption"/>Encryption | ||
| If the document is locked with a password, nearly all files in the bundle are encrypted using [AES128](http://en.wikipedia.org/wiki/Advanced_Encryption_Standard) encryption with [PKCS7](http://en.wikipedia.org/wiki/Padding_\(cryptography\)#PKCS7) padding. For a full description of the encryption format, see [iWork Encrypted Stream](iWork Encrypted Stream.md). |
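For readers unfamiliar with PKCS7, the padding scheme itself is small enough to sketch in plain Python. This shows only generic PKCS7 padding for a 16-byte AES block; it is not the iWork encrypted stream format, which is described in the linked document, and the function names are illustrative.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Validate and strip PKCS7 padding."""
    if not padded or len(padded) % block_size:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-pad_len]
```

Note that input which is already block-aligned still gains a full block of padding, so the padded length is always strictly greater than the original.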
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| # iWork '15+ File Format | ||
|
|
||
| ## <a name="overview"/>Overview | ||
| The iWork '15+ format is a [bundle](https://developer.apple.com/library/mac/documentation/corefoundation/conceptual/cfbundles/DocumentPackages/DocumentPackages.html)-based format built on top of several open source projects. This document describes the physical layout of files contained in these bundles and the algorithms involved, but does not attempt to describe the nature of the represented object graph. | ||
|
|
||
| ## <a name="work"/>What does work | ||
|
|
||
| It works the same way as the old project, but you can now also open documents in the new file format (2015+). | ||
|
|
||
| ## <a name="no-work"/>What doesn't | ||
|
|
||
| Password-protected documents are not supported yet. This needs further investigation, because something has changed in the new format (there is no .iwph file). | ||
|
|
||
| ## <a name="bundle"/>Bundle | ||
|
|
||
| The organization of an iWork document bundle is fairly straightforward. Media such as images and videos are stored in the `Data` subdirectory, serialized objects are stored in [Index](#index), some light metadata is stored in the `Metadata` subdirectory, and a few preview images are stored in the top level of the bundle. | ||
|
|
||
| The new format is essentially the same as the 2013 format, except that `Photo Essay.key` is a zip file and Index is a folder rather than a zip file. | ||
|
|
||
|
|
||
| Photo Essay.key/ | ||
| Data/ | ||
| 143917994_2881x1992-small.jpg | ||
| 143918632_1620x1622-small.jpg | ||
| 154121867_2447x1632-small.jpg | ||
| 154146989_2880x1920-small.jpg | ||
| ... | ||
| Index/ | ||
| AnnotationAuthorStorage.iwa | ||
| CalculationEngine.iwa | ||
| Document.iwa | ||
| DocumentStylesheet.iwa | ||
| MasterSlide-1.iwa | ||
| MasterSlide-10.iwa | ||
| MasterSlide-11.iwa | ||
| ... | ||
| Metadata/ | ||
| BuildVersionHistory.plist | ||
| DocumentIdentifier | ||
| Properties.plist | ||
| preview-micro.jpg | ||
| preview-web.jpg | ||
| preview.jpg | ||
|
|
||
|
|
||
| Curiously, the zip implementation iWork uses for this file is extremely limited. It does not support any form of compression or extensions like Zip64. Simply expanding Index.zip and then recreating it with a standard zip utility will result in a document that iWork refuses to open. | ||
|
|
||
| The iWork '13 applications contain a separate, more complete zip implementation used for reading and writing iWork '09 documents (which are bundles that have been zipped in their entirety), so I believe the choice to forgo compression for Index.zip is intentional. | ||
|
|
||
| One possibility is that Index.zip is used to prevent the synchronization issues that would occur if reading and writing a document involved accessing many small files. Saving a document might involve writing out several Components, so instead of coordinating writes to the various individual .iwa files, only the Index.zip must be locked. Since the .iwa files are inherently compressed (see [Snappy Compression](#snappy-compression)), the zip implementation used for Index.zip could be designed to be minimal and efficient. | ||
|
|
||
| ## <a name="iwa"/>IWA | ||
|
|
||
| Components are serialized into .iwa (iWork Archive) files, a custom format consisting of a [Protobuf](#protobuf) stream wrapped in a [Snappy](#snappy-compression) stream. | ||
|
|
||
| ### <a name="snappy-compression"/>Snappy Compression | ||
| [Snappy](https://code.google.com/p/snappy/) is a compression format created by Google aimed at providing decent compression ratios at high speeds. IWA files are stored in Snappy's [framing format](https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt), though they do not adhere rigorously to the spec. In particular, they do not include the required Stream Identifier chunk, and compressed chunks do not include a CRC-32C checksum. | ||
|
|
||
| The stream is composed of contiguous chunks prefixed by a 4 byte header. The first byte indicates the chunk type, which in practice is always 0 for iWork, indicating a Snappy compressed chunk. The next three bytes are interpreted as a 24-bit little-endian integer indicating the length of the chunk. The 4 byte header is not included in the chunk length. | ||
|
|
||
| ### <a name="protobuf"/>Protobuf | ||
| The uncompressed IWA contains the Component's objects, serialized consecutively in a [Protobuf](https://code.google.com/p/protobuf/) stream. Each object begins with a [varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) representing the length of the [ArchiveInfo](#archiveinfo) message, followed by the `ArchiveInfo` message itself. The `ArchiveInfo` includes a variable number of [MessageInfo](#messageinfo) messages describing the encoded [Payloads](#payload) that follow, though in practice iWork files seem to only have one payload message per `ArchiveInfo`. | ||
|
|
||
| Object 0 varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| Object 1 varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| ... | ||
|
|
||
| Object n varint archiveInfoLength | ||
| ArchiveInfo archiveInfo | ||
| (payload) | ||
|
|
||
| ### <a name="archiveinfo" />ArchiveInfo | ||
|
|
||
| The `ArchiveInfo` message contains the object's `identifier` (unique across the document), as well as information about the encoded messages (see [MessageInfo](#messageinfo)). | ||
|
|
||
| message ArchiveInfo { | ||
| optional uint64 identifier = 1; | ||
| repeated MessageInfo message_infos = 2; | ||
| } | ||
|
|
||
| ### <a name="messageinfo" />MessageInfo | ||
|
|
||
| The `MessageInfo` message describes the encoded payload that follows the `ArchiveInfo`. The `type` field indicates how the payload should be decoded (see [TSPRegistry](#tspregistry)), the `version` field indicates what format version was used to encode (currently 1.0.5), and the `length` field specifies how much data follows. The `field_infos` field would allow for deep introspection into the format of the payload, but it is absent from all archives I have inspected. It's possible that it is meant for backwards compatibility when new fields are introduced. The `object_references` and `data_references` fields are for bookkeeping/cross-referencing. | ||
|
|
||
| message MessageInfo { | ||
| required uint32 type = 1; | ||
| repeated uint32 version = 2 [packed = true]; | ||
| required uint32 length = 3; | ||
| repeated FieldInfo field_infos = 4; | ||
| repeated uint64 object_references = 5 [packed = true]; | ||
| repeated uint64 data_references = 6 [packed = true]; | ||
| } | ||
|
|
||
| ### <a name="payload"/>Payload | ||
| The format of the payload is determined by the `type` field of the associated `MessageInfo` message. The iWork applications manually map these integer values to their respective Protobuf message types, and the mappings vary slightly between Keynote, Pages and Numbers. This information can be recovered by inspecting the [TSPRegistry](#tspregistry) class at runtime. | ||
|
|
||
| Because Protobuf is not a self-describing format, applications attempting to understand the payloads must know a great deal about the data types and hierarchy of the objects serialized by iWork. Fortunately, all of this information can be recovered from the iWork binaries using [proto-dump](https://github.com/obriensp/proto-dump). | ||
|
|
||
| A full dump of the Protobuf messages can be found [here](../iWorkFileInspector/iWorkFileInspector/Messages/Proto/). | ||
|
|
||
| ### <a name="tspregistry" />TSPRegistry | ||
| The mapping between an object's `MessageInfo.type` and its respective Protobuf message type must be extracted from the iWork applications at runtime. Attaching to Keynote via a debugger and inspecting `[TSPRegistry sharedRegistry]` shows: | ||
|
|
||
| <TSPRegistry 0x102daf560 | ||
| _messageTypeToPrototypeMap = { | ||
| 148 -> 0x102f24680 KN.ChartInfoGeometryCommandArchive | ||
| 147 -> 0x102f24650 KN.SlideCollectionCommandSelectionBehaviorArchive | ||
| 146 -> 0x102f24560 KN.CommandSlideReapplyMasterArchive | ||
| 145 -> 0x102f24420 KN.CommandMasterSetBodyStylesArchive | ||
| ... | ||
|
|
||
| A full list of the type mappings can be found [here](../iWorkFileInspector/iWorkFileInspector/Persistence/MessageTypes/). | ||
|
|
||
| ## <a name="encryption"/>Encryption | ||
| If the document is locked with a password, nearly all files in the bundle are encrypted using [AES128](http://en.wikipedia.org/wiki/Advanced_Encryption_Standard) encryption with [PKCS7](http://en.wikipedia.org/wiki/Padding_\(cryptography\)#PKCS7) padding. For a full description of the encryption format, see [iWork Encrypted Stream](iWork Encrypted Stream.md). | ||
I created a password-protected presentation using Keynote 7.0.5 (iWork 2016 update) and both the `.iwph` and `iwpv2` files are present in the `.key` file.