From 04dc98d19b8f3f1485b0adcfab503398477b3c7f Mon Sep 17 00:00:00 2001 From: Gary Greene Date: Sat, 24 Aug 2024 12:28:37 -0400 Subject: [PATCH 1/4] Start documenting the file formats for lpkg Signed-off-by: Gary Greene --- docs/README.md | 113 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 113 insertions(+) create mode 100644 docs/README.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..6efc9c3 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,113 @@ +# LPkgTools + +## Introduction + +This document covers the overall technical design for the LPkgTools and file formats used by LPkg. This includes the +internal structures of the produced files and the logic flow of how the files are created and installed and/or +uninstalled from a system, either using the tools delivered as part of LPkgTools or related tools as part of the InstallD +or SoftwareMetadataD system daemons and user client tooling. + +## Rationale + +Foundationally, the reason for creating a brand new package format and manager for Linux stems from the need for a mix +of the features from both DPKG and RPM. DPKG arguably has better handling of different types of dependencies +(critical, recommends, requires, and suggests), while RPM is, again arguably, simpler to create packages with. In addition, +LPkg aims to add proper support for other binary package types, such as meta packages and bundles, neither of which DPKG +or RPM really natively support. + +## Package Format + +The package format of LPkg is fairly straight-forward. The basic structure has a header, payload, and footer.
The header +of the source and binary package file format is organized like so: + +*Figure 1: Header Format for Source and Binary Files:* + +| Position | Section | Description | +| --- | --- | --- | +| 0 | Magic | The "magic number" section of the header which informs tools what the file type is and the version of the package format with a null terminator ending the section | +| 1 | ToC | The table of contents of the file. This is stored as a base64 encoded JSON string with a null terminator at the end of the section | +| 2 | Metadata | A base64 encoded JSON representation of the package's metadata with a null terminator at the end of the section | +| 3 | Checksum | A base64 encoded JSON representation of the SHA512 checksum of the payload content with a null terminator at the end of the section | +| 4 | Payload | The base64 encoded payload of the package. This is in GNU tar format compressed using the bzip2 compression algorithm. Again like other sections, it is null terminated | +| 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | + +*Figure 2: Header Format for Meta Packages* + +| Position | Section | Description | +| --- | --- | --- | +| 0 | Magic | The "magic number" section of the header. Stored in hexadecimal with a null terminator | +| 1 | ToC | The table of contents of the file. Far more abridged than in the other formats, as there are only two "content" sections in the file.
Base64 encoded and null terminated | +| 2 | Metadata | A base64 encoded JSON containing all the metadata required to detail the other packages being required by the meta package | +| 3 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | + +*Figure 3: Header Format for Bundle Packages* + +| Position | Section | Description | +| --- | --- | --- | +| 0 | Magic | The "magic number" section of the header. Stored in hexadecimal with a null terminator | +| 1 | ToC | The table of contents of the file. As before, JSON data with base64 encoding and null terminated | +| 2 | Metadata | A base64 encoded JSON containing the bundle metadata. This does not contain the metadata from the bundled packages in the bundle package | +| 3 | Checksum | A base64 encoded JSON with the SHA512 checksums of the GNU tar archive payload. Terminated by a null terminator | +| 4 | Payload | This section contains more than one base64 encoded bundled package bundled in a GNU tar archive. The tar archive is null terminated to denote the end of the payload section | +| 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | + +*Figure 4: Header Format for Source Packages* + +| Position | Section | Description | +| --- | --- | --- | +| 0 | Magic | The "magic number" section of the header. Terminated with a null terminator | +| 1 | ToC | The table of contents of the file. Again, JSON data in base64 encoding and null terminated | +| 2 | Metadata | A base64 encoded JSON representing the metadata about the source package. This does not contain the metadata for any to-be-built sub packages that the source package can create. Null terminated | +| 3 | Checksum | A base64 encoded JSON representation of the checksum of the file. The hash used is SHA512.
Terminated with a null terminator | +| 4 | Payload | This section contains the blueprint file, any patches needed for properly building the package, and any sources needed in a GNU tar bzip2 archive. The section is null terminated to denote the end of the payload section | +| 5 | Signature | The GPG signatures used to certify that the package was created by the organisations or people that are trusted to have supplied or built the package | + +### Magic Format + +The Magic section allows tools to determine the type and architecture of the binary file. + +### Table of Contents JSON Format + +The JSON structure of the table-of-contents is organized like so: + +*Figure 5: Table of Contents Keys and Structures:* + +| Key | Type | Description | +| --- | --- | --- | +| `$schema` | string | JSON schema definition | +| `type` | number | An integer constant to denote the type of package the ToC represents. The enum list for this is defined below | +| `arch` | number | An integer constant to denote the platform architecture the package can be installed on | +| `version` | string | The version of the ToC format. Current format is 1.0.0. 
This uses semver versioning | +| `sections` | object | The object that holds the ToC data | + +*Figure 6: Type Enum* + +| Number | Represented Value | +| --- | --- | +| 1 | Meta Package | +| 2 | Single-payload binary package | +| 3 | Multi-payload bundle package | +| 4 | Source package | + +*Figure 7: Architecture Enum* + +| Number | Represented Value | +| --- | --- | +| 1 | noarch | +| 2 | ia32 | +| 3 | x86_64 | +| 4 | aarch64 | + +*Figure 8: The `sections` Object* + +| Key | Type | Description | +| --- | --- | --- | +| `metadata` | object | The JSON object describing the start and end location of the Metadata section of the file | +| `checksum` | object | The JSON object describing the start and end location of the Checksum section of the file, if it contains one | +| `payload` | object | The JSON object describing the start and end location of the Payload section of the file, if it contains one | +| `signatures` | object | The JSON object describing the start and end location of the Signature section of the file | + +*Figure 9: The `metadata` Object* + +| Key | Type | Description | +| --- | --- | --- | From 86ac22e559c451dce6eb670d11ec7d19fa28779c Mon Sep 17 00:00:00 2001 From: Gary Greene Date: Sat, 24 Aug 2024 14:41:36 -0400 Subject: [PATCH 2/4] More work on defining the ToC section Signed-off-by: Gary Greene --- docs/README.md | 46 +++++++++++++++++++++++++++++++++------------- 1 file changed, 33 insertions(+), 13 deletions(-) diff --git a/docs/README.md b/docs/README.md index 6efc9c3..5a58a65 100644 --- a/docs/README.md +++ b/docs/README.md @@ -64,14 +64,18 @@ of the source and binary package file format is organized like so: ### Magic Format -The Magic section allows tools to determine the type and architecture of the binary file. +The Magic section allows tools to determine the type and architecture of the binary file. The current magic format in use by LPkg is shown in the table below. 
The structure of the hexadecimal encoded string is `lpkg:v$VERSION_INT:t$TYPE_ENUM:a$ARCH_ENUM` with colon-separated sections. The leading "lpkg" literal is to clearly denote that this is an archive format for use by LPkgTools and other compatible tools. The `v$VERSION_INT` denotes the version of the file type. The `t$TYPE_ENUM` denotes the type of package, whether binary, meta, bundle, or source. Finally, the `a$ARCH_ENUM` denotes the architecture that the package will install onto. + +*Figure 5: Magic Format per Package Type Examples* +| Hex String | ASCII String | Description | +| --- | --- | --- | +| 0x6c706b673a76313a74323a6131 | lpkg:v1:t2:a1 | An LPkg v1 single-payload binary package for noarch (installable on all architectures) systems | ### Table of Contents JSON Format The JSON structure of the table-of-contents is organized like so: -*Figure 5: Table of Contents Keys and Structures:* - +*Figure 6: Table of Contents Keys and Structures:* | Key | Type | Description | | --- | --- | --- | | `$schema` | string | JSON schema definition | @@ -80,8 +84,7 @@ The JSON structure of the table-of-contents is organized like so: | `version` | string | The version of the ToC format. Current format is 1.0.0.
This uses semver versioning | | `sections` | object | The object that holds the ToC data | -*Figure 6: Type Enum* - +*Figure 7: Type Enum* | Number | Represented Value | | --- | --- | | 1 | Meta Package | @@ -89,8 +92,7 @@ The JSON structure of the table-of-contents is organized like so: | 3 | Multi-payload bundle package | | 4 | Source package | -*Figure 7: Architecture Enum* - +*Figure 8: Architecture Enum* | Number | Represented Value | | --- | --- | | 1 | noarch | @@ -98,16 +100,34 @@ The JSON structure of the table-of-contents is organized like so: | 3 | x86_64 | | 4 | aarch64 | -*Figure 8: The `sections` Object* +*Figure 9: The `sections` Object* +| Key | Type | Mandatory | Description | +| --- | --- | --- | --- | +| `metadata` | object | true | The JSON object describing the start and end location of the Metadata section of the file | +| `checksum` | object | false | The JSON object describing the start and end location of the Checksum section of the file, if it contains one | +| `payload` | object | false | The JSON object describing the start and end location of the Payload section of the file, if it contains one | +| `signatures` | object | true | The JSON object describing the start and end location of the Signature section of the file | +*Figure 10: The `metadata` Object* | Key | Type | Description | | --- | --- | --- | -| `metadata` | object | The JSON object describing the start and end location of the Metadata section of the file | -| `checksum` | object | The JSON object describing the start and end location of the Checksum section of the file, if it contains one | -| `payload` | object | The JSON object describing the start and end location of the Payload section of the file, if it contains one | -| `signatures` | object | The JSON object describing the start and end location of the Signature section of the file | +| `start` | integer | Start of the metadata section | +| `end` | integer | End of the metadata section. 
This is the byte count before the null terminator | -*Figure 9: The `metadata` Object* +*Figure 11: The `checksum` Object* +| Key | Type | Description | +| --- | --- | --- | +| `start` | integer | Start of the checksum section | +| `end` | integer | End of the checksum section. This is the byte count before the null terminator | + +*Figure 12: The `payload` Object* +| Key | Type | Description | +| --- | --- | --- | +| `start` | integer | Start of the payload section | +| `end` | integer | End of the payload section. This is the byte count before the null terminator | +*Figure 13: The `signatures` Object* | Key | Type | Description | | --- | --- | --- | +| `start` | integer | Start of the signatures section | +| `end` | integer | End of the signatures section. This is the byte count before the null terminator | From 876a20251c20e699ea91f8663bb6bde0bb4367ec Mon Sep 17 00:00:00 2001 From: Gary Greene Date: Sat, 24 Aug 2024 14:54:58 -0400 Subject: [PATCH 3/4] Frame out rest of ToC section Signed-off-by: Gary Greene --- docs/README.md | 58 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 43 insertions(+), 15 deletions(-) diff --git a/docs/README.md b/docs/README.md index 5a58a65..86cd46f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -21,7 +21,6 @@ The package format of LPkg is fairly straight-forward. The basic structure has a of the source and binary package file format is organized like so: *Figure 1: Header Format for Source and Binary Files:* - | Position | Section | Description | | --- | --- | --- | | 0 | Magic | The "magic number" section of the header which informs tools what the file type is and the version of the package format with a null terminator ending the section | @@ -31,8 +30,7 @@ of the source and binary package file format is organized like so: | 4 | Payload | The base64 encoded payload of the package. This is in GNU tar format compressed using the bzip2 compression algorithm. 
Again like other sections, it is null terminated | | 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | -*Figure 2: Header Format for Meta Packages* - +*Figure 2: Header Format for Meta Packages:* | Position | Section | Description | | --- | --- | --- | | 0 | Magic | The "magic number" section of the header. Stored in hexadecimal with a null terminator | @@ -40,8 +38,7 @@ of the source and binary package file format is organized like so: | 2 | Metadata | A base64 encoded JSON containing all the metadata required to detail the other packages being required by the meta package | | 3 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | -*Figure 3: Header Format for Bundle Packages* - +*Figure 3: Header Format for Bundle Packages:* | Position | Section | Description | | --- | --- | --- | | 0 | Magic | The "magic number" section of the header. Stored in hexadecimal with a null terminator | @@ -51,8 +48,7 @@ of the source and binary package file format is organized like so: | 4 | Payload | This section contains more than one base64 encoded bundled package bundled in a GNU tar archive. The tar archive is null terminated to denote the end of the payload section | | 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | -*Figure 4: Header Format for Source Packages* - +*Figure 4: Header Format for Source Packages:* | Position | Section | Description | | --- | --- | --- | | 0 | Magic | The "magic number" section of the header. Terminated with a null terminator | @@ -66,7 +62,7 @@ of the source and binary package file format is organized like so: The Magic section allows tools to determine the type and architecture of the binary file.
The current magic format in use by LPkg is shown in the table below. The structure of the hexadecimal encoded string is `lpkg:v$VERSION_INT:t$TYPE_ENUM:a$ARCH_ENUM` with colon-separated sections. The leading "lpkg" literal is to clearly denote that this is an archive format for use by LPkgTools and other compatible tools. The `v$VERSION_INT` denotes the version of the file type. The `t$TYPE_ENUM` denotes the type of package, whether binary, meta, bundle, or source. Finally, the `a$ARCH_ENUM` denotes the architecture that the package will install onto. -*Figure 5: Magic Format per Package Type Examples* +*Figure 5: Magic Format per Package Type Example:* | Hex String | ASCII String | Description | | --- | --- | --- | | 0x6c706b673a76313a74323a6131 | lpkg:v1:t2:a1 | An LPkg v1 single-payload binary package for noarch (installable on all architectures) systems | @@ -84,7 +80,7 @@ The JSON structure of the table-of-contents is organized like so: | `version` | string | The version of the ToC format. Current format is 1.0.0.
This uses semver versioning | | `sections` | object | The object that holds the ToC data | -*Figure 7: Type Enum* +*Figure 7: Type Enum:* | Number | Represented Value | | --- | --- | | 1 | Meta Package | @@ -92,7 +88,7 @@ The JSON structure of the table-of-contents is organized like so: | 3 | Multi-payload bundle package | | 4 | Source package | -*Figure 8: Architecture Enum* +*Figure 8: Architecture Enum:* | Number | Represented Value | | --- | --- | | 1 | noarch | @@ -100,7 +96,7 @@ The JSON structure of the table-of-contents is organized like so: | 3 | x86_64 | | 4 | aarch64 | -*Figure 9: The `sections` Object* +*Figure 9: The `sections` Object:* | Key | Type | Mandatory | Description | | --- | --- | --- | --- | | `metadata` | object | true | The JSON object describing the start and end location of the Metadata section of the file | @@ -108,26 +104,58 @@ The JSON structure of the table-of-contents is organized like so: | `payload` | object | false | The JSON object describing the start and end location of the Payload section of the file, if it contains one | | `signatures` | object | true | The JSON object describing the start and end location of the Signature section of the file | -*Figure 10: The `metadata` Object* +*Figure 10: The `metadata` Object:* | Key | Type | Description | | --- | --- | --- | | `start` | integer | Start of the metadata section | | `end` | integer | End of the metadata section. This is the byte count before the null terminator | -*Figure 11: The `checksum` Object* +*Figure 11: The `checksum` Object:* | Key | Type | Description | | --- | --- | --- | | `start` | integer | Start of the checksum section | | `end` | integer | End of the checksum section. This is the byte count before the null terminator | -*Figure 12: The `payload` Object* +*Figure 12: The `payload` Object:* | Key | Type | Description | | --- | --- | --- | | `start` | integer | Start of the payload section | | `end` | integer | End of the payload section. 
This is the byte count before the null terminator | -*Figure 13: The `signatures` Object* +*Figure 13: The `signatures` Object:* | Key | Type | Description | | --- | --- | --- | | `start` | integer | Start of the signatures section | | `end` | integer | End of the signatures section. This is the byte count before the null terminator | + +*Figure 14: An example ToC JSON for a single-payload binary x86_64 package:* * +```json +{ + "$schema": "http://www.altimatos.org/standards/lpkg-toc.schema.json", + "version": "1.0.0", + "type": 2, + "arch": 3, + "sections": { + "metadata": { + "start": 512, + "end": 1024 + }, + "checksum": { + "start": 1026, + "end": 1200 + }, + "payload": { + "start": 1202, + "end": 10501 + }, + "signatures": { + "start": 10503, + "end": 10600 + } + } +} +``` + +*Note: the numbers in this example are not accurate + +### Metadata Section From 0fdf36a651f5db4aede286add8df46d740733722 Mon Sep 17 00:00:00 2001 From: Gary Greene Date: Sat, 24 Aug 2024 15:02:06 -0400 Subject: [PATCH 4/4] Header section doesn't contain the payload or footer Signed-off-by: Gary Greene --- docs/README.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/README.md b/docs/README.md index 86cd46f..9008d50 100644 --- a/docs/README.md +++ b/docs/README.md @@ -17,8 +17,13 @@ or RPM really natively support. ## Package Format -The package format of LPkg is fairly straight-forward. The basic structure has a header, payload, and footer. The header -of the source and binary package file format is organized like so: +The package format of LPkg is fairly straight-forward. The basic structure has a header, payload, and footer.
+ +- The header contains the various bits of metadata for the package +- The payload contains the content archived in the package +- The footer of the package contains the GPG signatures for the package + +The header of the source and binary package file format is organized like so: *Figure 1: Header Format for Source and Binary Files:* | Position | Section | Description | @@ -27,8 +32,6 @@ of the source and binary package file format is organized like so: | 1 | ToC | The table of contents of the file. This is stored as a base64 encoded JSON string with a null terminator at the end of the section | | 2 | Metadata | A base64 encoded JSON representation of the package's metadata with a null terminator at the end of the section | | 3 | Checksum | A base64 encoded JSON representation of the SHA512 checksum of the payload content with a null terminator at the end of the section | -| 4 | Payload | The base64 encoded payload of the package. This is in GNU tar format compressed using the bzip2 compression algorithm. Again like other sections, it is null terminated | -| 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | *Figure 2: Header Format for Meta Packages:* | Position | Section | Description | @@ -36,7 +39,6 @@ of the source and binary package file format is organized like so: | 0 | Magic | The "magic number" section of the header. Stored in hexadecimal with a null terminator | | 1 | ToC | The table of contents of the file. Far more abridged than in the other formats, as there are only two "content" sections in the file.
Base64 encoded and null terminated | | 2 | Metadata | A base64 encoded JSON containing all the metadata required to detail the other packages being required by the meta package | -| 3 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | *Figure 3: Header Format for Bundle Packages:* | Position | Section | Description | @@ -45,8 +47,6 @@ of the source and binary package file format is organized like so: | 1 | ToC | The table of contents of the file. As before, JSON data with base64 encoding and null terminated | | 2 | Metadata | A base64 encoded JSON containing the bundle metadata. This does not contain the metadata from the bundled packages in the bundle package | | 3 | Checksum | A base64 encoded JSON with the SHA512 checksums of the GNU tar archive payload. Terminated by a null terminator | -| 4 | Payload | This section contains more than one base64 encoded bundled package bundled in a GNU tar archive. The tar archive is null terminated to denote the end of the payload section | -| 5 | Signature | The GPG signatures used to certify that the package was built by the organisations or people that are trusted to have supplied or built the package | *Figure 4: Header Format for Source Packages:* | Position | Section | Description | @@ -55,8 +55,6 @@ of the source and binary package file format is organized like so: | 1 | ToC | The table of contents of the file. Again, JSON data in base64 encoding and null terminated | | 2 | Metadata | A base64 encoded JSON representing the metadata about the source package. This does not contain the metadata for any to-be-built sub packages that the source package can create. Null terminated | | 3 | Checksum | A base64 encoded JSON representation of the checksum of the file. The hash used is SHA512. 
Terminated with a null terminator | -| 4 | Payload | This section contains the blueprint file, any patches needed for properly building the package, and any sources needed in a GNU tar bzip2 archive. The section is null terminated to denote the end of the payload section | -| 5 | Signature | The GPG signatures used to certify that the package was created by the organisations or people that are trusted to have supplied or built the package | ### Magic Format
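For review purposes, the magic string and the ToC section offsets described in this series can be exercised with a short Python sketch. It is hypothetical and not part of LPkgTools: the names `PackageType`, `Arch`, `build_magic`, `magic_to_hex`, `parse_magic`, `decode_toc`, and `read_section` are illustrative only. It assumes each section is base64 text followed by a single null byte, and that a section's `end` is the inclusive offset of its last byte (the example ToC places each `start` at the previous `end + 2`, which fits that reading). Under those assumptions, `lpkg:v1:t2:a1` hex-encodes to `0x6c706b673a76313a74323a6131`.

```python
# Hypothetical sketch of the LPkg magic string and ToC section lookup.
# Not the LPkgTools implementation; offsets semantics are an assumption.
from __future__ import annotations

import base64
import json
from enum import IntEnum


class PackageType(IntEnum):
    """Values of the ToC `type` enum."""
    META = 1
    BINARY = 2  # single-payload binary package
    BUNDLE = 3  # multi-payload bundle package
    SOURCE = 4


class Arch(IntEnum):
    """Values of the ToC `arch` enum."""
    NOARCH = 1
    IA32 = 2
    X86_64 = 3
    AARCH64 = 4


def build_magic(version: int, pkg_type: PackageType, arch: Arch) -> str:
    """Build the colon-separated magic string, e.g. 'lpkg:v1:t2:a1'."""
    return f"lpkg:v{version}:t{int(pkg_type)}:a{int(arch)}"


def magic_to_hex(magic: str) -> str:
    """Render the ASCII magic string as a 0x-prefixed hexadecimal string."""
    return "0x" + magic.encode("ascii").hex()


def parse_magic(magic: str) -> tuple[int, PackageType, Arch]:
    """Split a magic string back into (version, type, arch)."""
    parts = magic.split(":")
    if (len(parts) != 4 or parts[0] != "lpkg"
            or not parts[1].startswith("v")
            or not parts[2].startswith("t")
            or not parts[3].startswith("a")):
        raise ValueError(f"not an LPkg magic string: {magic!r}")
    return (int(parts[1][1:]),
            PackageType(int(parts[2][1:])),
            Arch(int(parts[3][1:])))


def decode_toc(raw: bytes) -> dict:
    """Decode a base64 encoded JSON ToC (passed without its null terminator)."""
    return json.loads(base64.b64decode(raw))


def read_section(data: bytes, toc: dict, name: str) -> bytes:
    """Return the base64-decoded contents of toc['sections'][name].

    Assumption: `end` is the inclusive offset of the section's last byte,
    so the null terminator is expected at offset end + 1.
    """
    span = toc["sections"][name]
    start, end = span["start"], span["end"]
    if data[end + 1] != 0:
        raise ValueError(f"section {name!r} is not null terminated")
    return base64.b64decode(data[start:end + 1])


if __name__ == "__main__":
    magic = build_magic(1, PackageType.BINARY, Arch.NOARCH)
    print(magic, magic_to_hex(magic))
    print(parse_magic(magic))
```

Treating `end` as inclusive is only one reading of "the byte count before the null terminator"; if the format settles on an exclusive offset instead, the slice in `read_section` becomes `data[start:end]` and the terminator check moves to `data[end]`.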