doc/plans: Update OCI sealing spec (kernel sigs, flattened layers) #224
cgwalters wants to merge 1 commit into composefs:main
Conversation
Force-pushed from a404b59 to 2a23a13
allisonkarlitskaya left a comment
I started reviewing this before I noticed how large it was. A few comments from the first part...
doc/plans/oci-sealing-impl.md (Outdated)

> ### 1. Algorithm string format
>
> The sealing workflow in composefs-rs begins with `create_filesystem()` building the filesystem from OCI layers. Layer tar streams are imported via `import_layer()`, converting them to composefs split streams. Files 64 bytes or smaller are stored inline in the split stream, while larger files are stored in the object store with fsverity digests. Layers are processed in order, applying overlayfs semantics including whiteout handling (`.wh.` files). Hardlinks are tracked properly across layers to maintain filesystem semantics.
>
> The spec defines `${DIGEST}-${BLOCKSIZEBITS}` identifiers (e.g. `sha512-12`). Need to implement parsing and mapping to kernel constants (`FS_VERITY_HASH_ALG_SHA512`, 4096-byte blocks, no salt). This is a prerequisite for everything else.
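As an illustration, parsing and validating that identifier could look roughly like the following Rust sketch; the `VerityAlgorithm` type, the function name, and the accepted block-size range are hypothetical, not the actual composefs-rs API:

```rust
// Hypothetical parser for the proposed `${DIGEST}-${BLOCKSIZEBITS}` identifier
// format, e.g. "sha512-12" -> (SHA-512, 4096-byte blocks).

#[derive(Debug, PartialEq)]
enum VerityAlgorithm {
    Sha256, // kernel FS_VERITY_HASH_ALG_SHA256 == 1
    Sha512, // kernel FS_VERITY_HASH_ALG_SHA512 == 2
}

fn parse_verity_id(id: &str) -> Result<(VerityAlgorithm, usize), String> {
    let (alg, bits) = id
        .rsplit_once('-')
        .ok_or_else(|| format!("malformed identifier: {id}"))?;
    let alg = match alg {
        "sha256" => VerityAlgorithm::Sha256,
        "sha512" => VerityAlgorithm::Sha512,
        other => return Err(format!("unknown algorithm: {other}")),
    };
    let bits: u32 = bits.parse().map_err(|e| format!("bad block size bits: {e}"))?;
    // fsverity block sizes are powers of two; 12 bits == 4096 bytes is the
    // common case. The accepted range here is an assumption for illustration.
    if !(9..=16).contains(&bits) {
        return Err(format!("unsupported block size bits: {bits}"));
    }
    Ok((alg, 1usize << bits))
}

fn main() {
    let (alg, block_size) = parse_verity_id("sha512-12").unwrap();
    println!("{alg:?} with {block_size}-byte blocks");
}
```

If the `fsverity-`-prefixed variant using algorithm numbers were chosen instead, only the match arms would change.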
I'm a proponent of the alg-blkbits approach, but maybe we also want a prefix like fsverity- to make sure people understand that it's not straight-up sha512, or even a straight-up Merkle tree. If we did do an fsverity- prefix we could use the algorithm number instead, like fsverity-2-12. Just a bit of bikeshedding.
doc/plans/oci-sealing-impl.md (Outdated)

> ### 3. Persist manifest and config as regular files
>
> Two-level naming allows access by fsverity digest (verified) or by ref name (unverified). The `ensure_stream()` method provides idempotent stream creation with SHA256-based deduplication. Streams can reference other streams via digest maps stored in split stream headers, enabling the layer→config relationship tracking.
>
> The manifest is currently not persisted at all (fetched, parsed, discarded in `skopeo.rs`). The config is stored as splitstream via `write_config()`. Both need to be stored as regular files so `FS_IOC_ENABLE_VERITY` can be called on them.
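As a toy illustration of the idempotent, digest-deduplicated creation that `ensure_stream()` is described as providing: the types and names below are hypothetical stand-ins, not the real composefs-rs implementation.

```rust
use std::collections::HashMap;

// Toy stand-in for the repository: streams are addressed two ways, by digest
// (verified) and by ref name (unverified), mirroring the two-level naming above.
struct StreamStore {
    by_digest: HashMap<String, Vec<u8>>,
    by_ref: HashMap<String, String>, // ref name -> digest
}

impl StreamStore {
    fn new() -> Self {
        StreamStore { by_digest: HashMap::new(), by_ref: HashMap::new() }
    }

    // Idempotent: the builder closure only runs if the digest is not yet stored.
    fn ensure_stream(&mut self, digest: &str, name: &str, build: impl FnOnce() -> Vec<u8>) -> &[u8] {
        self.by_digest.entry(digest.to_string()).or_insert_with(build);
        self.by_ref.insert(name.to_string(), digest.to_string());
        &self.by_digest[digest]
    }
}

fn main() {
    let mut store = StreamStore::new();
    store.ensure_stream("sha256:aaaa", "latest", || b"stream-bytes".to_vec());
    // Second call with the same digest does not rebuild the stream.
    let data = store.ensure_stream("sha256:aaaa", "stable", || unreachable!());
    println!("{} bytes, {} refs", data.len(), store.by_ref.len());
}
```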
We definitely don't need to write a file to the disk in order to calculate its fs-verity digest. What's the advantage of doing this? Will you continue to also store the splitstream? If not, GC is going to get a lot more complicated and also slower...
> We definitely don't need to write a file to the disk in order to calculate its fs-verity digest. What's the advantage of doing this?
What I want to do is support "read + mount a container image with strict IPE enabled" and to do that we ideally have all of the metadata covered by fsverity as well.
That said, I think most policies are mainly concerned with denying execute-unsigned, not read-unsigned (as that gets obviously hard).
But it felt appealing to me to be able to say that the manifest and config are also just signed-fsverity files.
Or to say it differently: in theory one could omit e.g. cosign covering an image; just a composefs-signature artifact is enough as well to verify a complete image.
> Will you continue to also store the splitstream? If not, GC is going to get a lot more complicated and also slower...

That's all we need to do, right?
Yes, I mean that should work just fine. I'm not sure what the value of IPE for "just data" here is though, since the kernel won't block that, and since we already have (and check) the canonical identifier (ie: sha256 content hash). fsverity (and dm-verity) are merkle trees because they're about protecting sparse access to large files without having to hash the entire thing at the start, which doesn't really apply to a JSON document...
> Yes, I mean that should work just fine. I'm not sure what the value of IPE for "just data"
The way I was thinking about this more is from the "image integrity" angle and less about IPE specifically. It relates to the topic you brought up below.
If we have a kernel-fsverity signature that directly covers the manifest then "for free" we get verification when we read the manifest that it is valid.
And the manifest + config define things like the layer ordering. So unless a runtime validated some other signature (such as cosign) on startup it would still be possible to swap image layers around.
I guess, backing up a bit: even with this, someone could e.g. replace the logical tag for a floating quay.io/someorg/somecontainer with some other image on disk.
This relates to something pwithnall did for ostree, including bindings in the (signed) commit. In theory we could add an OCI extension that included the image name (or multiple names) in the manifest as an annotation, and a runtime could then validate, when it goes to run an image with that name, that the app matches. I think DDIs basically have some of this because the os-release field can contain a name.
(And of course arguably...we could add container image tags into /usr/lib/os-release too...)
Anyways though, yes, kernel-fsverity signature on the manifest/config is not required under all threat models, but since it seems easy to do and (AFAIK) there's no downside, I'd like to do it.
I guess though, backing up...there are really two cases:

1. The system configuration pins an image by explicit (manifest|config) digest
2. There's a floating tag

For sure with 1) I don't think we need the kernel-fsverity signature for the manifest/config; it is indeed just "find image named by manifest|config digest", verify the sha digest of the object, then mount the merged erofs with the kernel-fsverity signature in use.
Of course, this whole thing is predicated a bit on "how do I verify the config which specifies quay.io/foo/bar@sha256:...?" For bootc LBIs that's very clear: it's covered by the UKI -> composefs-for-root. Other cases might use something like signed confexts (which I'd like to support being OCI+composefs too).
For floating tags, yeah we can't really defend against image swapping "offline" without strengthening what gets signed.
That said, for use cases like Kubernetes, there is an option for kubelet to re-ping the registry for images on startup for a reason related to this a bit - validating that a user can pull an image even it happens to be on disk.
In the end I just come back to this:
> since it seems easy to do and (AFAIK) there's no downside, I'd like to do it.
> ### Signatures
>
> #### Linux kernel fsverity signatures (recommended)
This is a very significant departure from the way the trust model works now and how the object store works in general. From what I understand, each file can only have a single fs-verity signature on it, but we store objects by their content hash, which means if we had two objects enter the object store from differently-signed containers, we'd be in trouble, no?
Is this for every object or just the erofs image?
I'm also not sure that kernel-level fsverity signatures provide very strong protection (at the level that would be provided by signatures on disk images, for example) because they are on a file-by-file basis, assuming that is the intent here. If you ignore the userspace stuff, you could probably still use your ability to freely mix-and-match various individually-signed files into a system configuration that lets you do "bad things"...
> From what I understand, each file can only have a single fs-verity signature on it, but we store objects by their content hash, which means if we had two objects enter the object store from differently-signed containers, we'd be in trouble, no?
This is a good point - however it's interesting as I think it's more of an implementation concern and not a spec concern.
> Is this for every object or just the erofs image?
Yes exactly: we don't require fsverity signatures on individual objects (that form part of a split layer tarball). See this issue; in a nutshell, the goal is that having the fsverity signature on the EROFS blob + detecting overlay require-verity should be sufficient for chain-of-trust.
But that said, what would happen if e.g. two distinct images shared a layer? I think "first one wins" is sufficient for the rootful case. The Linux kernel fsverity signature mechanism only has one keyring which applies to everything, and we can't do anything different. In a future world where there's e.g. per-user keyrings or so...it would just preclude sharing the EROFS metadata blob between trust domains (root and rootless e.g.) right? We could still share the underlying layer objects via hardlinks.
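The "first one wins" behavior for shared layers could be sketched like this; the names are hypothetical, and in reality the signature lives in fsverity metadata attached to the blob via the kernel, not in a map:

```rust
use std::collections::HashMap;

// Toy model: one fsverity signature slot per blob, keyed by fsverity digest.
// The kernel allows exactly one signature per file, so if two images share a
// layer, the first signature to be attached wins and later ones are ignored.
fn attach_signature(
    sigs: &mut HashMap<String, Vec<u8>>,
    fsverity_digest: &str,
    signature: Vec<u8>,
) -> bool {
    if sigs.contains_key(fsverity_digest) {
        return false; // already signed: first one wins
    }
    sigs.insert(fsverity_digest.to_string(), signature);
    true
}

fn main() {
    let mut sigs = HashMap::new();
    assert!(attach_signature(&mut sigs, "deadbeef", b"sig-from-image-a".to_vec()));
    // A second image sharing the same layer keeps the existing signature.
    assert!(!attach_signature(&mut sigs, "deadbeef", b"sig-from-image-b".to_vec()));
    println!("{} signed blobs", sigs.len());
}
```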
> If you ignore the userspace stuff, you could probably still use your ability to freely mix-and-match various individually-signed files into a system configuration that lets you do "bad things"...
Again, not individually signed files, but only complete layers. I don't think this is any different from e.g. dm-verity + IPE. I wouldn't dismiss this concern entirely, but offhand it seems like it'd be quite difficult in practice to craft such a chain.
Now that said one obvious thing even with layers is that this doesn't protect against e.g. rollback attacks. I think that type of thing needs to be out of scope of this spec.
I was thinking about this a bit more, and I think there is still an argument that we should support inline digests in the manifest (or config). It would naturally mean that we can reliably chain from "trust in manifest" to "trust in mounted root", which was again the original goal.
For cases where "trust in manifest" is implicitly handled (e.g. kubernetes API server tells us to run an image with this particular digest) it would Just Work as long as fsverity is supported by the underlying filesystem.
This support would give us a generic "out" for rootless/unprivileged use. (Though honestly, in the medium term it'd clearly be nice to enhance Linux kernel fsverity to at least do something like "allow per-user keyrings for files owned by that user" or so.)
> But that said, what would happen if e.g. two distinct images shared a layer? I think "first one wins" is sufficient for the rootful case. The Linux kernel fsverity signature mechanism only has one keyring which applies to everything, and we can't do anything different. In a future world where there's e.g. per-user keyrings or so...it would just preclude sharing the EROFS metadata blob between trust domains (root and rootless e.g.) right? We could still share the underlying layer objects via hardlinks.
could you please elaborate on this point? For the runtime side, why would it be a problem as long as there is at least one accepted signature?
I agree, "one accepted signature" is fine - we can't do anything different today because the Linux kernel's fsverity mechanism only allows the same.
Force-pushed from 2a23a13 to fa8852b
The biggest goal here is support for Linux kernel-native fsverity signatures to be attached to layers, which enables integration with IPE.

Add support for a fully separate OCI "composefs signature" artifact which can be attached to an image.

Drop the -impl.md doc...it's not useful to try to write this stuff in markdown. The spec has some implementation considerations, but it's easier to look at the implementation side from a code draft.

Add standardized-erofs-meta.md as a placeholder document outlining the goal of standardizing composefs EROFS serialization across implementations (canonical model: tar -> dumpfile -> EROFS).

Assisted-by: OpenCode (Claude Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Force-pushed from fa8852b to b470f73
@@ -0,0 +1,74 @@

> # Standardized EROFS Metadata Serialization
When talking about creating a specification for the sealing, of course this heavily depends on a spec for the EROFS layout, which pulls back in all the debate in composefs/composefs#198
Now... #225 is starting to look at what it'd take to have us support being bit-for-bit compatible with the previous composefs-c (1.0) format.
In an ideal world perhaps we teach mkfs.erofs how to generate this too? This also relates a bit to uapi-group/specifications#207
hi @cgwalters, do you mean the erofs metadata arrangement or the sealing format?
For the sealing format, I'm fine with any help; as long as anyone has interest in porting this to erofs-utils, it can be used to improve the interaction between composefs tools and erofs-utils.
As for the erofs metadata itself, I don't think erofs-utils should strictly align with mkcomposefs (just because erofs-utils itself already has different arrangements for different cases, though erofs-utils is always designed to be reproducible), and I also think it sounds unnecessary, since the erofs metadata layout is flexible enough (yet composefs can definitely define a strict on-disk layout for all related stuff).
But my own TODO list is already overloaded, so I can't help with practical development on this.
> but erofs-utils is always designed to be reproducible
Only within a specific binary version, right? You don't guarantee that a future mkfs.erofs wouldn't generate a different metadata layout, correct?
> but erofs-utils is always designed to be reproducible

> Only within a specific binary version, right? You don't guarantee that a future mkfs.erofs wouldn't generate a different metadata layout, correct?

Yes, of course. But the erofs-utils layout won't be frequently changed in the foreseeable future, I think.
Right, so a very important thing that we're trying to do with composefs+OCI is to not change the wire format for OCI: we're not shipping EROFS on the wire, only generating it reproducibly on the client and server.
Hence we must:
- Lock in the bit-for-bit file format basically forever
- Have solid tooling to generate it (and ideally that tooling is easily accessible from multiple programming languages)
just one note: currently the tar format also doesn't strictly define the header/file order, for example, and various tools generate various tars. So I guess reproducibility within a specific binary version is also fine, as long as users can reproduce it with the tools/command line and the exact version.
I respect your choices, but I don't see why shipping erofs metadata on the wire would be inappropriate, from whatever point of view (of course you would have to lock in the bit-for-bit file format forever.)
> just one note: currently the tar format also doesn't strictly define the header/file order, for example, and various tools generate various tars. So I guess reproducibility within a specific binary version is also fine, as long as users can reproduce it with the tools/command line and the exact version.
Yes, this is a reason why the doc text here proposes only a canonical mapping from composefs dumpfile ➡️ EROFS.
A composefs dumpfile has less representational flexibility (e.g. paths always start with /, xattrs are always sorted and serialized in band). I will actually try to ensure we have a "canonical dumpfile" format, which should help.
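Those two invariants (absolute paths, sorted xattrs) could be checked mechanically; a toy sketch with a hypothetical `DumpEntry` type, not the real dumpfile parser:

```rust
// Toy model of one dumpfile entry, checking the two canonicalization
// invariants mentioned above: absolute paths, and xattrs sorted by name
// (strict ordering also rejects duplicate names).
struct DumpEntry {
    path: String,
    xattrs: Vec<(String, Vec<u8>)>, // (name, value), serialized in band
}

fn is_canonical(entry: &DumpEntry) -> bool {
    entry.path.starts_with('/') && entry.xattrs.windows(2).all(|w| w[0].0 < w[1].0)
}

fn main() {
    let ok = DumpEntry {
        path: "/usr/bin/true".into(),
        xattrs: vec![
            ("security.selinux".into(), b"system_u:object_r:bin_t:s0".to_vec()),
            ("user.demo".into(), b"x".to_vec()),
        ],
    };
    assert!(is_canonical(&ok));
    println!("entry is canonical");
}
```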
> I respect your choices, but I don't see why shipping erofs metadata on the wire would be inappropriate, from whatever point of view (of course you would have to lock in the bit-for-bit file format forever.)
Hmmm. It is an interesting design choice; now that we've moved the signatures to a separate OCI artifact, we could indeed try to create a design where the EROFS metadata is actually stored on the registry too.
However...it introduces the same "representational ambiguity": we'd be shipping both tar and metadata-EROFS, and would have to answer the question of what happens when they disagree. This problem is one argument for why zstd:chunked still always validates against the diffid for security reasons.
But OTOH with composefs here we aren't actually trying to optimize incremental fetches, so we're always parsing the whole tarball anyways and could actually do something like:
- fetch erofs-meta from the detached composefs OCI signature
- fetch layers
- for each layer, parse the tarball, generate a canonical in-memory metadata representation (like a composefs dumpfile), and then compare it with the erofs-meta: if they differ, it's a fatal error
Hummm.....yes, I think such an approach would allow us to entirely punt on the problem of standardizing an EROFS layout, but at the cost of duplicating all of the metadata.
Is that the right call? I am...unsure.
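The per-layer comparison step in that flow amounts to a strict equality check between the regenerated canonical metadata and the fetched erofs-meta. A minimal sketch, with hypothetical names and the canonical representation modeled as opaque bytes:

```rust
// Compare the canonical metadata regenerated from the layer tarball against
// the erofs-meta fetched from the detached composefs OCI artifact. Any
// disagreement is a fatal error, per the flow described above.
fn verify_layer_metadata(regenerated: &[u8], fetched_erofs_meta: &[u8]) -> Result<(), String> {
    if regenerated != fetched_erofs_meta {
        return Err("erofs-meta does not match metadata regenerated from layer tar".into());
    }
    Ok(())
}

fn main() {
    assert!(verify_layer_metadata(b"canonical-bytes", b"canonical-bytes").is_ok());
    assert!(verify_layer_metadata(b"canonical-bytes", b"tampered").is_err());
    println!("verification sketch ok");
}
```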
I don't have the answer: I could find some technical ways, yet when it comes to OCI world, it really su*ks...
Right now the "sealed UKI" mode is relying on the bit-for-bit EROFS reproducibility, which argues for standardizing it.
But if we went all in on this external fsverity signature mechanism, then I think we could (per discussion) also convert basically all use cases for sealed UKIs to use it as well. (After some design work)
Two big goals: