docs: add usage guidelines for systemd targets, services and directives by markg-github · Pull Request #240 · AMDEPYC/sev-certify

markg-github · 2026-04-29T18:36:56Z

sev-certify now uses systemd targets and "barrier services" as well as the "worker services" that have always been used. This doc provides guidelines for their use and for the use of some related systemd directives.

Copilot

Pull request overview

Adds a new documentation page describing how sev-certify uses systemd targets, “barrier” services, and worker services, including guidance on common directives and dependency patterns.

Changes:

- Added docs/systemd-guidelines.md covering terminology, activation vs ordering vs enrollment, and the target↔barrier pattern.
- Documented intra-stage ordering approaches and sev-certify-specific guidance for directives like Type, RemainAfterExit, DefaultDependencies, KillMode, and TimeoutStartSec.

markg-github · 2026-05-01T18:00:24Z

+
+# Bootstrapping
+
+Stop targets (guest and host) have WantedBy=multi-user.target. multi-user.target is the "system is ready" terminal target — always present, always reached on normal boot. This is the only use of WantedBy that's required in sev-certify. This "enrollment" is not enough to "activate" the stop targets. Activation of the stop targets requires enabling (or starting) them. Do this via an "enable stop.target" directive in a .preset file.<br>


@copilot Everything else can use, and does use, Wants or Requires. The sev-certify stop targets bootstrap this process.
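
A minimal sketch of that bootstrap, for illustration only. The `WantedBy=multi-user.target` line and the `enable stop.target` preset line come from the guideline text; the `Description=` value and the preset file name are hypothetical.

```ini
# ---- file: stop.target (sketch) ----
[Unit]
Description=sev-certify stop stage (hypothetical description)

[Install]
# Enrollment only: this has no effect until the unit is enabled.
WantedBy=multi-user.target

# ---- file: 50-sev-certify.preset (hypothetical file name) ----
# When presets are applied, this enables stop.target, which creates the
# multi-user.target.wants/ symlink that pulls stop.target in at boot.
enable stop.target
```

Everything downstream of the stop targets can then be pulled in with `Wants=` or `Requires=` and needs no `[Install]` section of its own.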

markg-github · 2026-05-01T18:03:59Z

+
+## RemainAfterExit
+
+In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.<br>


@copilot It's possible that the code isn't compliant with the guidelines. The code that I believe you're thinking of here was committed and merged before this PR was opened.

markg-github · 2026-05-01T18:06:30Z

+In sev-certify, use `DefaultDependencies=no`.<br>
+
+`DefaultDependencies=no` allows precise, self-contained placement of a unit in the dependency graph. systemd units in sev-certify aren't standard and default dependencies don't make sense for them.<br>
+


Same comment as above. While it's true that the guidelines are in flux, the sev-certify code you're thinking of here isn't compliant with this version/commit of the guidelines and may never be.

+"Targets" (.target files) define/establish "stages", for example, boot, test and report stages.<br>
+"Barrier services" are closely related to targets, but allow targets to be decoupled from stage details. "Barriers" and "barrier services" used interchangeably.<br>
+"Worker services" are services that we create, the .service file and the service code. "Workers" and "worker services" used interchangeably.<br>


+- Wants/Requires<br>
+- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable<br>
+
+For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have Wants/Requires=`<`one or more sev-certify units`>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.<br>


markg-github · 2026-05-01T18:12:22Z

+In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>
+


@copilot To me, RemainAfterExit=yes seems to fit sev-certify better, and there doesn't seem to be a downside other than maybe salvaging more of a "bad boot".

+
+`KillMode` controls which processes systemd will kill when a unit is stopped.<br>
+
+For sev-certify, it's better to use `RemainAfterExit=yes` to avoid undesired process killing than to change `KillMode` from control-group, its default.<br>


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new documentation page describing how sev-certify uses systemd targets, “barrier services”, and related directives to structure stage execution and ordering.

Changes:

- Introduces terminology and an overview of activation vs ordering vs enrollment in systemd for sev-certify.
- Documents the “target → barrier → worker” pattern and approaches for intra-stage ordering.
- Provides sev-certify-specific guidance for directives like Type, RemainAfterExit, DefaultDependencies, KillMode, and TimeoutStartSec.

+Each barrier service Requires= and After= all of its worker services. A barrier does have an ExecStart, but the simplest, most natural way for a barrier to stay in sync with its workers is to After= all of the workers. For example, in guest report-done.service:<br>
+
+Requires=display-guest-logs.service sev-certificate-generator.service<br>
+After=display-guest-logs.service sev-certificate-generator.service<br>
+
+# Intra-stage ordering
+
+In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>
+
+1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or 
+2) use OnSuccess (and OnFailure?). 
+
+An example of 1) is the verify-guest service (Type=oneshot) checking logs to determine when the launch-guest service (Type=simple) is done.<br>
+


+In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>
+
+You can't easily use Before/After with simple services since they satisfy Before/After as soon as they start. See intra-stage ordering above. With oneshot services, Before/After isn't satisfied until the main process exits.<br>
+
+With oneshot services, `TimeoutStartSec` is how long the main process has to exit/finish before systemd kills it. This can affect subprocesses and whether it does depends on `RemainAfterExit` and `KillMode` directives.<br>
+
+default: simple<br>
+

DGonzalezVillal · 2026-05-20T21:14:05Z

+In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>
+
+1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or 
+2) use OnSuccess (and OnFailure?). 


I like this suggestion
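
For reference, a rough sketch of what option 2 could look like, reusing the launch-guest and verify-guest names from the guide's example purely as stand-ins for "the non-oneshot worker" and "the worker that must run next". `OnSuccess=` needs systemd 249 or later, and the `ExecStart=` path is a placeholder, not the real command.

```ini
# ---- file: launch-guest.service (sketch) ----
[Unit]
# Starts verify-guest.service once this unit becomes inactive after a
# clean exit of its main process.
OnSuccess=verify-guest.service

[Service]
Type=simple
# Placeholder path, not the real command.
ExecStart=/usr/local/bin/launch-guest
```

This relies on the default `RemainAfterExit=no`: the unit has to actually become inactive for `OnSuccess=` to fire.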

DGonzalezVillal

Hey Mark, sorry for taking so long to look at this.

Here are some of my comments on your guide!

Overall I like it a lot; I just wanted to discuss some accuracy points in your draft.

Thank you!

DGonzalezVillal · 2026-05-20T20:25:01Z

+- Wants/Requires<br>
+- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable<br>
+
+For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have Wants/Requires=`<`one or more sev-certify units`>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.<br>


> Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side".

By this, do you mean that we should define the relationship in only one direction? That is, if unit A requires unit B, we would define in systemd either

(in A) Requires=B.service

or

(in B) RequiredBy=A.service

but not both? I don't disagree with this statement.

Maybe we should clarify which one is preferred. From my understanding, you're saying here that we probably prefer RequiredBy.
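
To make the two options concrete, here is a sketch using the hypothetical units A.service and B.service from above. One difference that may matter for the preference: `Requires=` lives in `[Unit]` and applies as soon as the unit is loaded, while `RequiredBy=` lives in `[Install]` and does nothing until the unit is enabled.

```ini
# ---- Option 1, file: A.service ----
[Unit]
# Effective as soon as A.service is loaded.
Requires=B.service

# ---- Option 2, file: B.service ----
[Install]
# Effective only after B.service is enabled; enabling creates a symlink
# under A.service.requires/.
RequiredBy=A.service
```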

DGonzalezVillal · 2026-05-20T20:28:13Z

+
+## Value of having both targets and "barrier services"
+
+(straight from Claude Code)


Suggested change

(straight from Claude Code)

It's OK, what isn't these days :)

DGonzalezVillal · 2026-05-20T20:47:52Z

+1. Stage chaining stays stable — targets give each stage a named boundary that subsequent stages reference. As workers are added or removed from a stage, only the barrier changes (Requires=); the target and the chain above it are untouched.
+2. Intra-stage ordering without coupling — when workers within a stage must run in sequence, a started barrier gives them a common synchronization point without workers needing to reference each other directly. Without the barrier, you'd have to wire workers to each other, coupling units that conceptually belong to the same stage independently.


Another reason for using barrier services is to make failure handling more explicit and resilient. If a target directly depends on all of its worker services, a single worker failure can prevent the target from being reached and stop the rest of the flow. With a barrier, the target can remain the stable stage boundary, while the barrier owns the responsibility of running the stage services, collecting their status, and deciding how failure is represented. This allows later stages, especially reporting, to still be reached so failures can be captured instead of skipping directly to shutdown or leaving the run incomplete.
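
One possible shape for such a barrier, as a sketch only. It swaps the `Requires=` from the guide for `Wants=`, which is an assumption about how a barrier could tolerate worker failures and not a description of the current code. All unit names and the script path are hypothetical.

```ini
# ---- file: stage-done.service (hypothetical failure-tolerant barrier) ----
[Unit]
DefaultDependencies=no
# Wants= pulls the workers in without failing the barrier when one fails.
Wants=worker-a.service worker-b.service
# After= still waits for each worker's start job to finish, pass or fail.
After=worker-a.service worker-b.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical script that records which workers failed, so that the
# report stage can pick the results up later.
ExecStart=/usr/local/bin/collect-stage-status
```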

DGonzalezVillal · 2026-05-20T20:57:04Z

+
+# Intra-stage ordering
+
+In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>


I don't know if we need the explanation, but the reason After= does not behave as expected with non-oneshot services is that After= only waits for initialization, not completion. Since long-running services do not have a completion state, once the service initializes successfully, systemd considers it safe to start dependent services.
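
If we do keep the explanation, a small sketch might help. The two workers and their `ExecStart=` paths below are hypothetical.

```ini
# ---- file: first.service (hypothetical worker) ----
[Service]
# With Type=oneshot, the unit counts as started only after ExecStart
# exits, so anything ordered After= it waits for the work to finish.
# With Type=simple, it would count as started as soon as the process is
# forked, and second.service could start while the work is still running.
Type=oneshot
ExecStart=/usr/local/bin/first-step

# ---- file: second.service (hypothetical worker) ----
[Unit]
After=first.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/second-step
```

Note that `After=` only orders the two units; something else (the barrier, in our pattern) still has to pull both of them in.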

DGonzalezVillal · 2026-05-20T21:11:31Z

+In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>
+
+You can't easily use Before/After with simple services since they satisfy Before/After as soon as they start. See intra-stage ordering above. With oneshot services, Before/After isn't satisfied until the main process exits.<br>
+
+With oneshot services, `TimeoutStartSec` is how long the main process has to exit/finish before systemd kills it. This can affect subprocesses and whether it does depends on `RemainAfterExit` and `KillMode` directives.<br>
+
+default: simple<br>


I have a few comments on this section.

First, I don't think TimeoutStartSec is necessarily the right criterion for deciding between Type=oneshot and Type=simple. None of our current services rely on TimeoutStartSec, and I don't think service type selection should be driven by whether we can determine a timeout value.

Instead, I think the recommendation should be based on service behavior. For most of our services, Type=oneshot is the better fit because they are intended to perform finite work and run to completion once. I expect that to apply to the majority of services in this project.

Also, the main benefit of Type=oneshot in our architecture is ordering semantics. After= and Before= only wait for service activation, not completion. For long-running services (Type=simple), dependencies are satisfied as soon as the service initializes successfully. With Type=oneshot, ordering is only satisfied once the main process exits, which aligns better with our stage-based execution model.

RemainAfterExit=yes serves a different purpose. It keeps the service in the active state after execution has completed, allowing the unit to represent that a stage has finished. That is why barrier services use it by definition—they act as completion markers for all work associated with that target.

Without RemainAfterExit=yes, barrier services immediately transition to inactive after completion, which can lead to them being retriggered when revisiting target dependencies later in the flow.
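
As a sketch of what I mean, using the guest report-done.service dependencies quoted in the guide. The `ExecStart=` command is a placeholder and may not match the real unit.

```ini
# ---- file: report-done.service (sketch) ----
[Unit]
Requires=display-guest-logs.service sev-certificate-generator.service
After=display-guest-logs.service sev-certificate-generator.service

[Service]
# oneshot: units ordered After= this barrier wait for ExecStart to exit.
Type=oneshot
# Keep the unit active after ExecStart exits so that later dependency
# checks see the stage as done instead of starting the barrier again.
RemainAfterExit=yes
# Placeholder command; the real ExecStart may differ.
ExecStart=/usr/bin/true
```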

DGonzalezVillal · 2026-05-20T21:12:39Z

+
+In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.<br>
+
+`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will stop when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>


Suggested change

`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will stop when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>

`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will become inactive when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>

DGonzalezVillal · 2026-05-20T21:13:24Z

+## KillMode
+
+`KillMode` controls which processes systemd will kill when a unit is stopped.<br>


Do you have an example of where we use KillMode?

DGonzalezVillal · 2026-05-20T21:13:45Z

+
+## TimeoutStartSec
+
+See above. Also, `TimeoutStartSec=infinity` is how to express no timeout.<br>


Same thing here: I don't know if we use this at all.

Copilot AI review requested due to automatic review settings April 29, 2026 18:36

Copilot started reviewing on behalf of markg-github April 29, 2026 18:40

Copilot AI reviewed Apr 29, 2026

markg-github requested a review from DGonzalezVillal April 30, 2026 12:45

fix: fix grammar

a01eca7

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 1, 2026 18:07

Copilot started reviewing on behalf of markg-github May 1, 2026 18:09

Copilot AI reviewed May 1, 2026

docs: clarify how to keep worker services in sync

e852e3f

DGonzalezVillal reviewed May 20, 2026
