docs: add usage guidelines for systemd targets, services and directives #240

Open
markg-github wants to merge 3 commits into AMDEPYC:main from markg-github:docs/systemd-guidelines

@markg-github
Contributor

sev-certify now uses systemd targets and "barrier services" as well as the "worker services" that have always been used. This doc provides guidelines for their use and for the use of some related systemd directives.

Copilot AI review requested due to automatic review settings April 29, 2026 18:36

Copilot AI left a comment

Pull request overview

Adds a new documentation page describing how sev-certify uses systemd targets, “barrier” services, and worker services, including guidance on common directives and dependency patterns.

Changes:

  • Added docs/systemd-guidelines.md covering terminology, activation vs ordering vs enrollment, and the target↔barrier pattern.
  • Documented intra-stage ordering approaches and sev-certify-specific guidance for directives like Type, RemainAfterExit, DefaultDependencies, KillMode, and TimeoutStartSec.

Comment thread docs/systemd-guidelines.md

# Bootstrapping

Stop targets (guest and host) have WantedBy=multi-user.target. multi-user.target is the "system is ready" terminal target — always present, always reached on normal boot. This is the only use of WantedBy that's required in sev-certify. This "enrollment" is not enough to "activate" the stop targets. Activation of the stop targets requires enabling (or starting) them. Do this via an "enable stop.target" directive in a .preset file.<br>
Contributor Author

@copilot everything else can and does use Wants or Requires. The sev-certify stop targets bootstrap this process.
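
A minimal sketch of the bootstrapping discussed in this thread. Only the `WantedBy=multi-user.target` line and the `enable stop.target` preset directive come from the guidelines; the `Description=` text and the preset file name are assumptions made for illustration.

```ini
# stop.target (guest or host)
[Unit]
Description=sev-certify stop target (hypothetical description)

[Install]
# Enrollment: declared here, but it only takes effect once the unit is enabled.
WantedBy=multi-user.target

# 50-sev-certify.preset (hypothetical file name, placed in a system-preset directory)
# Activation: applying presets enables stop.target, which creates the
# multi-user.target.wants/ symlink declared by [Install] above.
enable stop.target
```

Every other sev-certify unit can then be pulled in through `Wants=` or `Requires=` chains that start at the stop target.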


## RemainAfterExit

In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.<br>
Contributor Author

@copilot It's possible that the code isn't compliant with the guidelines. The code that I believe you're thinking of here was committed and merged before this PR was opened.

Comment on lines +85 to +88
In sev-certify, use `DefaultDependencies=no`.<br>

`DefaultDependencies=no` allows precise, self-contained placement of a unit in the dependency graph. systemd units in sev-certify aren't standard and default dependencies don't make sense for them.<br>

Contributor Author

Same comment as above. While it's true that the guidelines are in flux, the sev-certify code in question isn't compliant with this version/commit of the guidelines and may not end up being compliant.

Comment thread docs/systemd-guidelines.md Outdated
Comment thread docs/systemd-guidelines.md Outdated
Comment on lines +3 to +5
"Targets" (.target files) define/establish "stages", for example, boot, test and report stages.<br>
"Barrier services" are closely related to targets, but allow targets to be decoupled from stage details. "Barriers" and "barrier services" are used interchangeably.<br>
"Worker services" are services that we create: the .service file and the service code. "Workers" and "worker services" are used interchangeably.<br>
- Wants/Requires<br>
- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable<br>

For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have Wants/Requires=`<`one or more sev-certify units`>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.<br>
Comment on lines +65 to +66
In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>

Contributor Author

@copilot to me, RemainAfterExit=yes seems to fit sev-certify better and there doesn't seem to be a downside other than maybe salvaging more of a "bad boot".


`KillMode` controls which processes systemd will kill when a unit is stopped.<br>

For sev-certify, it's better to use `RemainAfterExit=yes` to avoid undesired process killing than to change `KillMode` from control-group, its default.<br>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 1, 2026 18:07

Copilot AI left a comment

Pull request overview

Adds a new documentation page describing how sev-certify uses systemd targets, “barrier services”, and related directives to structure stage execution and ordering.

Changes:

  • Introduces terminology and an overview of activation vs ordering vs enrollment in systemd for sev-certify.
  • Documents the “target → barrier → worker” pattern and approaches for intra-stage ordering.
  • Provides sev-certify-specific guidance for directives like Type, RemainAfterExit, DefaultDependencies, KillMode, and TimeoutStartSec.

- Wants/Requires<br>
- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable<br>

For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have Wants/Requires=`<`one or more sev-certify units`>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.<br>
Comment on lines +39 to +52
Each barrier service Requires= and After= all of its worker services. A barrier does have an ExecStart, but the simplest, most natural way for a barrier to stay in sync with its workers is to After= all of the workers. For example, in guest report-done.service:<br>

Requires=display-guest-logs.service sev-certificate-generator.service<br>
After=display-guest-logs.service sev-certificate-generator.service<br>
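
As complete units, the pattern quoted above could look like the following. This is only a sketch: `report.target`, the `Description=` strings, and the `ExecStart=` command are assumptions for illustration; the `Requires=`/`After=` lines of `report-done.service` are the ones from the guidelines.

```ini
# report-done.service (barrier for the report stage)
[Unit]
Description=Report stage barrier (hypothetical description)
DefaultDependencies=no
# Pull in every worker of the stage and order the barrier after all of them.
Requires=display-guest-logs.service sev-certificate-generator.service
After=display-guest-logs.service sev-certificate-generator.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Placeholder command; the guidelines only say that a barrier has an ExecStart.
ExecStart=/bin/true

# report.target (hypothetical name for the stage's target)
[Unit]
Description=Report stage (hypothetical description)
DefaultDependencies=no
# The target references only the barrier, never the individual workers.
Requires=report-done.service
After=report-done.service
```

Adding or removing a worker then touches only the two lists in the barrier.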

# Intra-stage ordering

In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>

1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or
2) use OnSuccess (and OnFailure?).

An example of 1) is the verify-guest service (Type=oneshot) checking logs to determine when the launch-guest service (Type=simple) is done.<br>
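
Approach 1) might be wired up roughly as follows. The executable paths and `next-worker.service` are hypothetical; only the unit names `launch-guest` and `verify-guest` and their `Type=` values come from the guidelines.

```ini
# launch-guest.service
[Service]
# With Type=simple, After=launch-guest.service is satisfied as soon as the process starts.
Type=simple
# Hypothetical path.
ExecStart=/usr/local/bin/launch-guest

# verify-guest.service
[Unit]
After=launch-guest.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical script that polls the logs and exits once launch-guest has finished.
ExecStart=/usr/local/bin/verify-guest

# next-worker.service (hypothetical unit that must wait for the guest to be done)
[Unit]
# Ordering against the oneshot waits for its main process to exit.
After=verify-guest.service
```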

Comment on lines +65 to +76
In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>

You can't easily use Before/After with simple services since they satisfy Before/After as soon as they start. See intra-stage ordering above. With oneshot services, Before/After isn't satisfied until the main process exits.<br>

With oneshot services, `TimeoutStartSec` is how long the main process has to exit/finish before systemd kills it. This can affect subprocesses and whether it does depends on `RemainAfterExit` and `KillMode` directives.<br>

default: simple<br>

## RemainAfterExit

In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.<br>

Comment on lines +85 to +88
In sev-certify, use `DefaultDependencies=no`.<br>

`DefaultDependencies=no` allows precise, self-contained placement of a unit in the dependency graph. systemd units in sev-certify aren't standard and default dependencies don't make sense for them.<br>

In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>

1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or
2) use OnSuccess (and OnFailure?).
Contributor

I like this suggestion

Contributor

@DGonzalezVillal DGonzalezVillal left a comment

Hey Mark, sorry for taking so long to look at this.

Here are some of my comments on your guide!

Overall I like it a lot; I just wanted to discuss some accuracy points in your draft.

Thank you!

- Wants/Requires<br>
- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable<br>

For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have Wants/Requires=`<`one or more sev-certify units`>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.<br>
Contributor

Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side".

By this, do you mean that we should define the relationship in only one direction? That is, if unit A requires unit B, we would define in systemd either

  • (in A) Requires=B.service
  • (in B) RequiredBy=A.service

but not both? I don't disagree with this statement.

Maybe we should clarify which one is preferred. From my understanding, you're saying here that we probably prefer RequiredBy.
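
For reference, the two alternatives in this comment would look roughly like this in unit files. `A.service` and `B.service` are the placeholder names from the comment, not real sev-certify units.

```ini
# Option 1: declared in A.service
[Unit]
Requires=B.service

# Option 2: declared in B.service
[Install]
# Takes effect only once B.service is enabled (systemctl enable or a preset),
# which creates a symlink under A.service.requires/.
RequiredBy=A.service
```

Option 2 leaves `A.service` untouched, which matters when A is a unit the project should not modify, such as `multi-user.target`.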


## Value of having both targets and "barrier services"

(straight from Claude Code)
Contributor

Suggested change
(straight from Claude Code)

It's ok, what isn't these days :)

Comment on lines +25 to +26
1. Stage chaining stays stable — targets give each stage a named boundary that subsequent stages reference. As workers are added or removed from a stage, only the barrier changes (Requires=); the target and the chain above it are untouched.
2. Intra-stage ordering without coupling — when workers within a stage must run in sequence, a started barrier gives them a common synchronization point without workers needing to reference each other directly. Without the barrier, you'd have to wire workers to each other, coupling units that conceptually belong to the same stage independently.
Contributor

Another reason for using barrier services is to make failure handling more explicit and resilient. If a target directly depends on all of its worker services, a single worker failure can prevent the target from being reached and stop the rest of the flow. With a barrier, the target can remain the stable stage boundary, while the barrier owns the responsibility of running the stage services, collecting their status, and deciding how failure is represented. This allows later stages, especially reporting, to still be reached so failures can be captured instead of skipping directly to shutdown or leaving the run incomplete.


# Intra-stage ordering

In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either<br>
Contributor

I don't know if we need the explanation, but the reason After= does not behave as expected with non-oneshot services is that After= only waits for initialization, not completion. Since long-running services do not have a completion state, once the service initializes successfully, systemd considers it safe to start dependent services.

Comment on lines +67 to +73
In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.<br>

You can't easily use Before/After with simple services since they satisfy Before/After as soon as they start. See intra-stage ordering above. With oneshot services, Before/After isn't satisfied until the main process exits.<br>

With oneshot services, `TimeoutStartSec` is how long the main process has to exit/finish before systemd kills it. This can affect subprocesses and whether it does depends on `RemainAfterExit` and `KillMode` directives.<br>

default: simple<br>
Contributor

I have a few comments on this section.

First, I don't think TimeoutStartSec is necessarily the right criterion for deciding between Type=oneshot and Type=simple. None of our current services rely on TimeoutStartSec, and I don't think service type selection should be driven by whether we can determine a timeout value.

Instead, I think the recommendation should be based on service behavior. For most of our services, Type=oneshot is the better fit because they are intended to perform finite work and run to completion once. I expect that to apply to the majority of services in this project.

Also, the main benefit of Type=oneshot in our architecture is ordering semantics. After= and Before= only wait for service activation, not completion. For long-running services (Type=simple), dependencies are satisfied as soon as the service initializes successfully. With Type=oneshot, ordering is only satisfied once the main process exits, which aligns better with our stage-based execution model.

RemainAfterExit=yes serves a different purpose. It keeps the service in the active state after execution has completed, allowing the unit to represent that a stage has finished. That is why barrier services use it by definition—they act as completion markers for all work associated with that target.

Without RemainAfterExit=yes, barrier services immediately transition to inactive after completion, which can lead to them being retriggered when revisiting target dependencies later in the flow.
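
To make the recommendation concrete, a finite-work worker could be declared along these lines. This is a sketch only; the unit name and command are hypothetical.

```ini
# example-worker.service (hypothetical worker that performs finite work)
[Service]
# oneshot: units ordered After= this one wait until ExecStart has exited.
Type=oneshot
# Stay active after exit so the unit keeps representing finished work
# and is not started again when later units pull it in.
RemainAfterExit=yes
# Hypothetical command.
ExecStart=/usr/local/bin/example-worker
```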


In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.<br>

`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will stop when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>
Contributor

Suggested change
`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will stop when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>
`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will become inactive when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.<br>

Comment on lines +93 to +95
## KillMode

`KillMode` controls which processes systemd will kill when a unit is stopped.<br>
Contributor

Do you have an example of where we use KillMode?


## TimeoutStartSec

See above. Also, `TimeoutStartSec=infinity` is how to express no timeout.<br>
Contributor

Same thing; I don't know if we use this at all.
