docs: add usage guidelines for systemd targets, services and directives #240
markg-github wants to merge 3 commits into
sev-certify now uses systemd targets and "barrier services" as well as the "worker services" that have always been used. This doc provides guidelines for their use and for the use of some related systemd directives.
Pull request overview
Adds a new documentation page describing how sev-certify uses systemd targets, “barrier” services, and worker services, including guidance on common directives and dependency patterns.
Changes:
- Added `docs/systemd-guidelines.md` covering terminology, activation vs ordering vs enrollment, and the target↔barrier pattern.
- Documented intra-stage ordering approaches and sev-certify-specific guidance for directives like `Type`, `RemainAfterExit`, `DefaultDependencies`, `KillMode`, and `TimeoutStartSec`.
# Bootstrapping

Stop targets (guest and host) have `WantedBy=multi-user.target`. `multi-user.target` is the "system is ready" terminal target: always present, always reached on a normal boot. This is the only use of `WantedBy` that's required in sev-certify. This "enrollment" is not enough to "activate" the stop targets. Activation of the stop targets requires enabling (or starting) them. Do this via an `enable stop.target` directive in a .preset file.
@copilot everything else can and does use Wants or Requires. The sev-certify stop targets bootstrap this process.
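A minimal sketch of the bootstrapping described above, assuming a stop target named `stop.target` (the name used in the preset directive quoted in the doc). The preset file name and the description text are hypothetical, and the target's own `Wants=`/`Requires=` chain is deliberately left out because the excerpt does not show it.

```ini
# stop.target (sketch; the real unit's dependencies are not shown here)
[Unit]
Description=sev-certify stop target (hypothetical description)

[Install]
# "Enrollment": records that multi-user.target should pull this unit in.
# This has no effect until the unit is enabled.
WantedBy=multi-user.target

# Enabling happens in a separate preset file, for example a hypothetical
# 50-sev-certify.preset containing the single line:
#   enable stop.target
```

With this split, `multi-user.target` itself is never edited; everything below the stop target can then be pulled in with `Wants=` or `Requires=`, as the comment above notes.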
## RemainAfterExit

In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.
@copilot It's possible that the code isn't compliant with the guidelines. The code that I believe you're thinking of here was committed and merged before this PR was opened.
In sev-certify, use `DefaultDependencies=no`.

`DefaultDependencies=no` allows precise, self-contained placement of a unit in the dependency graph. systemd units in sev-certify aren't standard and default dependencies don't make sense for them.
Same comment as above. While it's true that the guidelines are in flux, the sev-certify code being thought about here may not end up being compliant, and it isn't compliant with this version/commit of the guidelines.
"Targets" (.target files) define/establish "stages", for example, boot, test and report stages.

"Barrier services" are closely related to targets, but allow targets to be decoupled from stage details. "Barriers" and "barrier services" are used interchangeably.

"Worker services" are services that we create: the .service file and the service code. "Workers" and "worker services" are used interchangeably.

- Wants/Requires
- WantedBy/RequiredBy plus enabling, via an enable directive in a .preset file or via systemctl enable

For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have `Wants/Requires=<one or more sev-certify units>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.

In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.
@copilot to me, RemainAfterExit=yes seems to fit sev-certify better and there doesn't seem to be a downside other than maybe salvaging more of a "bad boot".
`KillMode` controls which processes systemd will kill when a unit is stopped.

For sev-certify, it's better to use `RemainAfterExit=yes` to avoid undesired process killing than to change `KillMode` from control-group, its default.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Adds a new documentation page describing how sev-certify uses systemd targets, “barrier services”, and related directives to structure stage execution and ordering.
Changes:
- Introduces terminology and an overview of activation vs ordering vs enrollment in systemd for sev-certify.
- Documents the “target → barrier → worker” pattern and approaches for intra-stage ordering.
- Provides sev-certify-specific guidance for directives like `Type`, `RemainAfterExit`, `DefaultDependencies`, `KillMode`, and `TimeoutStartSec`.
Each barrier service Requires= and After= all of its worker services. A barrier does have an ExecStart, but the simplest, most natural way for a barrier to stay in sync with its workers is to After= all of the workers. For example, in guest report-done.service:

Requires=display-guest-logs.service sev-certificate-generator.service
After=display-guest-logs.service sev-certificate-generator.service

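For context, here is how those two lines might sit in a complete barrier unit. This is only a sketch: the `[Service]` settings follow the guidelines discussed in this PR (`Type=oneshot`, `RemainAfterExit=yes`, `DefaultDependencies=no`), while the description and the no-op `ExecStart` command are assumptions, since the excerpt does not show what the real barrier runs.

```ini
# report-done.service (guest), sketched as a barrier service
[Unit]
Description=Barrier for the guest report stage (hypothetical description)
DefaultDependencies=no
# Pull in every worker of the stage and wait for each of them to finish.
Requires=display-guest-logs.service sev-certificate-generator.service
After=display-guest-logs.service sev-certificate-generator.service

[Service]
Type=oneshot
# Stay "active" after the command exits so the barrier keeps marking the
# stage as done.
RemainAfterExit=yes
# Assumption: a no-op command stands in for the real ExecStart.
ExecStart=/bin/true
```

Adding or removing a worker then touches only the `Requires=` and `After=` lines of the barrier, not the target that depends on it.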
# Intra-stage ordering

In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either

1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or
2) use OnSuccess (and OnFailure?).

An example of 1) is the verify-guest service (Type=oneshot) checking logs to determine when the launch-guest service (Type=simple) is done.

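Option 1) could look roughly like the sketch below. The unit names `verify-guest.service` and `launch-guest.service` come from the example above; the script path and the later worker `next-worker.service` are hypothetical placeholders, and the log-polling logic itself lives in the script, not in systemd.

```ini
# verify-guest.service: oneshot that finishes only once the guest launch
# is known to be done
[Unit]
DefaultDependencies=no
# launch-guest.service is Type=simple, so this only orders verify-guest
# after launch-guest has been started, not after it has finished.
After=launch-guest.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical script: checks the logs until the launch is complete,
# then exits 0.
ExecStart=/usr/local/bin/verify-guest.sh

# A later worker (hypothetical next-worker.service) then orders itself
# after the oneshot instead of after the simple service:
#   [Unit]
#   After=verify-guest.service
```

The oneshot acts as the completion signal that the simple service cannot provide on its own.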
In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either

1) have a oneshot service use a non-systemd mechanism to tell when the non-oneshot is done and use After= with this oneshot service or
2) use OnSuccess (and OnFailure?).

I like this suggestion
DGonzalezVillal left a comment
Hey Mark, sorry for taking so long to look at this.
Here are some of my comments on your guide!
Overall I like it a lot; I just wanted to discuss some accuracy points in your draft.
Thank you!
- Wants/Requires
- WantedBy/RequiredBy plus enabling via enable directive .preset file or systemctl enable

For sev-certify, somehow automating systemctl start versus one of the other activation methods doesn't make sense. Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side". For example, it's inappropriate to change multi-user.target to have `Wants/Requires=<one or more sev-certify units>`. WantedBy/RequiredBy "enrolls" a unit and this plus enabling is one way to activate.

> Also, the "directions" of Wants/Requires and WantedBy/RequiredBy are opposite and it may only be appropriate/correct to change "one side".

By this, do you mean that we should define the relationship in only one direction? What I mean is that if unit A requires unit B, we would define in systemd either
- (in A) `Requires=B.service`
- (in B) `RequiredBy=A.service`

but we shouldn't have both? I don't disagree with this statement.
Maybe we should clarify which one is preferred. From my understanding, you're saying here that we probably prefer RequiredBy.
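To make the two options concrete, here is a sketch using the placeholder units `A.service` and `B.service` from the comment above. One difference worth keeping in mind: `Requires=` belongs in `[Unit]` and applies as soon as the unit is loaded, while `RequiredBy=` belongs in `[Install]` and only takes effect once the unit is enabled.

```ini
# Option 1, in A.service: declared on the unit that has the dependency.
[Unit]
Requires=B.service

# Option 2, in B.service: declared on the unit being depended on.
# Only takes effect after B.service is enabled (preset file or
# systemctl enable); A.service itself is left unmodified.
[Install]
RequiredBy=A.service
```

Option 2 is the only choice when the depending unit is not ours to edit, which is the `multi-user.target` case in the guideline.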
## Value of having both targets and "barrier services"

(straight from Claude Code)
> (straight from Claude Code)

It's ok, what isn't these days :)
1. Stage chaining stays stable: targets give each stage a named boundary that subsequent stages reference. As workers are added or removed from a stage, only the barrier changes (Requires=); the target and the chain above it are untouched.
2. Intra-stage ordering without coupling: when workers within a stage must run in sequence, a started barrier gives them a common synchronization point without workers needing to reference each other directly. Without the barrier, you'd have to wire workers to each other, coupling units that conceptually belong to the same stage independently.

Another reason for using barrier services is to make failure handling more explicit and resilient. If a target directly depends on all of its worker services, a single worker failure can prevent the target from being reached and stop the rest of the flow. With a barrier, the target can remain the stable stage boundary, while the barrier owns the responsibility of running the stage services, collecting their status, and deciding how failure is represented. This allows later stages, especially reporting, to still be reached so failures can be captured instead of skipping directly to shutdown or leaving the run incomplete.
# Intra-stage ordering

In cases where intra-stage ordering is required, worker services use After= to achieve it. This works for oneshot services. For non-oneshot, either

I don't know if we need the explanation, but the reason After= does not behave as expected with non-oneshot services is that After= only waits for initialization, not completion. Since long-running services do not have a completion state, once the service initializes successfully, systemd considers it safe to start dependent services.
In sev-certify, use `Type=oneshot` with `RemainAfterExit=yes` when a suitable `TimeoutStartSec` value can be determined. Otherwise, use `Type=simple`.

You can't easily use Before/After with simple services since they satisfy Before/After as soon as they start. See intra-stage ordering above. With oneshot services, Before/After isn't satisfied until the main process exits.

With oneshot services, `TimeoutStartSec` is how long the main process has to exit/finish before systemd kills it. This can affect subprocesses and whether it does depends on `RemainAfterExit` and `KillMode` directives.

default: simple

I have a few comments on this section.
First, I don't think TimeoutStartSec is necessarily the right criterion to decide between Type=oneshot and Type=simple. None of our current services rely on TimeoutStartSec, and I don't think service type selection should be driven by whether we can determine a timeout value.
Instead, I think the recommendation should be based on service behavior. For most of our services, Type=oneshot is the better fit because they are intended to perform finite work and run to completion once. I expect that to apply to the majority of services in this project.
Also, the main benefit of Type=oneshot in our architecture is ordering semantics. After= and Before= only wait for service activation, not completion. For long-running services (Type=simple), dependencies are satisfied as soon as the service initializes successfully. With Type=oneshot, ordering is only satisfied once the main process exits, which aligns better with our stage-based execution model.
RemainAfterExit=yes serves a different purpose. It keeps the service in the active state after execution has completed, allowing the unit to represent that a stage has finished. That is why barrier services use it by definition—they act as completion markers for all work associated with that target.
Without RemainAfterExit=yes, barrier services immediately transition to inactive after completion, which can lead to them being retriggered when revisiting target dependencies later in the flow.
In sev-certify, use `RemainAfterExit=yes` with oneshot services and `RemainAfterExit=no`, the default, with simple services.

`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will stop when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.

Suggested change:

`RemainAfterExit` has the same semantics for oneshot and simple services. `RemainAfterExit=no` (default) means the service will become inactive when the main process exits. `RemainAfterExit=yes` means the service will stay active after the main process exits.

## KillMode

`KillMode` controls which processes systemd will kill when a unit is stopped.

Do you have an example of where we use KillMode?
## TimeoutStartSec

See above. Also, `TimeoutStartSec=infinity` is how to express no timeout.

Same thing, I don't know if we use this at all
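For reference, a sketch of what an explicit "no timeout" would look like on a oneshot worker. The unit and script names are hypothetical, and this is not taken from any existing sev-certify unit. As far as the systemd.service documentation goes, the start timeout is already disabled by default for `Type=oneshot`, so the explicit line mainly serves as documentation of intent.

```ini
# long-test.service (hypothetical worker name)
[Unit]
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
# Explicitly state that systemd should never time out the start job.
TimeoutStartSec=infinity
# Hypothetical long-running script that eventually exits.
ExecStart=/usr/local/bin/long-test.sh
```

A finite value would go in the same place, for example `TimeoutStartSec=30min`, if a stage should be abandoned after a known upper bound.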