thar-be-settings: improve service restart behavior#802
thar-be-settings: improve service restart behavior#802koooosh wants to merge 1 commit intobottlerocket-os:developfrom
Conversation
Signed-off-by: Kush Upadhyay <kushupad@amazon.com>
f0e2d8a to
2283cba
Compare
|
^ Force push fixes a minor fmt warning from builds |
| fn restart_non_systemd(&self) -> Result<()> { | ||
| for restart_command in &self.model.restart_commands { | ||
| // Skip systemd commands - handled by Services::restart_systemd() | ||
| if restart_command.contains("systemctl") { |
There was a problem hiding this comment.
Do we want contains or starts_with?
| /// Execute non-systemd restart commands for this service. | ||
| fn restart_non_systemd(&self) -> Result<()> { | ||
| for restart_command in &self.model.restart_commands { | ||
| // Skip systemd commands - handled by Services::restart_systemd() |
There was a problem hiding this comment.
It's kind of a shame we have to sort this twice
| .0 | ||
| .values() | ||
| .flat_map(|s| s.model.restart_commands.iter()) | ||
| .filter(|cmd| cmd.contains("systemctl")) |
There was a problem hiding this comment.
Same as below, why not starts_with if we're assuming the first word is the command?
| for (name, service) in &services.0 { | ||
| if let Err(e) = service.restart_non_systemd() { | ||
| error!("Failed to restart {}: {}", name, e); |
There was a problem hiding this comment.
Seems cleaner to partition Services into two sets, one for systemd and one for non-systemd. These could be represented by different types and each could have a restart method.
| debug!("Restarting systemd units: {:?}", &units); | ||
| let mut cmd = Command::new("/bin/systemctl"); | ||
| cmd.arg("try-reload-or-restart"); | ||
| cmd.args(&units); |
There was a problem hiding this comment.
We shouldn't automatically treat all systemctl ... commands as wanting systemctl try-reload-or-restart ... instead.
We should assume the restart command must be issued exactly as specified. The only optimization permitted is to group the restart commands by subcommand - all systemctl restart ... together, all systemctl try-restart ... together, and so on.
It might also be prudent to lean into the non-determinism for restarts - for example, preparing the entire batch of restart commands, then randomizing the order. There may be some brittleness from potential ordering surprises that needs to be shaken out.
Issue number:
Related to: bottlerocket-os/bottlerocket#4556
Description of changes:
Improves
thar-be-settingsservice restart behavior with:Batch systemd restarts: All systemd units are now restarted with a single
systemctl try-reload-or-restartcall, making the restart atomic from systemd's perspective.No early bail: All service restarts are attempted regardless of individual failures so the system is not left in a partially-configured state. Failures are logged as they occur, and a summary error is returned at the end.
There are various ways to implement this, but I focused on keeping
restart_servicesas simple as possible. This meant making symmetrical "flows" for systemd and non-systemd services so the restart process is simply:Since systemd services are batched,
restart_systemdmethod needs to be part of theServices impl. I removed theServiceRestarttrait since it was not used for abstraction (only one implementor) and moved the individual service restart logic torestart_non_systemdunderService impl.Testing done:
Launched an
aws-devami withthar-be-settingsdebug logs enabled and a forced failure incorndogto demonstrate the no-bail behavior. Instance failed boot as expected since settings failed to apply, but journal logs show:Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.