feat: service restart support for file config plugin#3931
feat: service restart support for file config plugin#3931albinsuresh merged 10 commits intothin-edge:mainfrom
Conversation
Robot Results
|
|
A few missing pieces/open questions:
|
| on_success = "apply" | ||
|
|
||
| [apply] | ||
| background_script = "sudo /usr/share/tedge/config-plugins/file apply ${.payload.type}" |
There was a problem hiding this comment.
Even when we convert this step into a builtin task, this plugin apply command should be executed as a background task, irrespectve of the type of the config and whethere that type restarts tedge-agent or not. That'll smplify this workflow.
There was a problem hiding this comment.
Though the downside of doing the background task is that the "verify" state will be executed before the "apply" command is done.
There was a problem hiding this comment.
Since we have a workdir available (introduced in 55c2d76), how about using that in the apply state to write a "completion marker file" to that directory, which the verify phase can check for afterwards with a timeout? We just need to formalize the "completion marker file" contract.
There was a problem hiding this comment.
Though can't we just tell people not to restart the tedge-agent from within a plugin? And if they want to restart a service then pass back some data via the workflow state exchange mechanism (e.g. :::begin-tedge:::)?
There was a problem hiding this comment.
And if they want to restart a service then pass back some data via the workflow state exchange mechanism (e.g. :::begin-tedge:::)?
Yeah, I had considered that option. But, these are the reasons why I tried to avoid this "restart request" exchange from the plugin to the agent via the command state:
- We'll have to introduce yet another state after
applyfor the the agent to receive the restart request from the plugin, and then process it by restarting the agent. It feels like an optional state in the workflow only there for once specific case (the agent restart). - We'll be exposing the workflow state update contract (
:::begin-tedge:::and:::begin-tedge:::) with the plugin, which was otherwise unaware of the workflow contracts. That's why I was looking for a simple file-system based contract via theworkdir. - Consistency in the sense that they don't need to care if they are restarting the agent or any other service and don't need to do things differently for one vs the other. It would have been different if all service restarts were done by the agent. But since the agent running as
tedgecan't do that, it brings that inconsistency. Though, we can get arround this problem by making the agent usetedgectl.
But, I agree that this workdir approach introduces yet another mechanism for metdata exchange between states, when the MQTT state is already available. I thought this would be less of a problem as the workdir would anyway be used for config backups and even other things like service restart detection using persisted metadata(pids). So, this would be just one additional thing to that list.
We decided to explore a more flexible design, using workflows.
| #[error("A shutdown has been requested")] | ||
| Shutdown, |
There was a problem hiding this comment.
This doesn't look like an error.
There was a problem hiding this comment.
| #[error("A shutdown has been requested")] | |
| Shutdown, | |
| #[error("A restart is required: not running latest version and configuration ")] | |
| RestartRequired, |
reubenmiller
left a comment
There was a problem hiding this comment.
I ran into an issue when trying to trigger a tedge-agent restart after applying a new tedge.toml file, but the operation fails for some reason, this line was found in the operation log at least:
ERROR: builtin action 'builtin:config_update:verify' failed: No builtin operation step handler registered for config_update operation verify step
Below is tedge-configuration-plugin.toml that I configured in my test:
file: /etc/tedge/plugins/tedge-configuration-plugin.toml
files = [
{ path = '/etc/tedge/system.toml', type = 'system.toml', service = 'tedge-agent' },
{ path = '/etc/tedge/tedge.toml', type = 'tedge.toml', service = 'tedge-agent' },
{ path = '/etc/lighttpd/lighttpd.conf', type = 'lighttpd-conf', service = 'lighttpd', service_action = 'restart' }
]And the full operation log of the failed operation:
TST_actualize_shiny_bend_workflow-config_update-c8y-mapper-53005959.log
The issue was caused by using a So, the config manager actor did the first part of the config update until the agent restart, but could not resume it post-restart as the actor itself is not present. That single |
Resolved by 945642f |
| ... tedge.toml | ||
| ... tedge-log-plugin | ||
|
|
||
| Config update restarts service if configured in plugin configuration |
There was a problem hiding this comment.
To be removed, as I can't remember why I added this back then and the Set Configuration Should Restart Service test above covers what was intended anyway.
reubenmiller
left a comment
There was a problem hiding this comment.
The problem I ran into earlier has been resolved, and it's working nicely now. There is a related followup action tracked in #4000, but as of now the PR is working as expected.
didier-wenzek
left a comment
There was a problem hiding this comment.
Approved. Thank you for you perseverance on this long standing work.
I really like the new design for the communications between the workflow engine and an operation actor.
This has also been an opportunity to fix a hack used to request an agent restart.
| // Make sure the operation status is properly reported before the restart | ||
| tokio::time::sleep(Duration::from_secs(5)).await; | ||
| return Err(RuntimeError::ActorError(Box::new(SoftwareManagerError::NotRunningLatestVersion))); | ||
| return Err(RuntimeError::RestartRequired); |
The plugin now accepts the following sub-commands: - `prepare`: No-op - `apply`: Restarts the associated service if configured - `verify`: Checks if the associated service is running - `finalize`: No-op - `rollback`: No-op
945642f to
ab35950
Compare
Proposed changes
system_servicesmodule fromtedgeinto a standalone crate to use from the pluginconfig_updateworkflow with granular operation stepstedge-agentitselfhow to detect a "reload" when service action isto be handled using dedicated plugins as a generic logic in thereloadinstead ifrestartfileplugin is not possibleTypes of changes
Paste Link to the issue
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments