fix(op): ensure op worker restarts on reboot and make script resilient to reinstalls #428
Draft
gwenaskell wants to merge 2 commits into main
I will hold this off for now because some of those operations could actually be bundled directly with the opw package.
Summary
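Two related improvements to `install_script_op_worker2.sh`:
1. Enable the OP Worker on host reboot. Refs: OPA-5043
2. Make the install script resilient to re-runs where a previous (incomplete) uninstall left state behind. Refs: OPA-3197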
1. Enable the OP Worker on host reboot
Why
After a host reboot, an OPW installed via this script wasn't reliably coming
back up: the script only ran the package's runtime `restart` command, which
doesn't register the service for boot-time startup. Whether the service then
started on boot was at the mercy of the package's post-install hooks and
varied across distros.
What
A new `enable_cmd` is computed alongside the existing `restart_cmd` / `start_instructions` / `stop_instructions`, picking the right tool per init system:
- `systemctl enable observability-pipelines-worker.service`
- `update-rc.d observability-pipelines-worker defaults`
- `chkconfig observability-pipelines-worker on`
- … (`.conf` is picked up automatically)

`service` is intentionally not used for this: it only forwards runtime …

… `restart_cmd`, … `no_start` flag so:
- `DD_INSTALL_ONLY=true` skips both starting and enabling.
- … `DD_API_KEY`/`DD_OP_PIPELINE_ID` still skips both.
- … install proceeds.
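The per-init-system selection described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function name `pick_enable_cmd` and the init-system labels (`systemd`, `sysvinit-debian`, `sysvinit-redhat`) are hypothetical names chosen for illustration, and the fallback branch assumes that any other init system needs no explicit enable step.

```shell
#!/bin/sh
# Service name as used by the commands quoted in this PR description.
service_name="observability-pipelines-worker"

# Hypothetical helper: print the command that registers the service for
# boot-time startup. The labels passed in "$1" are assumptions made for
# this sketch; the real script may detect the init system differently.
pick_enable_cmd() {
    case "$1" in
        systemd)         echo "systemctl enable ${service_name}.service" ;;
        sysvinit-debian) echo "update-rc.d ${service_name} defaults" ;;
        sysvinit-redhat) echo "chkconfig ${service_name} on" ;;
        *)               : ;;  # assumed: nothing to run for other init systems
    esac
}

# Hypothetical gate: only enable when the no_start flag is not "true".
should_enable() {
    [ "$1" != "true" ]
}
```

Returning the command as a string mirrors how `restart_cmd` is described as being computed first and run later, so a single `no_start` check can skip both starting and enabling.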
2. Make the install script resilient to a previous (incomplete) uninstall
Why
`apt-get remove` / `apt-get purge` leave several pieces of state behind that
the package manager doesn't own:
- `/etc/default/observability-pipelines-worker` (created by this script)
- `/etc/observability-pipelines-worker/install_info` (created by this script)
- `/etc/apt/sources.list.d/datadog-observability-pipelines-worker.list`
- `/usr/share/keyrings/datadog-archive-keyring.gpg`
- `/var/lib/observability-pipelines-worker/`
- `observability-pipelines-worker` system user

Most of those steps in the script are already idempotent (repo file overwrite,
GPG re-import, package re-install, `install_info` overwrite). Two were not:
- … in the new invocation. Re-running the script with a new `DD_API_KEY` was a
  no-op.
- `chown $bootstrap_file` could fail under `set -e` if a partial prior state
  meant the file or the system user was missing, aborting the whole install.
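To illustrate the `set -e` failure mode, here is one way a `chown` step can be guarded so that a missing file or user produces a warning instead of aborting. This is a sketch under assumptions, not the PR's implementation: `safe_chown` is a hypothetical helper name, and the warning wording is invented for the example.

```shell
#!/bin/sh
set -e

# Hypothetical helper (not taken from the install script): change ownership
# only when both the file and the user exist; otherwise warn and continue.
safe_chown() {
    owner="$1"
    file="$2"
    if [ ! -e "$file" ]; then
        echo "warning: $file not found, skipping chown" >&2
        return 0
    fi
    if ! id -u "$owner" >/dev/null 2>&1; then
        echo "warning: user $owner not found, skipping chown" >&2
        return 0
    fi
    # The "||" keeps a failing chown from tripping "set -e".
    chown "$owner" "$file" || echo "warning: chown $owner $file failed" >&2
}
```

A bare `chown` in the same position would terminate the script on the first non-zero exit status, which is the abort described above.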
What
Env file behavior (`/etc/default/observability-pipelines-worker`):
… `DD_API_KEY` supplied … `DD_OP_*` not previously set …
… `upsert_env_var` helper that does `sed -i "/^${key}=/d"` then `echo $key=$value >> $env_file`.

… (`/etc/observability-pipelines-worker/bootstrap.yaml`):
… `chown` succeeds … `chown` succeeds … `observability-pipelines-worker` user missing … `chown` fails → script aborts … `chown` fails → script aborts …

Backward compatibility note
The env file change is a semantic change: previously the file was
inviolate on re-runs. Now any DD_* values passed in the new invocation will
overwrite their matching lines. I believe this matches operator expectations
("I re-ran with a new key, why didn't it apply?") and the current behavior
was an undocumented footgun, but it's worth flagging.
Operator-added keys (anything the script doesn't pass via DD_*) are still
preserved.
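The delete-then-append behavior of the `upsert_env_var` helper can be sketched as follows. The `sed` and `echo` steps come from the description above; taking the env file path as a third argument, the `touch`, and the quoting are assumptions added to keep the example self-contained. It also assumes GNU `sed`, where `-i` takes no backup-suffix argument.

```shell
#!/bin/sh
# Sketch of an upsert helper. The third argument (env file path) is an
# assumption for this example; the real helper may use a global $env_file.
upsert_env_var() {
    key="$1"
    value="$2"
    env_file="$3"
    touch "$env_file"                      # make sure the file exists
    sed -i "/^${key}=/d" "$env_file"       # drop any existing line for this key
    echo "${key}=${value}" >> "$env_file"  # append the new value
}
```

For example, running `upsert_env_var DD_API_KEY new /tmp/env` against a file containing `DD_API_KEY=old` and an operator-added `CUSTOM=keep` leaves exactly one `DD_API_KEY` line with the new value, and the `CUSTOM` line is untouched because only lines starting with the given key are deleted.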
Test plan
- … `systemctl is-enabled observability-pipelines-worker` returns `enabled`.
- … `DD_API_KEY`: verify `/etc/default/observability-pipelines-worker` reflects the new key and the worker uses it.
- `apt-get remove observability-pipelines-worker` then re-run the script: install completes without aborting; service is enabled and starts.
- `apt-get purge observability-pipelines-worker && userdel observability-pipelines-worker` then re-run: install completes (with warnings), package's postinst recreates the user, service starts.
- `DD_INSTALL_ONLY=true` install: script neither enables nor starts the service.
- … `update-rc.d defaults` runs, service comes up at boot.

Out of scope
… file, keyring, system user) belongs in the OPW package's `postrm`/`prerm`
(analogous to `datadog-agent`'s `agent-deb/postrm`), not in this install script.