[Heartbeat] Custom policy reload #49326

Merged
emilioalvap merged 23 commits into elastic:main from emilioalvap:hb-custom-policy-reload
Mar 17, 2026

@emilioalvap emilioalvap commented Mar 6, 2026

Proposed commit message

Introducing:

  • A config hashing mechanism that each Heartbeat plugin (monitor type) can optionally implement to selectively exclude fields from hash generation.
  • An alternative Update() route that Heartbeat plugins can optionally implement to selectively update the configuration of a running monitor.

Both mechanisms are required to dynamically update fields of running monitors, browser params in particular, without triggering a stop/start cycle.
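To illustrate the hashing idea, here is a minimal sketch of a hash function that ignores selected top-level fields, so that two configs differing only in `params` map to the same identity. The helper name `hashConfig`, the use of SHA-256 over a JSON encoding, and the plain `map[string]interface{}` config type are all assumptions made for illustration; they are not the plugin API introduced by this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashConfig is a hypothetical helper (not Heartbeat's HashConfigFunc).
// It hashes cfg while skipping the top-level keys listed in ignore.
func hashConfig(cfg map[string]interface{}, ignore ...string) (string, error) {
	skip := make(map[string]struct{}, len(ignore))
	for _, k := range ignore {
		skip[k] = struct{}{}
	}
	filtered := make(map[string]interface{}, len(cfg))
	for k, v := range cfg {
		if _, ignored := skip[k]; !ignored {
			filtered[k] = v
		}
	}
	// encoding/json sorts map keys, so the encoding is deterministic.
	b, err := json.Marshal(filtered)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := map[string]interface{}{"id": "m1", "params": map[string]interface{}{"env": "dev"}}
	b := map[string]interface{}{"id": "m1", "params": map[string]interface{}{"env": "prod"}}
	ha, _ := hashConfig(a, "params")
	hb, _ := hashConfig(b, "params")
	fmt.Println(ha == hb) // prints: true (only the ignored field differs)
}
```

With equal hashes, a reloader can treat the second config as an update to the running monitor instead of as a new monitor.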

Closes #47511.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

@emilioalvap emilioalvap added enhancement Team:obs-ds-hosted-services Label for the Observability Hosted Services team labels Mar 6, 2026
@botelastic botelastic Bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 6, 2026

github-actions Bot commented Mar 6, 2026

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)


mergify Bot commented Mar 6, 2026

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @emilioalvap? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

Comment thread heartbeat/beater/heartbeat.go Outdated

@github-actions

TL;DR

golangci-lint failed in workflow run 22913576497 (all three OS jobs) due to concrete lint violations in PR files. Fix the listed findings (especially heartbeat/reload/list_test.go and x-pack heartbeat files) and re-run lint.

Remediation

  • Fix the reported lint violations:
    • errcheck: unchecked error(s) at heartbeat/reload/list_test.go:273 and x-pack/heartbeat/monitors/browser/sourcejob_test.go:434
    • goimports: x-pack/heartbeat/monitors/browser/config.go:12
    • noctx: use exec.CommandContext in x-pack/heartbeat/monitors/browser/synthexec/synthexec.go:73,84
    • staticcheck: duplicate import + receiver naming in heartbeat/reload/list_test.go:34,35,426
    • testifylint: prefer require.Empty in x-pack/heartbeat/monitors/browser/sourcejob_test.go:75,379,403
  • Validate before pushing:
    • golangci-lint run --new=false --timeout=30m --whole-files (or the repo lint target for the changed heartbeat paths).
Investigation details

Root Cause

The run failed in step golangci-lint across lint (ubuntu-latest), lint (macos-latest), and lint (windows-latest). The failures are source-level linter violations, not infrastructure/test flakiness.

Evidence

  • Workflow: golangci-lint (run 22913576497)
  • Job/step: lint (ubuntu-latest) → golangci-lint (same failure class on macOS/windows)
  • Key log excerpt:
    • heartbeat/reload/list_test.go:273:18: Error return value is not checked (errcheck)
    • x-pack/heartbeat/monitors/browser/config.go:12:1: File is not properly formatted (goimports)
    • x-pack/heartbeat/monitors/browser/synthexec/synthexec.go:73:22: os/exec.Command must not be called. use os/exec.CommandContext (noctx)
    • heartbeat/reload/list_test.go:34:2: ST1019: ... imported more than once (staticcheck)

Validation

  • Queried workflow run/job metadata and inspected failed job logs.
  • No local code execution needed; diagnosis is from CI logs.

Follow-up

After applying fixes, push and re-run the same workflow to confirm all three lint matrix jobs pass.


@elasticmachine

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)


coderabbitai Bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a reloadable HBRunnerList with NewHBRunnerList and reload/stop/hash APIs; RunCentralMgmtMonitors now constructs an HB-specific runner list. Introduces HBRunnerFactory with GetHashFunc and makes runner updates possible via UpdatableRunner. Monitor now holds plugin.Plugin, uses plugin.Endpoints, and exposes Update(*conf.C) error. Plugin system gains HashConfigFunc, RegisterWithHashFunc, and Plugin.Update/DoUpdate hooks. Browser monitor registration now uses RegisterWithHashFunc with a hashConfig that ignores params. SourceJob adds mutex-protected Params and Update; synthexec APIs switch params to func() map[string]interface{}. Root config unnesting applied and revision removed.

Suggested labels

Team:Obs-InfraObs

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed PR implements config-hashing and Update() mechanism for Heartbeat plugins to enable smart policy reload without stop/start cycles, directly addressing issue #47511.
Out of Scope Changes check ✅ Passed All changes align with objectives: plugin hashing/update infrastructure, monitor reloading system, browser monitor implementation, and dynamic params handling.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@heartbeat/reload/list.go`:
- Around line 124-140: The loop handling updateList currently skips
non-updatable runners and leaves changed same-hash configs ignored; modify the
non-updatable branch in the for hash,cfg := range updateList loop (the code that
checks if runner.(UpdatableRunner)) to behave like the Update error path: when
the existing runner in r.runners[hash] is not an UpdatableRunner, add the runner
to stopList[hash] and add the new cfg to startList[hash] (and keep a log via
r.logger.Errorf/Debugf), so changed configs for same-hash non-updatable runners
fall back to the stop/start cycle.
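The fallback described in this comment could look roughly like the sketch below. `Config`, `Runner`, `UpdatableRunner`, `staticRunner`, `liveRunner`, and `planUpdates` are simplified stand-ins written for illustration; the real types in heartbeat/reload/list.go have different signatures and also log the failure.

```go
package main

import (
	"errors"
	"fmt"
)

// Config, Runner and UpdatableRunner are simplified, hypothetical stand-ins.
type Config map[string]string

type Runner interface{ Stop() }

type UpdatableRunner interface {
	Runner
	Update(cfg Config) error
}

// staticRunner cannot be updated in place.
type staticRunner struct{}

func (staticRunner) Stop() {}

// liveRunner supports in-place updates unless fail is set.
type liveRunner struct {
	cfg  Config
	fail bool
}

func (r *liveRunner) Stop() {}

func (r *liveRunner) Update(cfg Config) error {
	if r.fail {
		return errors.New("update rejected")
	}
	r.cfg = cfg
	return nil
}

// planUpdates tries an in-place update for every same-hash config. Runners
// that are not updatable, or whose Update fails, go through stop/start.
func planUpdates(runners map[uint64]Runner, updateList map[uint64]Config) (map[uint64]Runner, map[uint64]Config) {
	stopList := map[uint64]Runner{}
	startList := map[uint64]Config{}
	for hash, cfg := range updateList {
		runner, ok := runners[hash]
		if !ok {
			continue // no runner under this hash; out of scope for this sketch
		}
		if u, ok := runner.(UpdatableRunner); ok {
			if err := u.Update(cfg); err == nil {
				continue // updated in place, no restart needed
			}
		}
		stopList[hash] = runner
		startList[hash] = cfg
	}
	return stopList, startList
}

func main() {
	live := &liveRunner{}
	runners := map[uint64]Runner{1: staticRunner{}, 2: live, 3: &liveRunner{fail: true}}
	updates := map[uint64]Config{1: {"params": "a"}, 2: {"params": "b"}, 3: {"params": "c"}}
	stop, start := planUpdates(runners, updates)
	fmt.Println(len(stop), len(start), live.cfg["params"]) // prints: 2 2 b
}
```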

In `@x-pack/heartbeat/monitors/browser/sourcejob.go`:
- Around line 100-111: The Update method directly writes sj.browserCfg.Params
which is later read by the long‑lived synthexec job (method value captured
elsewhere), causing a race; fix by making Params access thread‑safe: add a
synchronization mechanism (either a sync.Mutex protecting browserCfg.Params or
use an atomic.Value field on SourceJob to hold a snapshot of Params), change
SourceJob.Update to store a deep copy/snapshot into that protected field (e.g.,
set atomic.Value.Store(copyOf(cfg.Params)) or lock mutex, replace params,
unlock), and update the synthexec job reader to load the params from the same
protected accessor (atomic.Value.Load() or mutex-protected getter) so readers
never observe partially updated state.
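A minimal sketch of the mutex-protected variant follows. The `sourceJob` type here is a reduced, hypothetical model of the browser SourceJob, and the copy is shallow, so nested values would still be shared between snapshots.

```go
package main

import (
	"fmt"
	"sync"
)

// sourceJob is a reduced, hypothetical model of the browser SourceJob.
type sourceJob struct {
	mu     sync.RWMutex
	params map[string]interface{}
}

// copyParams makes a shallow copy so callers never share the stored map.
func copyParams(in map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// Update replaces the stored params with a snapshot of the new ones.
func (sj *sourceJob) Update(params map[string]interface{}) {
	snapshot := copyParams(params)
	sj.mu.Lock()
	sj.params = snapshot
	sj.mu.Unlock()
}

// Params returns a copy of the current params. Its method value has the
// type func() map[string]interface{}, so a long-lived job can hold it.
func (sj *sourceJob) Params() map[string]interface{} {
	sj.mu.RLock()
	defer sj.mu.RUnlock()
	return copyParams(sj.params)
}

func main() {
	job := &sourceJob{}
	job.Update(map[string]interface{}{"env": "staging"})

	// The reader captures the getter once, as a long-lived job would.
	var getParams func() map[string]interface{} = job.Params

	job.Update(map[string]interface{}{"env": "prod"})
	fmt.Println(getParams()["env"]) // prints: prod
}
```

Because readers go through the getter on every run, an update becomes visible on the next invocation without restarting the job.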

In `@x-pack/heartbeat/monitors/browser/synthexec/synthexec.go`:
- Around line 155-158: The code calls params() twice which can yield different
results; capture its result once into a local variable (e.g., p := params()) and
use that for the length check and the json.Marshal call, then append stringified
params to cmd.Args (using json.Marshal(p) and handling the error instead of
discarding it) so both checks operate on the same data and errors aren’t
ignored.
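As a sketch of that change, the helper below reads the params getter once and propagates the marshalling error. `appendParams` is a hypothetical function and the `--params` flag name is an assumption used for illustration; the actual argument handling in synthexec.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// appendParams is a hypothetical helper. It calls params exactly once so
// the emptiness check and the JSON encoding see the same snapshot.
func appendParams(args []string, params func() map[string]interface{}) ([]string, error) {
	p := params()
	if len(p) == 0 {
		return args, nil
	}
	b, err := json.Marshal(p)
	if err != nil {
		return nil, fmt.Errorf("marshalling params: %w", err)
	}
	// "--params" is an assumed flag name for this sketch.
	return append(args, "--params", string(b)), nil
}

func main() {
	getter := func() map[string]interface{} {
		return map[string]interface{}{"env": "prod"}
	}
	args, err := appendParams([]string{"synthetics"}, getter)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(args)
}
```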

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d027687-51da-40cd-a771-a2dd7c854ae7

📥 Commits

Reviewing files that changed from the base of the PR and between ecff92b and b4b8f12.

📒 Files selected for processing (13)
  • heartbeat/beater/heartbeat.go
  • heartbeat/monitors/factory.go
  • heartbeat/monitors/monitor.go
  • heartbeat/monitors/plugin/plugin.go
  • heartbeat/reload/list.go
  • heartbeat/reload/list_test.go
  • x-pack/heartbeat/cmd/root.go
  • x-pack/heartbeat/monitors/browser/browser.go
  • x-pack/heartbeat/monitors/browser/config.go
  • x-pack/heartbeat/monitors/browser/config_test.go
  • x-pack/heartbeat/monitors/browser/sourcejob.go
  • x-pack/heartbeat/monitors/browser/sourcejob_test.go
  • x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Comment thread heartbeat/reload/list.go
Comment thread x-pack/heartbeat/monitors/browser/sourcejob.go
Comment thread x-pack/heartbeat/monitors/browser/synthexec/synthexec.go Outdated
Comment thread x-pack/heartbeat/cmd/root.go
Comment thread x-pack/heartbeat/cmd/root.go
Comment thread heartbeat/reload/list.go
Comment thread heartbeat/reload/list.go Outdated
Comment thread x-pack/heartbeat/monitors/browser/sourcejob.go Outdated
Comment thread heartbeat/reload/list.go Outdated
Comment thread heartbeat/reload/list.go Outdated
@emilioalvap emilioalvap enabled auto-merge (squash) March 17, 2026 17:17
@emilioalvap emilioalvap disabled auto-merge March 17, 2026 17:18
@emilioalvap emilioalvap removed the request for review from vigneshshanmugam March 17, 2026 17:18
@emilioalvap emilioalvap merged commit 9469ea9 into elastic:main Mar 17, 2026
38 of 39 checks passed
@emilioalvap emilioalvap added backport-skip Skip notification from the automated backport with mergify backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches and removed backport-skip Skip notification from the automated backport with mergify labels Mar 17, 2026
@github-actions

@Mergifyio backport 9.2 9.3

@emilioalvap emilioalvap added backport-skip Skip notification from the automated backport with mergify and removed backport-active-9 Automated backport with mergify to all the active 9.[0-9]+ branches labels Mar 17, 2026

mergify Bot commented Mar 17, 2026

backport 9.2 9.3

✅ Backports have been created

Details

mergify Bot pushed a commit that referenced this pull request Mar 17, 2026
* Add heartbeat reload and config hash logic

Introducing:

A config hashing mechanism that each Heartbeat plugin (monitor type) can optionally implement to selectively exclude fields from hash generation.
An alternative Update() route that Heartbeat plugins can optionally implement to selectively update the configuration of a running monitor.
Both mechanisms are required to dynamically update fields of running monitors, browser params in particular, without triggering a stop/start cycle.

(cherry picked from commit 9469ea9)
mergify Bot pushed a commit that referenced this pull request Mar 17, 2026
* Add heartbeat reload and config hash logic

Introducing:

A config hashing mechanism that each Heartbeat plugin (monitor type) can optionally implement to selectively exclude fields from hash generation.
An alternative Update() route that Heartbeat plugins can optionally implement to selectively update the configuration of a running monitor.
Both mechanisms are required to dynamically update fields of running monitors, browser params in particular, without triggering a stop/start cycle.

(cherry picked from commit 9469ea9)
Labels

backport-skip Skip notification from the automated backport with mergify enhancement Team:obs-ds-hosted-services Label for the Observability Hosted Services team

Development

Successfully merging this pull request may close these issues.

[Heartbeat] Implement smarter managed policy reload

3 participants