Clusteralerts and podalerts don't get recreated in icinga when searchlight-operator restarts

I've only tested this with the following (on v7.0.0):

- NodeAlert: `node-status` and `node-volume` checks.
- ClusterAlert: `component-status`, `event`, and a webhook check.
- PodAlert: `pod-status` check.

Whenever the searchlight-operator pod restarts, NodeAlert hosts and services are automatically registered back in icinga, but ClusterAlert and PodAlert objects are not. I had to run `kubectl delete` and then `kubectl apply` again to force their registration.
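
For reference, the workaround looks roughly like the sketch below. `alerts.yaml` is a placeholder for whichever manifest defines the ClusterAlert and PodAlert objects, and the function name `reregister_alerts` is only for illustration:

```shell
# Workaround sketch: delete and re-create the alert objects after an operator restart.
# Assumption: the ClusterAlert/PodAlert objects live in a single manifest file.
reregister_alerts() {
  manifest="$1"
  # Only re-apply if the delete succeeded.
  kubectl delete -f "$manifest" && kubectl apply -f "$manifest"
}

# Usage (hypothetical file name):
#   reregister_alerts alerts.yaml
```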

Looking at the logs after a restart, I noticed that only `plugin.go` and `nodes.go` keep retrying to connect, like so:
```
icinga/searchlight-operator-84878b6df-r95z7[operator]: I0719 18:04:58.386974       1 plugin.go:60] Sync/Add/Update for SearchlightPlugin cert
icinga/searchlight-operator-84878b6df-r95z7[operator]: E0719 18:04:58.452500       1 worker.go:76] Failed to process key cert. Reason: command terminated with exit code 1
icinga/searchlight-operator-84878b6df-r95z7[operator]: E0719 18:04:58.742345       1 nodes.go:64] failed to reconcile alert for node k8sworker1c. reason: [: Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused, : Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused, : Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused]
icinga/searchlight-operator-84878b6df-r95z7[operator]: E0719 18:04:58.742381       1 worker.go:76] Failed to process key k8sworker1c. Reason: [: Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused, : Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused, : Put https://127.0.0.1:5665/v1/objects/hosts/icinga@node@k8sworker1c: dial tcp 127.0.0.1:5665: getsockopt: connection refused]
```

So I tried removing `return` from this line:
https://github.com/appscode/searchlight/blob/a90f4bc264099230a02ca329df1c2a30a9e28d51/pkg/operator/pods.go#L28

And also this `if` condition:
https://github.com/appscode/searchlight/blob/a90f4bc264099230a02ca329df1c2a30a9e28d51/pkg/operator/cluster_alerts.go#L21

With both changes, all my ClusterAlerts and PodAlerts are registered again automatically after a restart.

But I'm not sure what the negative impact of skipping those checks would be.
