ns-ha: possible race condition if switches happen close to one another 

**Steps to reproduce**

Trigger an HA role switch so that the primary firewall (master) takes over again after a failover. Then check the state of the dedalo and dpi services with `/etc/init.d/dedalo status` and `/etc/init.d/dpi status`.

**Expected behavior**

After the primary/master node takes over, the dedalo and dpi services should be enabled and stay active without manual intervention.

**Actual behavior**

Both the dedalo and dpi services remain inactive. A manual restart resolves the issue temporarily. The ns-ha logs show that the restart and stop events overlap during the failover: keepalived runs backup_and_stop and then master_and_restart, and the two commands run so close together that the services end up stopped instead of restarted. The ns-ha log confirms this: for the same service, a restart line is immediately followed by a stop line.

The issue only occurs when the primary takes back over as master. On the slave node, after a switch, services remain up as expected.
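A simplified model of the race may help when designing a fix. This sketch is hypothetical: `HookRunner`, its methods and the timings are illustrative and do not come from ns-ha or keepalived. It assumes each transition hook runs asynchronously, so a stop requested first can finish after a restart requested later, which matches the "restart followed by stop" log lines. Tagging each request with a generation number and applying only the most recent one makes the final state follow the last role change.

```python
import threading
import time


class HookRunner:
    """Hypothetical model of HA transition hooks; not ns-ha's actual code."""

    def __init__(self, guard: bool = True):
        self.guard = guard          # enable the stale-request check
        self.running = False        # simulated service state
        self._generation = 0        # bumped on every new request
        self._lock = threading.Lock()

    def request(self, action: str, delay: float = 0.0) -> threading.Thread:
        """Schedule 'stop' or 'restart'; delay simulates a slow hook script."""
        with self._lock:
            self._generation += 1
            gen = self._generation
        worker = threading.Thread(target=self._apply, args=(action, gen, delay))
        worker.start()
        return worker

    def _apply(self, action: str, gen: int, delay: float) -> None:
        time.sleep(delay)
        with self._lock:
            # With the guard on, a request superseded by a newer one is dropped.
            if self.guard and gen != self._generation:
                return
            self.running = action == "restart"


def simulate(guard: bool) -> bool:
    runner = HookRunner(guard=guard)
    # backup_and_stop is requested first but finishes last...
    stop = runner.request("stop", delay=0.2)
    # ...while master_and_restart, requested right after, finishes first.
    restart = runner.request("restart", delay=0.0)
    stop.join()
    restart.join()
    return runner.running


print("without guard, running =", simulate(guard=False))
print("with guard, running =", simulate(guard=True))
```

Without the guard the late stop wins and the service ends up down; with it, the stale stop is skipped. In the real hook scripts the same idea would presumably mean serializing the transitions (for example with `flock`, if available on the build) and checking the latest requested role before acting.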

Workaround: manually restart the affected services, or schedule a cron job that starts them again if they go down.
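The cron-based workaround could look roughly like this minimal sketch. It is written in Python for readability. On the firewall an equivalent shell script is probably a better fit, since Python may not be installed. It assumes that `/etc/init.d/<service> status` exits non-zero when the service is down; verify this on the installed build. `ensure_running`, `run_init` and the `is_master` flag are hypothetical names. The master check matters because the backup node stops these services on purpose, and how to detect the current role is deployment-specific.

```python
import subprocess
from typing import Callable, List, Sequence

# Services named in this report.
SERVICES = ("dedalo", "dpi")


def run_init(service: str, action: str) -> int:
    """Run /etc/init.d/<service> <action> and return its exit code."""
    result = subprocess.run(
        [f"/etc/init.d/{service}", action],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode


def ensure_running(
    is_master: bool,
    services: Sequence[str] = SERVICES,
    runner: Callable[[str, str], int] = run_init,
) -> List[str]:
    """Start each service whose status check fails; return the ones started."""
    if not is_master:
        # The backup node stops these services intentionally; leave them alone.
        return []
    started = []
    for svc in services:
        # Assumption: "status" exits non-zero when the service is not running.
        if runner(svc, "status") != 0:
            runner(svc, "start")
            started.append(svc)
    return started
```

The `runner` parameter keeps the init-script calls swappable, so the logic can be checked without touching real services.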

**Components**

ns-ha - 0.0.3-r1

**See also**

- Helpdesk ticket: https://helpdesk.nethesis.it/a/tickets/199001
- Internal discussion: https://mattermost.nethesis.it/nethesis/pl/gdi9nek5jpfybgw9h7gjbxyxwh

ns-ha: possible race condition if switches happen close to one another #1547