ns-ha: possible race condition if switches happen close to one another  #1547

@Tbaile

Description

Steps to reproduce

Switch the HA role back so that the primary firewall (master) resumes operation after a failover, then check the state of the dedalo and dpi services with /etc/init.d/dedalo status and /etc/init.d/dpi status.

Expected behavior

After takeover by the primary/master node, dedalo and dpi services should be enabled and remain active as intended, without requiring manual intervention.

Actual behavior

Both dedalo and dpi services remain inactive. A manual restart resolves the issue only temporarily. The ns-ha logs show that, at failover, the restart and stop events overlap: keepalived runs backup_and_stop and then master_and_restart, and the two run so close together that the services end up stopped instead of restarted. The ns-ha log lines confirm this: they show a restart immediately followed by a stop for the same service.

The issue only occurs when the primary takes back over as master. On the slave node, after a switch, services remain up as expected.
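
The ordering problem described above can be modelled in a few lines. The following is only an illustrative Python sketch, not ns-ha code: FakeService, simulate_failback and the single lock are hypothetical names and assumptions introduced here. It shows how a backup_and_stop handler that is still busy when master_and_restart fires can produce the logged "restart, then stop" order, and how running the handlers one at a time avoids it.

```python
import threading
from contextlib import nullcontext


class FakeService:
    """Hypothetical stand-in for an init script such as /etc/init.d/dedalo."""

    def __init__(self):
        self.running = True
        self.log = []
        self._mutex = threading.Lock()  # protects state and log only

    def stop(self):
        with self._mutex:
            self.running = False
            self.log.append("stop")

    def restart(self):
        with self._mutex:
            self.running = True
            self.log.append("restart")


def simulate_failback(serialize):
    """Fire backup_and_stop, then master_and_restart while the first is still busy.

    With serialize=True both handlers share one lock, so the second handler
    cannot act until the first has finished (an assumed mitigation, not the
    actual ns-ha fix).
    """
    service = FakeService()
    guard = threading.Lock() if serialize else nullcontext()
    backup_started = threading.Event()
    master_done = threading.Event()

    def backup_and_stop():
        with guard:
            backup_started.set()
            # Models slow work inside the handler: it is still running when
            # the next state change arrives.
            master_done.wait(timeout=0.2)
            service.stop()

    def master_and_restart():
        with guard:
            service.restart()
            master_done.set()

    backup = threading.Thread(target=backup_and_stop)
    master = threading.Thread(target=master_and_restart)
    backup.start()
    backup_started.wait()  # the second event arrives after the first began
    master.start()
    backup.join()
    master.join()
    return service
```

Calling simulate_failback(serialize=False) leaves the service stopped with the log order restart, stop, which matches the reported symptom. With serialize=True the stop completes first and the restart wins. On a real system the same idea would more likely be a lock file around the keepalived notify handlers, but that is an assumption about a possible fix.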

Workaround: Manually restart the affected services or schedule a cron job to bring them up if they go down.
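
A scheduled check along the lines of this workaround could look like the sketch below. It is a minimal example under stated assumptions: the function names are hypothetical, it assumes the init script's status action exits with a non-zero code when the service is not running, and it does not check whether the node currently holds the master role.

```python
import subprocess

SERVICES = ("dedalo", "dpi")  # the two services named in this report


def run_init(service, action):
    """Run /etc/init.d/<service> <action> and return its exit code."""
    result = subprocess.run([f"/etc/init.d/{service}", action], check=False)
    return result.returncode


def ensure_running(services=SERVICES, run=run_init):
    """Restart each service whose status check fails; return the names restarted."""
    restarted = []
    for service in services:
        # Assumption: a non-zero exit code from "status" means "not running".
        if run(service, "status") != 0:
            run(service, "restart")
            restarted.append(service)
    return restarted
```

A cron entry would then call ensure_running() at a fixed interval. The run parameter exists so the logic can be exercised without touching real init scripts.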

Components

ns-ha - 0.0.3-r1
