feat(backend): init horizontal clustering #1201

Open
lusergit wants to merge 5 commits into edgehog-device-manager:main from lusergit:push-nrvnswvyxzqv

lusergit (Collaborator) commented Jan 28, 2026

Horizontally clustering edgehog

Based on #1199

These changes aim to scale edgehog automatically when multiple nodes are able to see each other.

libcluster

libcluster is being added to allow different nodes to see each other. Depending on the deployment environment, different strategies can be selected to query services and discover other edgehog instances in the same cluster.
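
As a rough illustration of how libcluster is usually wired into an application, the sketch below starts `Cluster.Supervisor` with topologies read from the application environment. The module names under `ClusterSketch` are hypothetical and are not taken from this PR; the actual placement in edgehog's supervision tree may differ.

```elixir
defmodule ClusterSketch.Application do
  # Hypothetical application module, for illustration only.
  use Application

  @impl true
  def start(_type, _args) do
    # Topologies come from `config :libcluster, topologies: [...]`.
    # An empty list means no discovery strategy is started.
    topologies = Application.get_env(:libcluster, :topologies, [])

    children = [
      # libcluster's supervisor starts one strategy process per topology.
      {Cluster.Supervisor, [topologies, [name: ClusterSketch.ClusterSupervisor]]}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: ClusterSketch.Supervisor)
  end
end
```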

horde

Horde has been added so that registries and their processes can share the load across multiple replicas. This allows better management of active processes and handles netsplits automatically.
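
A minimal sketch of what a Horde-backed registry and supervisor pair can look like in a supervision tree, assuming `members: :auto` so that Horde follows the nodes libcluster connects. The `HordeSketch` names are hypothetical; the registries and supervisors actually moved by this PR are the ones named in the commits.

```elixir
defmodule HordeSketch.Supervisor do
  # Hypothetical supervisor, for illustration only.
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      # Cluster-wide registry with unique keys: a given key resolves to
      # at most one process across all connected nodes.
      {Horde.Registry, [name: HordeSketch.Registry, keys: :unique, members: :auto]},
      # Cluster-wide dynamic supervisor: children are distributed across
      # members and restarted elsewhere when a node goes away.
      {Horde.DynamicSupervisor,
       [name: HordeSketch.DynamicSupervisor, strategy: :one_for_one, members: :auto]}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```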

Checklist

  • I have read the CONTRIBUTING.md
  • I have added tests that prove my fix is effective or that my feature works
  • I have added or updated documentation (if appropriate)

Further Comments

Communication between the different edgehog services happens through the pubsub module (Edgehog.PubSub), which internally uses Phoenix.PubSub. This mechanism already shares messages between replicas, so messages pass freely from one replica to another.
For example, a campaign may be started on one node, so its process is active there, while a deployment is updated on another node. In this case the pubsub mechanism notifies all listening services, including those on other nodes, and the services should be able to work as usual.
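
The subscribe/broadcast pattern described above can be sketched with Phoenix.PubSub directly. This is a single-node script under stated assumptions: the `Sketch.PubSub` name, the `"deployments"` topic and the message shape are hypothetical and do not reflect the topics Edgehog.PubSub actually uses. In a connected cluster the same `broadcast/3` call also reaches subscribers on other nodes.

```elixir
# Standalone script; fetches phoenix_pubsub on first run.
Mix.install([{:phoenix_pubsub, "~> 2.1"}])

# Start a PubSub server under a throwaway supervisor.
{:ok, _sup} =
  Supervisor.start_link([{Phoenix.PubSub, name: Sketch.PubSub}], strategy: :one_for_one)

# A listening service (here, the current process) subscribes to a topic.
:ok = Phoenix.PubSub.subscribe(Sketch.PubSub, "deployments")

# Another service publishes an update; every subscriber of the topic,
# on any connected node, receives the message in its mailbox.
:ok = Phoenix.PubSub.broadcast(Sketch.PubSub, "deployments", {:deployment_updated, "deployment-1"})

received =
  receive do
    message -> message
  after
    1_000 -> :timeout
  end

IO.inspect(received, label: "received")
```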

Optional: testing

To test this feature, add a deploy section to the edgehog-backend service in the docker-compose.yaml file:

  edgehog-backend:
    image: edgehogdevicemanager/edgehog-backend:0.10.0
    build:
      context: backend
    # This section here
    deploy:
      replicas: 6
    # ...

lusergit force-pushed the push-nrvnswvyxzqv branch 3 times, most recently from 6b06d59 to 13cf0b7, January 29, 2026 08:52
coveralls commented Jan 29, 2026

Pull Request Test Coverage Report for Build 0f8a90b75d89dc74710f86829858fc0614fd290a-PR-1201

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 24 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.3%) to 81.35%

Files with Coverage Reduction                  New Missed Lines   %
lib/edgehog/application.ex                     1                  88.89%
lib/edgehog/devices/reconciler/reconciler.ex   7                  31.82%
lib/edgehog/config.ex                          16                 41.18%

Totals
Change from base Build 898ad56e5c2bac7413e1fdf031b5c9c6243cad6a: -0.3%
Covered Lines: 2735
Relevant Lines: 3362

💛 - Coveralls

lusergit force-pushed the push-nrvnswvyxzqv branch 12 times, most recently from 7c1c7c6 to bc88347, February 3, 2026 13:38
lusergit force-pushed the push-nrvnswvyxzqv branch 6 times, most recently from ab5cb83 to 5d89de1, February 12, 2026 10:09
lusergit force-pushed the push-nrvnswvyxzqv branch 6 times, most recently from 9a51c35 to ab965df, February 19, 2026 16:28
lusergit force-pushed the push-nrvnswvyxzqv branch 2 times, most recently from 0f57caf to 4948a33, February 23, 2026 11:56
lusergit force-pushed the push-nrvnswvyxzqv branch 8 times, most recently from 4f0c539 to a508e40, March 24, 2026 16:10
lusergit force-pushed the push-nrvnswvyxzqv branch 7 times, most recently from 59f2df0 to 0b23b4e, March 31, 2026 07:37
lusergit force-pushed the push-nrvnswvyxzqv branch 10 times, most recently from 5fdbe34 to c649401, April 8, 2026 10:30
Signed-off-by: Luca Zaninotto <luca.zaninotto@secomind.com>
Adds libcluster configs to scale edgehog horizontally.

This allows edgehog to manage the workload among different replicas for
- campaign execution
- notifications
- reconciliation tasks

New environment variables have been introduced to allow different edgehog nodes
to see each other based on the deployment strategy.

- `EDGEHOG_CLUSTERING_STRATEGY`: one of `none`, `docker-compose` or
  `kubernetes`. This chooses the strategy edgehog will use to look up other nodes
  in the cluster.

- `EDGEHOG_CLUSTERING_KUBERNETES_SELECTOR`: the endpoint label used to find other
  edgehog instances. This defaults to `app=edgehog`.

- `EDGEHOG_CLUSTERING_KUBERNETES_NAMESPACE`: the kubernetes namespace in which
  to find other edgehog instances. This defaults to `edgehog`.
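
One way such variables could be turned into libcluster topologies is sketched below. This is not the code from this commit. The module name, the `:edgehog` topology key, the node basename, the choice of `Cluster.Strategy.DNSPoll` for `docker-compose`, the `edgehog-backend` DNS query, the polling interval, and the fallback to `none` when the variable is unset are all assumptions made for illustration. Only the variable names and the two documented defaults come from the text above.

```elixir
defmodule ClusteringConfigSketch do
  # Hypothetical helper: maps the environment variables described above
  # to a libcluster topology keyword list.

  def topologies(env \\ System.get_env()) do
    # Assumption: an unset strategy behaves like "none".
    case Map.get(env, "EDGEHOG_CLUSTERING_STRATEGY", "none") do
      "none" ->
        []

      "docker-compose" ->
        # Assumption: replicas are discovered by polling the DNS name of
        # the compose service.
        [
          edgehog: [
            strategy: Cluster.Strategy.DNSPoll,
            config: [polling_interval: 5_000, query: "edgehog-backend", node_basename: "edgehog"]
          ]
        ]

      "kubernetes" ->
        [
          edgehog: [
            strategy: Cluster.Strategy.Kubernetes,
            config: [
              mode: :ip,
              kubernetes_node_basename: "edgehog",
              kubernetes_selector:
                Map.get(env, "EDGEHOG_CLUSTERING_KUBERNETES_SELECTOR", "app=edgehog"),
              kubernetes_namespace:
                Map.get(env, "EDGEHOG_CLUSTERING_KUBERNETES_NAMESPACE", "edgehog")
            ]
          ]
        ]

      other ->
        raise ArgumentError, "unknown clustering strategy: #{inspect(other)}"
    end
  end
end
```

The resulting list could then be passed to `config :libcluster, topologies: ...` in a runtime configuration file, or directly to `Cluster.Supervisor`.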

Signed-off-by: Luca Zaninotto <luca.zaninotto@secomind.com>
When deploying on kubernetes, it is possible to deploy multiple replicas of the
backend service. To do so, a couple of environment variables have been added to
instruct edgehog on how to find and connect to other nodes.

Signed-off-by: Luca Zaninotto <luca.zaninotto@secomind.com>
Signed-off-by: Luca Zaninotto <luca.zaninotto@secomind.com>
Moves relevant registries and supervisors in the application tree to allow
edgehog to scale horizontally. This is done only with registries and supervisors
where it makes sense to share the load:

- `Containers.Reconciler`, where the process spawns per-tenant and talks with
  the DB.
- `Tenant.Reconciler`, where tasks and processes are again spawned per-tenant.
- `Devices.Reconciler` is moved to a single process managed through Horde,
  so that there is only one per cluster.
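
The "single process per cluster" idea can be sketched with a common Horde pattern: register the process under a fixed key in a `Horde.Registry` and start it through a `Horde.DynamicSupervisor`. All `SingletonSketch` names are hypothetical, and the sketch assumes a Horde registry and supervisor with those names are already running. How `Devices.Reconciler` is actually wired in this commit may differ.

```elixir
defmodule SingletonSketch.Reconciler do
  # Hypothetical stand-in for a process that must run once per cluster.
  use GenServer

  def start_link(opts) do
    case GenServer.start_link(__MODULE__, opts, name: via()) do
      {:ok, pid} ->
        {:ok, pid}

      # Another node already runs the process under the same key:
      # tell the supervisor there is nothing to start here.
      {:error, {:already_started, _pid}} ->
        :ignore
    end
  end

  # The fixed registry key makes the name unique across the cluster.
  defp via, do: {:via, Horde.Registry, {SingletonSketch.Registry, __MODULE__}}

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule SingletonSketch do
  # Asks the cluster-wide supervisor to start the singleton; calling this
  # on every node still results in a single running process.
  def start_reconciler do
    Horde.DynamicSupervisor.start_child(
      SingletonSketch.DynamicSupervisor,
      {SingletonSketch.Reconciler, []}
    )
  end
end
```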

Signed-off-by: Luca Zaninotto <luca.zaninotto@secomind.com>