Describe the bug
OpenSearch executes 99% of monitors on only 2 nodes: the nodes that hold the primary and replica shards of the .opendistro-alerting-config index.
On 2.18 this behaviour doesn't seem to cause many issues, but on 3.0+ it leads to regular crashes of the 2 nodes holding those shards, which often cascade into larger cluster issues.
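To confirm which nodes hold the index's shards, the JSON output of the cat shards API can be filtered. The following is a minimal sketch, assuming the rows come from `GET _cat/shards/.opendistro-alerting-config?format=json`; the helper name `nodes_holding_index` is hypothetical and fetching the rows is left to whatever HTTP client is in use.

```python
def nodes_holding_index(cat_shards_rows, index_name=".opendistro-alerting-config"):
    """Return the sorted node names that hold a started shard of index_name.

    cat_shards_rows is assumed to be the parsed JSON list returned by
    GET _cat/shards/<index>?format=json (default columns include
    "index", "shard", "prirep", "state" and "node").
    """
    return sorted(
        {
            row["node"]
            for row in cat_shards_rows
            if row.get("index") == index_name
            and row.get("state") == "STARTED"
            and row.get("node")  # unassigned shards have no node
        }
    )
```

Comparing the returned node names with the nodes that log monitor executions shows whether the two sets coincide.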
Related component
Search:Performance
To Reproduce
- Deploy OpenSearch cluster with significant number of nodes (16+)
- Run a high number of regularly scheduled monitors (100+, ideally 1k+) with ~10 min frequency
- Parse OpenSearch logs and observe which nodes are running these monitors (can query for "Executing scheduled monitor")
- Observe that over 100k+ monitor runs, 99% of them will only execute on 2 nodes in the cluster, the 2 nodes that have the .opendistro-alerting-config primary and replica shards
This activity has been observed on 2.18, 3.1, 3.3
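The log-parsing step can be sketched as below. This is only an illustration: it assumes the default OpenSearch log layout of `[timestamp][LEVEL][logger] [node_name] message`, and the function names `count_runs_per_node` and `share_of_top_nodes` are hypothetical. If the cluster uses a custom log pattern, the regular expression needs adjusting.

```python
import re
from collections import Counter

# Assumption: default log layout "[timestamp][LEVEL][logger] [node_name] message".
LINE_RE = re.compile(r"^\[[^\]]*\]\[[^\]]*\]\[[^\]]*\]\s*\[([^\]]+)\]")
MARKER = "Executing scheduled monitor"


def count_runs_per_node(lines):
    """Count log lines containing MARKER, grouped by the node name in the line."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share_of_top_nodes(counts, top_n=2):
    """Fraction of all counted runs that executed on the top_n busiest nodes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total
```

Feeding the combined log lines from all nodes into `count_runs_per_node` and then calling `share_of_top_nodes(counts, 2)` gives the fraction of runs handled by the two busiest nodes, which is the figure reported above.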
Expected behavior
Monitors execute on all nodes in the cluster OR it is easy to increase the shard count on .opendistro-alerting-config
Additional Details
Plugins
Standard RPM install of OpenSearch exhibits this issue
Screenshots
Cannot provide
Host/Environment (please complete the following information):
- OS: RHEL
- Version: 2.18, 3.1, 3.3, untested on others
Additional context
N/A