Add PerfdatawriterConnection to handle network requests for Perfdata Writers by jschmidt-icinga · Pull Request #10668 · Icinga/icinga2

jschmidt-icinga · 2025-12-10T10:00:35Z

Description

This unifies the connection handling for all perfdata writers into a single class PerfdataWriterConnection that provides a blocking interface (using promises) to the underlying asynchronous operations.
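To illustrate what "a blocking interface (using promises)" over asynchronous operations can look like, here is a minimal sketch using only the standard library. It is not the code from this PR: `AsyncSend` and `BlockingSend` are hypothetical names, and a plain thread stands in for the asynchronous network operation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for an asynchronous network operation: the work runs
// on another thread and reports the number of bytes "sent" via a callback.
// The thread handle is returned only so that this sketch can join it.
std::thread AsyncSend(std::string data, std::function<void(std::size_t)> onDone)
{
	return std::thread([data = std::move(data), onDone = std::move(onDone)]() {
		onDone(data.size());
	});
}

// Blocking wrapper: the completion callback fulfills a promise, and the
// caller waits on the matching future until the operation has finished.
std::size_t BlockingSend(const std::string& data)
{
	std::promise<std::size_t> promise;
	std::future<std::size_t> done = promise.get_future();

	std::thread worker = AsyncSend(data, [&promise](std::size_t bytes) {
		promise.set_value(bytes);
	});

	std::size_t bytes = done.get(); // Blocks until set_value() was called.
	worker.join();

	return bytes;
}
```

Calling `BlockingSend("hello")` returns only after the callback has run, so the caller sees an ordinary synchronous function even though the work happened elsewhere.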

All in all this is a huge code reduction and deduplication (as long as you don't count the added unit-tests) and should fix the issues with the work-queues being stuck on shutdown.

Fixes #10159, possibly fixes #10629

Connection handling

Connections are now established lazily whenever a message is being sent to the server. Some writers already worked that way, while others connected at the start and kept their connections around for as long as they needed them or until the server disconnected.
HTTP-based writers will disconnect after sending a message and receiving the response unless the keep-alive flag is set by the server. Currently we do not request keep-alive on our side, but that could easily be done on the side of the writers if we want to.
The writers can use a (slightly awkwardly named) CancelWithTimeout() function that waits on a future for a specified timeout. When either that timeout expires or the future is ready, the connection is stopped by canceling outstanding operations and disconnecting the stream/socket.
All system errors are handled by the connection class internally and lead to a retry after an exponentially increasing timeout, similar to the backoff strategy implemented in #10685 (Add OTLPMetricsWriter). The writers obviously still need to handle the HTTP status codes from the response, which the connection class doesn't touch in any way.
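The wait-then-stop behavior described for CancelWithTimeout() can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `FakeConnection`, `Stop()` and `StopAfterFutureOrTimeout()` are hypothetical names, and the real class cancels outstanding operations and disconnects the stream/socket instead of setting a flag.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical stand-in for a connection. Stopping only sets a flag here.
struct FakeConnection
{
	std::atomic<bool> stopped{false};

	void Stop() { stopped.store(true); }
};

// Waits until `future` is ready or `timeout` has elapsed, whichever comes
// first, and then stops the connection in either case.
// Returns true if the future became ready within the timeout.
bool StopAfterFutureOrTimeout(FakeConnection& conn, const std::shared_future<void>& future,
	std::chrono::milliseconds timeout)
{
	bool ready = future.wait_for(timeout) == std::future_status::ready;
	conn.Stop();

	return ready;
}
```

A writer could, for example, pass a future that becomes ready once its work queue has drained, so that a stuck connection delays shutdown by at most the configured timeout.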

Rationale

A simpler solution to the disconnect problem would have been possible. Because a cancelled send or handshake doesn't allow for a graceful shutdown of the TLS connection anyway, especially when the server is unresponsive, a simple close on the stream's socket would be enough to cancel all outstanding operations. However, many writers only keep temporary stream objects in the functions where the messages are sent and currently don't track the state of the connection, so this would also need some serious refactoring, and a different one for each writer.

Instead of doing the same thing over and over for each writer, I chose to reduce code duplication and abstract the connection handling out of the individual writers and only fix it in one place. Using async operations and an asio strand was convenient, because now every yield leaves the connection object in a defined state, without needing any atomic variables or mutexes, which makes the disconnect handling much simpler.

Other changes

In addition to the changes to connection handling some other minor refactoring has been done:

OpenTsdbWriter now also uses a work queue like all the other writers. Previously this writer would send its data directly in its CheckResultHandler which meant that if a server was slow or unresponsive it could have blocked check-result processing and slowed down the whole process/cluster.
ElasticsearchWriter locked a mutex on each Flush() so it could be called from both outside and inside the work queue. This was changed to always queuing the Flush() onto the work queue instead. This makes the behavior more similar to what InfluxDbCommonWriter does.
The flush timers of both ElasticsearchWriter and InfluxDbCommonWriter have been improved by setting an atomic boolean the first time the flush is queued and then skipping further queue entries until the flush has been processed.
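The atomic-boolean guard for the flush timer can be sketched like this. It is a simplified, single-threaded illustration and not the writers' actual code: `FlushScheduler` and its members are hypothetical, and a plain `std::queue` stands in for the work queue.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a writer with a work queue and a flush timer.
class FlushScheduler
{
public:
	// Called by the flush timer. Enqueues a flush only if none is pending.
	// Returns true if a new queue entry was added.
	bool RequestFlush()
	{
		// exchange() returns the previous value, so only the first caller
		// after a processed flush sees `false` and enqueues a new entry.
		if (m_FlushPending.exchange(true)) {
			return false;
		}

		m_Queue.push([this]() {
			++m_FlushCount; // The actual flush would happen here.
			m_FlushPending.store(false);
		});

		return true;
	}

	// Runs all queued tasks, as the work queue's thread would.
	void RunPending()
	{
		while (!m_Queue.empty()) {
			auto task = std::move(m_Queue.front());
			m_Queue.pop();
			task();
		}
	}

	int GetFlushCount() const { return m_FlushCount; }

private:
	std::atomic<bool> m_FlushPending{false};
	std::queue<std::function<void()>> m_Queue;
	int m_FlushCount = 0;
};
```

If the timer fires several times while the queue is busy, only one flush entry is queued, so a slow backend no longer causes redundant flush tasks to pile up.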

The two/three HTTP based writers could benefit from further refactoring to use the new HTTP message classes, but I didn't do this here, because it isn't necessary to solve the problem at hand and ElasticsearchWriter is going to be deprecated anyway (see #10734). A refactor for the InfluxDB writers can be done at the point where we have a good reason to do it.

Testing

Aside from the unit tests, we've manually tested with all the current backends:

@yhabteab tested Graphite and InfluxDB here.
I've tested Elasticsearch, OpenTSDB and Graylog (GelfWriter).

Status

Done and ready to be merged.

jschmidt-icinga · 2026-03-04T07:16:52Z

Unfortunately, even though everything was ready and tested, I noticed a problem with the previous iteration: a Send() operation would be retried infinitely, without any wait in between, if the connection was established successfully but the send failed immediately. This could happen, for example, when the client was configured to use bare TCP and the server expected a TLS connection. This made another medium-sized update necessary.

Moved the wait between retry attempts from the connection step to the send step, so when a send fails it now also waits an increasing amount of time before retrying, up to 32s. This doesn't magically solve the misconfiguration, but it prevents the log from being spammed with errors and avoids the possible increase in CPU load that this would cause.
Added a test case written by @yhabteab to ensure the HTTP messages are not mutated during repeated Send()s. This shouldn't normally happen, especially since Send() now takes them by const reference, but older Boost versions don't have const overloads for their async_write() functions, so it doesn't hurt to make sure this holds across all supported platforms.
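As a rough illustration of an increasing retry wait with a 32s ceiling, the delay could be computed as below. Only the 32s cap comes from the comment above. The 1s starting value, the doubling factor and the name `RetryDelay` are assumptions made for this sketch and may differ from the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Returns the wait before retry number `attempt` (0-based).
// Assumptions: the delay starts at 1s and doubles with each attempt.
// The 32s upper bound is the one mentioned in the comment.
std::chrono::seconds RetryDelay(unsigned attempt)
{
	constexpr std::chrono::seconds initial{1};
	constexpr std::chrono::seconds cap{32};

	// Clamp the exponent so the shift cannot overflow for large attempts.
	unsigned exponent = std::min(attempt, 5u);

	return std::min(initial * (1 << exponent), cap);
}
```

With these assumed values, a send that keeps failing waits 1s, 2s, 4s, 8s, 16s and then 32s between all further attempts, which bounds both the log volume and the CPU load.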

I've not retested every writer with these changes, just a quick test with OpenTSDB, because no writer code was changed since the previous iteration, only the connection code, which should be covered by the unit-tests.

Edit: And it immediately fails on amazonlinux:2. Shocker...

yhabteab

Looks good to me now! Unless @julianbrost wants to have a look at it, feel free to merge it once the GHAs pass.

julianbrost

I selected "request changes" mostly so that this isn't accidentally merged, given the already existing approval, until the discussion on OpenSSL context initialization errors is resolved.

The things I requested were addressed, so that review is obsolete. I probably won't do a full review of this PR myself though.

There's a set of two tests for each perfdatawriter, just to make sure they can connect and send data that looks reasonably correct, and to make sure pausing actually works while the connection is stuck. Then there's a more in-depth suite of tests for PerfdataWriterConnection itself, to verify that connection handling works well in all types of scenarios.

Co-authored-by: Yonas Habteab <yonas.habteab@icinga.com>

cla-bot bot added the cla/signed label Dec 10, 2025

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch 9 times, most recently from cb64ef1 to 21c2575 on December 12, 2025 07:45

jschmidt-icinga mentioned this pull request Dec 15, 2025

Create a new ElasticsearchDatastreamWriter to more efficiently store data in elasticsearch. #10577

Merged

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from 21c2575 to 29f91c9 on December 15, 2025 11:08

jschmidt-icinga mentioned this pull request Dec 15, 2025

(WIP) Add ElasticsearchDatastreamWriter (Follow-up to #10577) #10677

Closed

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch 2 times, most recently from d018cae to a66f9ed on December 17, 2025 14:15

This was referenced Jan 22, 2026

Add OTLPMetricsWriter #10685

Open

Refactor HttpMessage into generalized templated types #10692

Merged

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch 4 times, most recently from 6391859 to 8ddd29d on January 28, 2026 11:35

jschmidt-icinga added this to the 2.16.0 milestone Jan 29, 2026

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch 8 times, most recently from 40c5481 to 4acf0b3 on February 4, 2026 07:50

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch 4 times, most recently from 12ff9d1 to bf462e5 on March 2, 2026 13:10

julianbrost reviewed Mar 2, 2026

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from bf462e5 to 9afa40a on March 4, 2026 07:06

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from 9afa40a to 17c1f2e on March 4, 2026 07:31

yhabteab reviewed Mar 4, 2026

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from 17c1f2e to da2a396 on March 4, 2026 09:18

yhabteab previously approved these changes Mar 4, 2026

yhabteab added the area/graphite, area/opentsdb and area/influxdb labels Mar 4, 2026

julianbrost previously requested changes Mar 5, 2026

Add PerfdataWriterConnection class

2e2576c

jschmidt-icinga dismissed yhabteab’s stale review via f872cd0 March 6, 2026 10:06

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from da2a396 to f872cd0 on March 6, 2026 10:06

julianbrost reviewed Mar 6, 2026

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from f872cd0 to b6aecaf on March 9, 2026 07:43

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from b6aecaf to a1f61aa on March 17, 2026 10:53

julianbrost reviewed Mar 17, 2026

jschmidt-icinga and others added 3 commits March 17, 2026 12:11

Use PerfdataWriterConnection in perfdata writers

da2fc9d

Add documentation for the new disconnect_timeout config options

2ac0b8a

jschmidt-icinga force-pushed the perfdata-writers-connection-handling branch from a1f61aa to 75b2ec6 on March 17, 2026 11:11

yhabteab approved these changes Mar 17, 2026

julianbrost enabled auto-merge March 17, 2026 12:34

julianbrost merged commit 6592eae into master Mar 17, 2026
28 checks passed

julianbrost merged 9 commits into master from perfdata-writers-connection-handling
