Skip to content

IcingaDB: add changes queue & RedisConnection enhancements#10619

Open
yhabteab wants to merge 18 commits intomasterfrom
efficient-config-and-state-update-queue
Open

IcingaDB: add changes queue & RedisConnection enhancements#10619
yhabteab wants to merge 18 commits intomasterfrom
efficient-config-and-state-update-queue

Conversation

@yhabteab
Copy link
Member

@yhabteab yhabteab commented Oct 31, 2025

This pull request introduces a new runtime changes queue to IcingaDB, along with several enhancements to the RedisConnection class. These changes aim to improve the memory footprint and number of duplicate (and thus superfluous) Redis queries. The problem of duplicate queries has been a long-standing issue in IcingaDB, and some hacky workarounds have been implemented in the past to mitigate it. This PR takes a more systematic approach as Julian described in #10186 to address the root cause. I will try to summarize the key changes below:

Changes Queue

A new changes queue has been introduced to IcingaDB, which allows for batching of all runtime updates for a given object in an efficient manner. The changes queue works as outlined below:

Before going into more detail, we should clarify what we mean by "changes". In this context, changes refer to any event
that requires a Redis write operation. This includes, but is not limited to:

  • Object creation and deletion
  • Object modification (e.g., updating attributes)
  • Relationship updates (e.g., adding or removing links between objects)
  • State related updates (e.g., state changes, acknowledgements, etc.)
  • Dependency config and state updates

This new queue does not cover any history related writes, those types of events follow a different path and are not
affected by this change. The focus here is solely on runtime object changes that affect the normal non-historical operation of IcingaDB. Consequently, history and heartbeat related writes use their own dedicated Redis connection and do not interfere with any of the changes described here.

Now, here is how the changes queue operates:

When an object is modified, instead of immediately writing the changes to Redis, the object pointer is pushed onto the queue with a corresponding flag indicating the type of change required. As long as the object remains in the queue, any subsequent Redis write requests concerning that object are merged into the existing queued dirty bits. This means that no matter how many times e.g., a OnStateChange is triggered for a given object, only a single write operation will be performed when it is finally popped from the queue. Do note that an object can have multiple dirty bits set, so if both its attributes and state are modified while in the queue, a state and config update will be sent when it is processed.

The consumer of the changes queue is a new background worker that pops objects from the queue and performs the necessary Redis write operations. This worker doesn't immediately process objects as they are enqueued; instead, it waits for a short period (currently set to 1000ms) to allow for more changes to accumulate and be merged. After this wait period, the worker serializes the queued objects according to their dirty bits and sends the appropriate Redis commands. Though, there's also another restriction in place: when the used RedisConnection reaches a certain number of pending commands (currently set to 512), the worker won't dequeue any more objects from the changes queue until the pending commands drop below that threshold. This ensures that we don't unnecessarily waste memory by serializing too many objects in advance, if the Redis server isn't able to keep up.

To accommodate this new changes queue, quite a number of existing code has been refactored, so that we no longer perform immediate writes to Redis. Additionally, the RedisConnection class has been enhanced to support this new workflow.

RedisConnection Enhancements

Several enhancements have been made to the RedisConnection class to better support the changes queue and improve overall efficiency:

  1. As a consequence of the changes queue, the Redis queries internal queue has significantly been simplified. As opposed to the previous version which used a complex data structure to correctly manage the query priorities, this version uses a std::deque for the write queue and a simple mechanism to insert high-priority items at the front. By default, items are processed in FIFO order, but if someone wants to immediately send a high-priority query it will be placed at the front of the queue (remember std::deque allows efficient insertion at both ends), and will overtake any normal priority items already queued. However, if there are already high-priority items in the queue, the new high-priority item will be inserted after them but still before any normal priority items, ensuring that all high-priority items are processed in the order they were enqueued.

  2. And while I'm at it, I also took the chance to improve the WriteQueueItem type by replacing the previously used ridiculously verbose query types by a more compact std::variant based approach. This not only reduces memory usage but also makes clearer that each item represents exactly one of a defined set of query types and nothing else.

Now, IcingaDB is subscribed to the OnNextCheckChanged signal and not the dummy OnNextCheckUpdated signal anymore. Though, that dummy signal is still there since the IDO relies on it. The only behavioural change in IcingaDB as opposed to before is that the oldest pending Redis query is determined only on the primary Redis connection (the one used for history and heartbeats). If you guys think this is a problem, I can look into a way to have IcingaDB consider all connections when determining the oldest pending query.

resolves #10186

@cla-bot cla-bot bot added the cla/signed label Oct 31, 2025
@yhabteab yhabteab marked this pull request as draft October 31, 2025 12:17
@yhabteab yhabteab added the area/icingadb New backend label Oct 31, 2025
@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch 3 times, most recently from 200080d to 2099e59 Compare October 31, 2025 14:47
@yhabteab yhabteab added this to the 2.16.0 milestone Oct 31, 2025
@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from 2099e59 to b15a5fb Compare November 3, 2025 08:17
@yhabteab yhabteab marked this pull request as ready for review November 3, 2025 11:20
Copy link
Contributor

@jschmidt-icinga jschmidt-icinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test extensively yet, but from a quick test everything seems to work as expected. I also can't speak much for the logic and performance implications of when to send which events to Redis since I barely touched that until now. I'll continue to look at this in the coming days and see if I can test this more thoroughly.

String m_CipherList;
double m_ConnectTimeout;
DebugInfo m_DebugInfo;
ObjectImpl<IcingaDB>::ConstPtr m_IcingaDB; // The IcingaDB object this connection belongs to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces some tight bidirectional coupling between the IcingaDB (even though it's just the -ti file here) and RedisConnection objects. It would be nice if this could be avoided, though copying all the members doesn't seem elegant either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think, that this is 100% better than the previous version of this, and I don't see a problem with the bidirectional coupling either, since the RedisConnection class is meant to be used by the IcingaDB class only. Obviously, this is not that perfect either but if you have a better approach in mind, then feel free to suggest it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had already thought about it and couldn't come up with anything good to suggest, other than designing it that way from the start. One option would be to maybe make the members string_views, which would be fine here, since the connection object has a longer lifetime than the icingadb object, but might become an implementation detail when/if other classes want to use this class. Or just suck up the (small) memory cost and leave things as they were.

It's not the worst thing in the world either way. We have this kind of coupling in many places, like (JsonRpc|HttpServer)Connection<->ApiListener, but it is kind a ugly, design-wise.

Regarding RedisConnection being only used by IcingaDB: Recently I was briefly looking at caching Perfdata in Redis for persistence in case the target services go offline. It's always nice to at least keep the option open of using a class like that for something else in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just suck up the (small) memory cost and leave things as they were.

It's not just about the memory cost but due to this ridiculously long parameter passing I would have to use. With master branch there are two places like this, and this PR add another one. So, decided simply squash them with approach.

m_Rcon = new RedisConnection(GetHost(), GetPort(), GetPath(), GetUsername(), GetPassword(), GetDbIndex(),
GetEnableTls(), GetInsecureNoverify(), GetCertPath(), GetKeyPath(), GetCaPath(), GetCrlPath(),
GetTlsProtocolmin(), GetCipherList(), GetConnectTimeout(), GetDebugInfo());

RedisConnection::Ptr con = new RedisConnection(GetHost(), GetPort(), GetPath(), GetUsername(), GetPassword(), GetDbIndex(),
GetEnableTls(), GetInsecureNoverify(), GetCertPath(), GetKeyPath(), GetCaPath(), GetCrlPath(),
GetTlsProtocolmin(), GetCipherList(), GetConnectTimeout(), GetDebugInfo(), m_Rcon);

Recently I was briefly looking at caching Perfdata in Redis for persistence in case the target services go offline. It's always nice to at least keep the option open of using a class like that for something else in the future.

If we ever end up using RedisConnection for other purposes, then we would definitely have to move it somewhere else. And while doing that, there will be other design decisions to make, I would consider this tiny bit of coupling acceptable for now that can further be improved later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please the updated code now. I've introduced a helper struct for all the parameters instead and will be copied only once.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from b15a5fb to c8274a0 Compare November 13, 2025 15:04
@yhabteab
Copy link
Member Author

I'll continue to look at this in the coming days and see if I can test this more thoroughly.

At the moment the integration tests from the Icinga DB repository are the only way to stress test this PR thoroughly. I've been running them ever since the initial implementation and I have almost gone crazy due to a subtle race condition that only showed up when running those tests.

@jschmidt-icinga
Copy link
Contributor

a subtle race condition that only showed up when running those tests.

Can you describe this in a bit more detail so I know what to look out for, i.e. the symptoms of the race condition?

@yhabteab
Copy link
Member Author

Can you describe this in a bit more detail so I know what to look out for, i.e. the symptoms of the race condition?

Well, the obvious symptom is that the integration tests (specifically the Redundancy Group ones) will sporadically fail because either Icinga 2 didn't sent a delete command when it's supposed to, or deleted something that it shouldn't have. Generally speaking, if the tests don't succeed then there is a bug in here.

Copy link
Member

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't consider this a full review, just what I noticed at first glance.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from c8274a0 to 3247cb4 Compare November 20, 2025 12:33
Comment on lines +85 to +90
// Wait up to 5 seconds for ongoing operations to finish.
asio::deadline_timer waiter(m_Strand.context(), boost::posix_time::seconds(5));
waiter.async_wait(yc);

m_QueuedWrites.Set(); // Wake up write loop
m_QueuedReads.Set(); // Wake up read loop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait up to 5 seconds? Doesn't this just always wait 5 seconds?

And why are the read/write loops only woken after that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait up to 5 seconds? Doesn't this just always wait 5 seconds?

Yes, it always waits 5 seconds.

And why are the read/write loops only woken after that?

Doesn't make any difference but I've moved these above the timer now.

Comment on lines +55 to +59
// Limits the number of pending queries the Rcon can have at any given time to reduce the memory overhead to
// the absolute minimum necessary, since the size of the pending queue items is much smaller than the size
// of the actual Redis queries. Thus, this will slow down the worker thread a bit from generating too many
// Redis queries when the Redis connection is saturated.
constexpr size_t maxPendingQueries = 512;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That number should be just big enough that there are enough queries in flight so that the throughput isn't limited by latency. For any other queries, it's more desirable to stay in the worker queue than the Redis query queue, because in the first, they can still be combined with other operations. My intuition is that the number is bigger than it needs to be.

Alternatively, could it be feasible to check the state of the Redis connection so that we write queries until the write operations would block. so that we don't have (many) more queries in flight than the send and receive queues can hold.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reduced the number to 128 now but I'll have to think more thoroughly about the alternative approaches you suggested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could it be feasible to check the state of the Redis connection so that we write queries until the write operations would block. so that we don't have (many) more queries in flight than the send and receive queues can hold.

Now, after thinking about it some more, there's simply no way to do that reliably without implementing a full-blown backpressure mechanism, which is not worth the effort here but also way out of scope for this PR. The easiest way to limit the number of in-flight queries is to further limit the number of pending queries if 128 is still too much for you. If that's not sufficient, we can think about implementing a fine-grained backpressure mechanism in a separate PR but I think simply limiting the number of pending queries is sufficient for now IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think even a dead-simple backpressure mechanism would magically solve all kinds of queue growth (OOM) problems. (Also applies to perfdata.) Just think about it! If the action causing a (e.g Redis) backend write is blocked (as with e.g RedisConnection::Sync) until sent (or, ideally, processed), it delays the next action, so that jam doesn't even occur in our software.

Ok... your hosts might be checked effectively every 5.5m instead of 5m – 🤷‍♂️.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think even a dead-simple backpressure mechanism would magically solve all kinds of queue growth (OOM) problems.

There's no OOM problems here. The Redis queue is currently limited to 128 queries only and is still TBD whether it's too much.

Just think about it! If the action causing a (e.g Redis) backend write is blocked (as with e.g RedisConnection::Sync) until sent (or, ideally, processed), it delays the next action, so that jam doesn't even occur in our software.

And when should the producer trigger the RedisConnection::Sync function? After each processed item? After n processed items? Doing it after each processed item would be overkill and completely unnecessary, so you'd have to figure out when it's appropriate to perform such a blocking call.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch 2 times, most recently from 1916e7a to ac841b3 Compare November 26, 2025 09:32
@yhabteab yhabteab requested a review from julianbrost November 26, 2025 09:35
@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from ac841b3 to a247654 Compare November 26, 2025 16:02
Comment on lines 2415 to 2433

if (checkable->GetEnableActiveChecks()) {
m_Rcon->FireAndForgetQuery(
if (checkable->GetEnableActiveChecks() && checkable->IsActive()) {
m_RconWorker->FireAndForgetQuery(
{
"ZADD",
dynamic_pointer_cast<Service>(checkable) ? "icinga:nextupdate:service" : "icinga:nextupdate:host",
Convert::ToString(checkable->GetNextUpdate()),
GetObjectIdentifier(checkable)
},
Prio::CheckResult
}
);
} else {
m_Rcon->FireAndForgetQuery(
} else if (!checkable->GetEnableActiveChecks() || checkable->GetExtension("ConfigObjectDeleted")) {
m_RconWorker->FireAndForgetQuery(
{
"ZREM",
dynamic_pointer_cast<Service>(checkable) ? "icinga:nextupdate:service" : "icinga:nextupdate:host",
GetObjectIdentifier(checkable)
},
Prio::CheckResult
}
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why you changed the conditions in here like this, in particular, why the else branch is no longer unconditional. Is there actually a situation where this function is supposed to do nothing and what is it? And why is it even run/enqueued then?

Copy link
Member Author

@yhabteab yhabteab Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, as long as the GetEnableActiveChecks returned true, it alwas sent a ZADD command even though the object might just have been deleted and resulted into having zombie icinga:nextupdate:* entries till the next Redis full dump. Now, I've changed it to additionally consider the object's liveness and causes to not send any ZADD commands if the object is already deactivated. Consequently, the else branch shouldn't also send any ZREM commands just because the object isn't active anymore, hence the additional checks.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from a247654 to 6695294 Compare January 21, 2026 12:31
@yhabteab
Copy link
Member Author

Rebased + cherry-picked commit dc9d40f + addressed some of the still open discusstions.

Base automatically changed from use-string-view-consistently to master March 16, 2026 12:00
@julianbrost
Copy link
Member

I've pushed an updated version, ad85daf. The changes (9413ba9..ad85daf) are as follows:

  • Updated doc comments.
  • A few changes to variable names to prevent shadowing.
  • Consistent declaration of return types of the individual GetQueueLookupKey() methods, also marked as [[nodiscard]] to make clang-tidy happy.
  • Changed the template magic so that the classes are closer to how they were before. In particular, the key extractor is now implemented by being a template class that's specialized on std::variant:
    template<typename T>
    struct KeyExtractor /* not implemented, only the specialization for std::variant is used */;
    /**
    * Helper to use @c PendingQueueItem in @c boost::multi_index_container.
    *
    * It implements a key extractor that @c multi_index_container will invoke for each element to create an index.
    * Every individual type in @c PendingQueueItem::ItemVariant implements a method @c GetQueueLookupKey() that will
    * be invoked by this key extractor. The return value from the individual type is then returned as part of a second
    * variant type (its element types are automatically deduces from the individual @c GetQueueLookupKey() methods).
    * @c multi_index_container will then just use the standard comparison operators on it for building the index.
    *
    * @ingroup icingadb
    */
    template<typename ...Ts>
    struct KeyExtractor<std::variant<Ts...>> {
    using result_type = std::variant<decltype(std::declval<Ts>().GetQueueLookupKey())...>;
    result_type operator()(const PendingQueueItem& queueItem) const
    {
    return std::visit([](const auto& innerItem) -> result_type {
    return innerItem.GetQueueLookupKey();
    }, queueItem.Item);
    };
    };

Please have a look and feel free to incorporate it into the PR if you're happy with it.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from 51273c2 to adfbce6 Compare March 17, 2026 08:19
Comment on lines +430 to 441
/**
* The primary Redis connection used to send history and heartbeat queries.
*
* This connection is used exclusively for sending history and heartbeat queries to Redis. It ensures that
* history and heartbeat operations do not interfere with other Redis operations. Also, it is the leader for
* all other Redis connections including @c m_RconWorker, and is the only source of truth for all IcingaDB Redis
* related connection statistics.
*
* Note: This will still be shared with the icingadb check command, as that command also sends
* only XREAD queries which are similar in nature to history/heartbeat queries.
*/
RedisConnection::Ptr m_Rcon;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't match the current implementation as heartbeats are sent to m_RconWorker:

RedisConnection::Query query {"XADD", "icinga:stats", "MAXLEN", "1", "*"};
{
ObjectLock statusLock (status);
for (auto& kv : status) {
query.emplace_back(kv.first);
query.emplace_back(JsonEncode(kv.second));
}
}
m_RconWorker->FireAndForgetQuery(std::move(query), {}, true /* high priority */);

(Yes, it says icinga:stats and not explicitly heartbeat, but that's what Icinga DB considers to be the heartbeat here.)

Though I'd probably switch that write over to m_Rcon, i.e. change the code, not the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an oversight, otherwise I'd have also changed the check on m_Rcon at the start of the function. Done.

Comment on lines +446 to +452
/**
* A Redis connection for general queries.
*
* This connection is used for all non-history and non-heartbeat related queries to Redis.
* It is a child of @c m_Rcon, meaning it forwards all its connection stats to @c m_Rcon as well.
*/
RedisConnection::Ptr m_RconWorker;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't "config and state updates" better describe what that connection is used for than just "general queries"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +97 to +102
if (std::holds_alternative<queue::RelationsDeletionItem>(it->Item)) {
// We don't know whether the previous items are related to this deletion item or not,
// thus we can't just process this right now when there are older items in the queue.
// Otherwise, we might delete something that is going to be updated/created.
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that we talked about this in person before and you also reminded me of it recently, but neither did I find a mention in the PR, nor does it look like this has already been addressed:

Shouldn't there be something similar for config objects that are deleted? Those shouldn't be processed out of order for similar reasons. Deleting and recreating an object with the same name will give different pointers but the Icinga DB object ID stays the same, so an old delete could possibly delete a new object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but neither did I find a mention in the PR, nor does it look like this has already been addressed

I didn't address it yet because you didn't give me a clear answer when I explicitly asked you whether I should add a similar check for that case as well. Instead, you sounded like you aren't even satisfied with the existing check at all, so I did nothing.

Btw, now that there's also another case where we acquire an olock I'm not sure whether it makes sense to skip any items at all. So, I've changed the implementation to never process items out of order, but instead to always wait for some time if we can't acquire the olock immediately. See 006e380 for the details.

Comment on lines +105 to +110
if (auto age = now - it->EnqueueTime; 1000ms > age) {
if (it == seqView.begin()) {
retryAfter = 1000ms - age;
}
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a second is quite a long time on a computer, so I'd consider making this interval shorter. After all, this ends up being a delay showing up everywhere. My intuition says that something in the low hundreds of milliseconds should already be plenty, like 200ms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced it to 300ms.

@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from adfbce6 to 0ba83f5 Compare March 23, 2026 09:34
@yhabteab
Copy link
Member Author

Rebased and addressed all the requested changes.

@yhabteab yhabteab requested review from jschmidt-icinga and julianbrost and removed request for jschmidt-icinga March 23, 2026 10:05
yhabteab and others added 18 commits March 23, 2026 11:26
…al sync

Don't send nextupdate as part of the intial sync. Instead, enque them to
the background worker to be sent after the initial dump is done.
This reverts commit f6f7d9b and all
other its new users.
As opposed to the previous version which used a complex data structure
to correctly manage the query priorities, this version uses two separate
queues for the high and normal priority writes. All high priority writes
are processed in FIFO order but over take all queries from the normal
priority queue. The later queue only be processed when the high priority
queue is empty.
We can't drop the `OnNextCheckUpdated` signal entirely yet, as IDO still
relies on it.
Previously, the checkable was locked while processing all the dependency
registration stuff, so the worker thread should also do the same to
avoid any potential race conditions.
This commit restructures the queue items so that each one now has a method
`GetQueueLookupKey()` that is used to derive which elements of the queue are
considered to be equal. For this, there is a key extractor for the
`multi_index_container` that takes the `variant` from the queue item, calls
that method on it, and puts the result in a second variant type. The types in
that variant type are automatically deduced from the return types of the
individual methods.
Now, if we can't acquire the olock on the config object, we immediately
sleep for 10ms instead of trying to process the next item in the queue.
This is way easier than having to deal with the potential out of order
processing of items in the queue in both ways, i.e., we don't want to
send delete events for objects while their created events haven't been
processed yet and vice versa.
@yhabteab yhabteab force-pushed the efficient-config-and-state-update-queue branch from 0ba83f5 to 3d7e0c4 Compare March 23, 2026 10:27
@yhabteab
Copy link
Member Author

Just fixed a minor style issue and a doc comment. See the diff by clicking ont he compare button.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Proposal] Icinga DB: Better config and state update queueing

4 participants