Skip to content

Conversation

@afrueda97
Copy link
Contributor

@afrueda97 afrueda97 commented Oct 3, 2025

This provider currently performs several full loops over the Bazel profiles. This results in it not scaling well performance and memory consumption-wise.

Also at the profile level, some expensive copying and re-sorting of bazel profile data is performed each time it is requested by the providers.

Allocations:
image

CPU:
image

This PR aims to reduce the amount of loops over the data in CriticalPathQueuingDurationDataProvider to just one, enabling it to scale better.
Also, cleans the expensive copying and re-sorting of bazel profile data each time it is requested, by switching thread building to a Builder pattern, which sorts the entries internally as needed before returning the immutable sorted result.

Thanks to @iamricard for doing a lot of the initial work and investigation on this (the branch was opened in another repository so I didn't know how to cleanly credit him in the commits I moved over to here)

@afrueda97 afrueda97 requested a review from iamricard October 3, 2025 13:58
@afrueda97 afrueda97 force-pushed the felipe-reduce-complexity-queuing-provider branch from 3852557 to 3bb8c61 Compare October 3, 2025 14:04
public List<CompleteEvent> getCompleteEvents() {
completeEvents.sort(Comparator.comparing((e) -> e.start));
return ImmutableList.copyOf(completeEvents);
synchronized (this) {
Copy link
Contributor

@laszlocsomor laszlocsomor Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like completeEvents is actually mutable, and mutated in ProfileThread.addEvent. So if getCompleteEvents is called, then addEvent, then getCompleteEvents again, the second call will return a possibly unsorted list.

Is it possible to make the object immutable?

If not, is there any guarantee of call order we could enforce? For example, if addEvent is never expected to be called after getCompleteEvents, then getCompleteEvents should "freeze" the object (set some flag), no longer allowing mutation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, although addEvent is only currently called when creating the BazelProfile

// TODO: Use success response to take action on errant events.
profileThread.addEvent(traceEvent);

It definitely wouldn't hurt to, for example, use an immutable list here to ensure we don't break the ordering afterwards

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it so the code guarantees correct calling order? For example add a ProfileThread.Builder class with a mutable completeEvents that you can call addEvent on, then the final ProfileThread would have an ImmutableList as completeEvents (and it'd be sorted).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check for this that throws an ISE if such modifications are attempted in 5aac8c1.

(also, a test for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we were also doing a lot of expensive copying and re-sorting in other places which I missed. Switching to a builder pattern cleaned all of those up, should be done in 9868671

PTAL

Signed-off-by: Felipe <afrueda97@outlook.com>
@afrueda97 afrueda97 force-pushed the felipe-reduce-complexity-queuing-provider branch from 3bb8c61 to cdb011d Compare October 3, 2025 14:15
@afrueda97
Copy link
Contributor Author

Apologies for the force-push, I had accidentally pushed an old version of this work

…ng sorted

Signed-off-by: Felipe <afrueda97@outlook.com>
Signed-off-by: Felipe <afrueda97@outlook.com>
…d data

Signed-off-by: Felipe <afrueda97@outlook.com>
@afrueda97 afrueda97 requested a review from laszlocsomor October 3, 2025 16:07
* modified anymore.
*/
public boolean addEvent(JsonObject event) {
if (sorted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are instances of ProfileThread expected to be accessed concurrently?

Since getCompleteEvents synchronizes on this to access sorted, we must do the same here. Or make sorted and AtomicBoolean.

If access to some members must be synchronized, then it's likely that we should make all member access thread-safe.

Copy link
Contributor Author

@afrueda97 afrueda97 Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think GitHub linked you to the previous commit, this code is gone now in favor of the builder pattern you suggested

(which, being fully immutable after built, this class should also now be safe to access concurrently)

Signed-off-by: Felipe <afrueda97@outlook.com>
Signed-off-by: Felipe <afrueda97@outlook.com>
@afrueda97
Copy link
Contributor Author

Friendly ping

@iamricard
Copy link

Sorry I'll only be able to get to this review towards the end of the week 😞

Copy link
Contributor

@laszlocsomor laszlocsomor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything other than CriticalPathQueuingDurationDataProvider.java‎ looks good; I didn't review that file because I'm not sure it's related?

import java.util.stream.Collectors;
import javax.annotation.Nullable;

public class ProfileThread {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional, for a follow-up: use @AutoValue and @AutoBuilder to remove some of the boilerplate (getters, equals/hashCode/toString)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the changes in this file relevant to this PR?
like the new variables such as cPathActionNameToSelfEvent and related logic

Copy link
Contributor Author

@afrueda97 afrueda97 Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, unfortunately GitHub does a great job of not showing the differences.

Essentially the changes are intended to:

  1. Reduce how many loops are performed over bazelProfile.getCriticalPath().get().getCompleteEvents() and .getThreads().flatMap((thread) -> thread.getCompleteEvents().stream()). It should be happening just once for each of these lists now.
  2. "Unroll" complicated stream-based code into simpler for loops, to make it easier to get a sense of its performance and scalability
  3. Use more efficient data structures for lookups, which are the HashMaps and HashMultimaps at the top.

Copy link
Contributor

@laszlocsomor laszlocsomor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

Comment on lines 155 to 160
// We have found cases where the end time of the critical path event is less than the end
// time of the processing event. This might be a bug / inconsistency in Bazel profile
// writing.
if (!cPathEvent.end.almostEquals(event.end) && cPathEvent.end.compareTo(event.end) <= 0) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came from L96 on the left, and that check was only there for "documentation purposes".
Now we'll actually do the check. Is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've reverted this to an equivalent of how it originally was (always-false check now).

@afrueda97 afrueda97 force-pushed the felipe-reduce-complexity-queuing-provider branch from 85bbc81 to 66ff455 Compare October 10, 2025 00:44
Signed-off-by: Felipe <afrueda97@outlook.com>
Signed-off-by: Felipe <afrueda97@outlook.com>
@afrueda97 afrueda97 force-pushed the felipe-reduce-complexity-queuing-provider branch from 059a6db to 4f6fab5 Compare October 10, 2025 01:24
Signed-off-by: Felipe <afrueda97@outlook.com>
Signed-off-by: Felipe <afrueda97@outlook.com>
@afrueda97 afrueda97 merged commit 5847f5b into main Oct 10, 2025
3 checks passed
@afrueda97 afrueda97 deleted the felipe-reduce-complexity-queuing-provider branch October 10, 2025 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants