```go
for topic, partitions := range topics {
	for partitionID, partition := range partitions {
		partitionStatus := evaluatePartitionStatus(partition, module.minimumComplete, module.allowedLag)
		partitionStatus.Topic = topic
		partitionStatus.Partition = int32(partitionID)
		partitionStatus.Owner = partition.Owner
		partitionStatus.ClientID = partition.ClientID

		if partitionStatus.Status > status.Status {
			// If the partition status is greater than StatusError, we just mark it as StatusError
			if partitionStatus.Status > protocol.StatusError {
				status.Status = protocol.StatusError
			} else {
				status.Status = partitionStatus.Status
			}
		}

		if (status.Maxlag == nil) || (partitionStatus.CurrentLag > status.Maxlag.CurrentLag) {
			status.Maxlag = partitionStatus
		}
		if partitionStatus.Complete == 1.0 {
			completePartitions++
		}
		status.Partitions[count] = partitionStatus
		count++
	}
}
```

This code iterates over a map of topics, and if the last partition of the last topic visited reports OK, the consumer status ends up OK.
Because map iteration order in Go is randomized, the resulting consumer status is unpredictable.
The real-world effects of this are:
- The metric from Burrow for a consumer, scraped at a 2m interval:

- The metric from burrow-exporter, which requests Burrow at a 30s interval and is then scraped at a 2m interval:

The more frequently we query (Burrow uses a 30s cache expiration by default), the more likely we are to see a non-OK consumer status.
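
To illustrate the kind of order dependence described above, here is a minimal, self-contained sketch. It is not Burrow's code: `Status`, `lastWins`, and `worstWins` are hypothetical names made up for this example. It assumes an aggregation where each visited entry overwrites the result, and compares it with one that keeps the worst value seen. The two explicit key orders stand in for two possible map iteration orders, since Go does not specify the order of `range` over a map.

```go
package main

import "fmt"

// Status is a hypothetical severity level for this sketch; higher is worse.
type Status int

const (
	StatusOK Status = iota + 1
	StatusWarning
	StatusError
)

// lastWins overwrites the result with every visited entry, so the final
// value depends on the order in which the keys are visited.
func lastWins(order []string, parts map[string]Status) Status {
	var s Status
	for _, key := range order {
		s = parts[key]
	}
	return s
}

// worstWins keeps the highest severity seen, so the result is the same
// regardless of the order in which the map is iterated.
func worstWins(parts map[string]Status) Status {
	var s Status
	for _, p := range parts {
		if p > s {
			s = p
		}
	}
	return s
}

func main() {
	parts := map[string]Status{
		"topicA": StatusOK,
		"topicB": StatusError,
		"topicC": StatusOK,
	}

	// Two visiting orders that a randomized map iteration could produce.
	fmt.Println(lastWins([]string{"topicA", "topicB", "topicC"}, parts)) // 1 (OK)
	fmt.Println(lastWins([]string{"topicA", "topicC", "topicB"}, parts)) // 3 (Error)

	// The order-independent aggregate always reports the worst status.
	fmt.Println(worstWins(parts)) // 3 (Error)
}
```

With the overwrite-style aggregate, the same data yields OK or Error depending only on visiting order, which would match the flapping seen in the metrics if that is what happens in the evaluator. Taking the maximum severity, or iterating over sorted keys, removes the dependence on map order.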
The Burrow excerpt above is from Burrow/core/internal/evaluator/caching.go, lines 226 to 252 in 4a05b20.