Conversation
closes #2
Following my comments on the commit, ideally we decide to update the server's metrics every n seconds (the typical update rate of Prometheus is 60s). At 0s, the producer thread begins writing data to a buffer. At 60s, it begins writing to a new buffer, and so on. Incoming consumers begin reading from the latest buffer, incrementing that buffer's "reference counter". We periodically "remove" old buffers whose reference counters are 0. We put a timeout on consumers to limit the maximum number of buffers.
Synchronization is required to ensure:
- No consumer begins reading from an old buffer.
- No buffer is removed while being consumed.
main.go (Outdated)

```go
mu.RLock()
defer mu.RUnlock()
w.Write(metrics)
```
This opens up the possibility for DoS. w.Write() could potentially take a long time to finish, during which time the gather goroutine would be locked. At best, the now incorrect timings of the p.Step() calls would result in incorrect metrics. At worst, a rogue, slow connection could take down the entire server.
So, if the problem is the duration of the HTTP I/O operation, can't we just make a copy of the latest metrics variable? That way, the mu.RUnlock() isn't at the mercy of the HTTP I/O duration.
Here:

```go
http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
	mu.RLock()
	metricsCopy := make([]byte, len(metrics))
	_ = copy(metricsCopy, metrics)
	mu.RUnlock()
	w.Write(metricsCopy)
})
```
This would create a copy of metrics for every HTTP connection, using up memory, which again would allow a potential DoS with lots of connections.
About the HTTP server in general, the program could benefit from having deadlines for both read and write operations.
I can't really wrap my mind around a solution that doesn't involve making a copy of the latest value.
So, if I understand your suggestion correctly: the program always serves a buffer from a minute ago. That buffer is safe to read concurrently because nothing is written to it anymore. With this kind of synchronization, the program has nothing to serve for the first 60 seconds after it starts (which is not a concern).
I wrote
Yes.
Yes. This is similar to the way file systems handle references to inodes. I think it's called the read-copy-update (RCU) mechanism? I don't know.
there's no point in it anyway since we prefer GOTOOLCHAIN=auto.
I've done some work regarding your suggestion. I've introduced a struct `RCU` that implements something similar to this. Do you have any insights on this implementation? I'm not quite sure about its performance, mainly because of the heavy usage of Mutex nearly everywhere.
I'm not quite sure when this critical moment should happen, or who should be responsible for it: the gather() function, or something integrated into the `RCU` struct itself? Currently it happens in gather():

```go
func gather() {
	p := pipeline.New([]int{1, 5, 10, 15, 30, 60})
	timer := time.NewTicker(60 * time.Second)
	for {
		data, err := netdev.ReadNetDev()
		if err != nil {
			panic(fmt.Errorf("could not read netdev: %w", err))
		}
		recv, trns, err := netdev.GetTraffic(data)
		if err != nil {
			panic(fmt.Errorf("could not get traffic: %w", err))
		}
		m := p.Step(recv, trns)
		// Non-blocking; it should finish fast.
		rcuSlice.Assign(m)
		select {
		case <-timer.C:
			rcuSlice.Rotate()
		default:
		}
		time.Sleep(time.Second)
	}
}
```
We seem to have forgotten that Go has a garbage collector.
Thanks. Consider making this repository public whenever you want. It looks production-grade to me.
Since the `metrics` variable is likely to be read more frequently than written, using RWMutex seems to fit this situation better.