
fix: race condition#3

Merged
demurky merged 9 commits into main from fix-race
Dec 16, 2025

Conversation

@ParsaJR
Contributor

@ParsaJR ParsaJR commented Dec 12, 2025

Since the metrics are likely to be read more frequently than written, an RWMutex seems to be a better fit for this situation.

closes #2

@ParsaJR ParsaJR requested a review from demurky December 12, 2025 11:46
Contributor

@demurky demurky left a comment


Following my comments on the commit: ideally, we update the server's metrics every n seconds (Prometheus's typical scrape interval is 60s). At 0s, the producer thread begins writing data to a buffer. At 60s, it begins writing to a new buffer, and so on. Incoming consumers begin reading from the latest buffer, incrementing that buffer's "reference counter". We periodically "remove" old buffers whose reference counters are 0, and we put a timeout on consumers to limit the maximum number of buffers.

Synchronization is required to ensure:

  • No consumer begins reading from an old buffer.
  • No buffer is removed while being consumed.
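A minimal sketch of the rotating-buffer scheme described above (all names here — `buffer`, `rotating`, `Acquire`, `Sweep` — are illustrative, not from the PR; this is one possible shape, assuming a mutex guards the buffer list):

```go
package main

import (
	"fmt"
	"sync"
)

// buffer pairs a metrics snapshot with a reference counter so it is only
// reclaimed once no consumer is still reading it.
type buffer struct {
	data []byte
	refs int
}

// rotating sketches the scheme: the producer writes to the newest buffer,
// consumers read the latest sealed one, and Sweep drops old sealed buffers
// whose refcount has reached zero.
type rotating struct {
	mu   sync.Mutex
	bufs []*buffer // last element is being written; earlier ones are sealed
}

func newRotating() *rotating {
	return &rotating{bufs: []*buffer{{}}}
}

// Write appends to the buffer currently being produced.
func (r *rotating) Write(p []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	b := r.bufs[len(r.bufs)-1]
	b.data = append(b.data, p...)
}

// Rotate seals the current buffer and starts a fresh one.
func (r *rotating) Rotate() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bufs = append(r.bufs, &buffer{})
}

// Acquire returns the latest sealed buffer with its refcount incremented,
// or nil during the first interval when nothing is sealed yet. The caller
// must call Release when done reading.
func (r *rotating) Acquire() *buffer {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.bufs) < 2 {
		return nil
	}
	b := r.bufs[len(r.bufs)-2]
	b.refs++
	return b
}

func (r *rotating) Release(b *buffer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	b.refs--
}

// Sweep removes sealed buffers that are neither the latest sealed one nor
// still referenced by a consumer.
func (r *rotating) Sweep() {
	r.mu.Lock()
	defer r.mu.Unlock()
	kept := make([]*buffer, 0, len(r.bufs))
	for i, b := range r.bufs {
		if i >= len(r.bufs)-2 || b.refs > 0 {
			kept = append(kept, b)
		}
	}
	r.bufs = kept
}

func main() {
	r := newRotating()
	r.Write([]byte("interval-0 metrics"))
	r.Rotate() // seal interval 0; consumers may now read it
	b := r.Acquire()
	fmt.Printf("%s\n", b.data)
	r.Release(b)
}
```

The two synchronization requirements above map onto this sketch as follows: `Acquire` only ever returns a sealed buffer (never the one being written), and `Sweep` skips any buffer with a nonzero refcount.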

main.go Outdated
Comment on lines 53 to 55
mu.RLock()
defer mu.RUnlock()
w.Write(metrics)
Contributor


This opens up the possibility of DoS. w.Write() could potentially take a long time to finish, during which the gather goroutine would be blocked. At best, the now-incorrect timings of the p.Step() calls would produce incorrect metrics. At worst, a rogue, slow connection could take down the entire server.

Contributor Author


So, if the problem is the duration of the HTTP I/O, can't we just make a copy of the latest metrics variable? That way, mu.RUnlock() doesn't have to wait on the HTTP write. Here:

	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		mu.RLock()
		metricsCopy := make([]byte, len(metrics))
		copy(metricsCopy, metrics)
		mu.RUnlock()

		w.Write(metricsCopy)
	})

Contributor


This would create a copy of metrics for every HTTP connection, using up memory, which again would allow a potential DoS with lots of connections.

Contributor Author


Regarding the HTTP server in general: the program could benefit from deadlines on both read and write operations.

@ParsaJR
Contributor Author

ParsaJR commented Dec 13, 2025

I can't really wrap my mind around a solution that doesn't involve making a copy of the latest value.

Incoming consumers begin reading from the latest buffer, incrementing that buffer's "reference counter"

So if I understand your suggestion correctly, the program always serves a buffer from a minute ago; that buffer is concurrency-safe because nothing is written to it anymore.

And with this kind of synchronization, the program has nothing to serve for the first 60 seconds after it starts. (Which is not a concern.)

@demurky
Contributor

demurky commented Dec 13, 2025

I wrote "Incoming consumers begin reading from the latest buffer", but I really meant the last buffer not being written to, i.e. the second-to-last buffer.

this way, that buffer is concurrent safe because nothing gets written to it.

Yes.

And by using this kind of synchronization, the program doesn't have anything to provide for the first 60 seconds when it starts working

Yes.

This is similar to the way file systems handle references to inodes. I think it's called the read-copy-update mechanism? I don't know.

@ParsaJR
Contributor Author

ParsaJR commented Dec 15, 2025

I've done some work based on your suggestion: I've introduced a struct `RCU` that implements something similar to this.

Do you have any insights on this implementation? I'm not sure about its performance, mainly because of the heavy mutex usage nearly everywhere.

We periodically "remove" old buffers whose reference counters are 0.

I'm not quite sure when this should happen.

Also, who should be responsible for it: the gather() function, or something integrated into the RCU structure?

Currently it happens in gather(): the ticker fires every minute, and the non-blocking select statement checks it:

func gather() {
	p := pipeline.New([]int{1, 5, 10, 15, 30, 60})

	timer := time.NewTicker(60 * time.Second)

	for {
		data, err := netdev.ReadNetDev()
		if err != nil {
			panic(fmt.Errorf("could not read netdev: %w", err))
		}

		recv, trns, err := netdev.GetTraffic(data)
		if err != nil {
			panic(fmt.Errorf("could not get traffic: %w", err))
		}

		m := p.Step(recv, trns)

		// Non-blocking; this should complete quickly.
		rcuSlice.Assign(m)

		select {
		case <-timer.C:
			rcuSlice.Rotate()
		default:
		}

		time.Sleep(time.Second)
	}
}
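For context, here is my guess at the shape of an `RCU` type with the `Assign` and `Rotate` methods used above — a sketch, not the PR's actual implementation (only the method names come from the thread):

```go
package main

import (
	"fmt"
	"sync"
)

// RCU holds two values: the one the producer is currently updating, and
// the last published (sealed) one, which is safe to hand to readers.
type RCU struct {
	mu        sync.Mutex
	current   []byte // latest value from the producer, not yet published
	published []byte // sealed value; safe for concurrent readers
}

// Assign records the newest value. It only holds the lock briefly, so
// the producer loop is never blocked by slow readers.
func (r *RCU) Assign(data []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.current = data
}

// Rotate publishes the current value, making it the one readers see.
func (r *RCU) Rotate() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.published = r.current
}

// Read returns the latest published value (nil before the first Rotate).
func (r *RCU) Read() []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.published
}

func main() {
	var r RCU
	r.Assign([]byte("m1"))
	r.Rotate()
	fmt.Println(string(r.Read())) // prints "m1"
}
```

With this shape, readers never see a value that is still being assigned, which matches the "serve the sealed buffer" idea from the earlier discussion.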

@ParsaJR ParsaJR requested a review from demurky December 15, 2025 21:02
@demurky
Contributor

demurky commented Dec 15, 2025

We seem to have forgotten that Go has a garbage collector.
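As I read this remark: with a garbage collector, the reference counting and explicit removal can go away entirely. The producer just publishes each snapshot behind an atomically swapped pointer, and old snapshots are collected once no reader holds them. A sketch using `sync/atomic` (Go 1.19+; the names are mine, not from the PR):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// latest holds the most recent published metrics snapshot. Readers just
// load the pointer; the garbage collector reclaims old snapshots once no
// goroutine still references them, so no refcounting or sweeping is needed.
var latest atomic.Pointer[[]byte]

// publish atomically swaps in a new immutable snapshot.
func publish(snapshot []byte) {
	latest.Store(&snapshot)
}

// read returns the latest snapshot, or nil before the first publish.
func read() []byte {
	if p := latest.Load(); p != nil {
		return *p
	}
	return nil
}

func main() {
	publish([]byte("snapshot-1"))
	fmt.Println(string(read())) // prints "snapshot-1"
}
```

The one invariant to keep: published snapshots must be treated as immutable, since an arbitrary number of readers may still hold an old one.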

@demurky demurky merged commit 61d54eb into main Dec 16, 2025
1 check passed
Contributor

@demurky demurky left a comment


lgtm

@ParsaJR
Contributor Author

ParsaJR commented Dec 16, 2025

Thanks. Consider making this repository public whenever you want.

It looks production-grade to me.

@ParsaJR ParsaJR deleted the fix-race branch December 16, 2025 20:19
demurky added a commit that referenced this pull request Dec 17, 2025
demurky added a commit that referenced this pull request Dec 17, 2025
demurky added a commit that referenced this pull request Dec 17, 2025


Development

Successfully merging this pull request may close these issues.

Race condition related to the global variable "metrics" in "main.go"
