Numa patch#3

Open
YWHyuk wants to merge 23 commits into linuxgeek-Inc:master from YWHyuk:NUMA

YWHyuk commented Jul 12, 2022

Revert the highly contended benchmark. This patch is an experimental feature.

Also, add list benchmark infrastructure.

YWHyuk added 23 commits April 28, 2022 19:24
CC-lock is based on the flat-lock combining algorithm.
In this lock, only one thread, called the combiner
thread, handles the critical-section requests. So the combiner
thread can exploit locality and avoid high contention
on the lock variable.

When each CPU uses only one node, let's assume the lock
holds node A. In this case, node A's
(wait, completed) status should be (false, false).

Lock
 |
 A

When CPUs A and B race, let's assume that
B wins. Then B will spin on A's wait
status.

A   ->   B
w:F      w:T

At the same time, A is enqueued again. So A's wait
status is set to true, as shown below.

A   ->   B   ->   A
w:T      w:T      w:T

This leads to a deadlock.

To avoid the node-reuse problem above, each CPU has two
cc_node instances. The two nodes are used alternately.

A_0 ->   B_0 ->   A_1
w:F      w:T      w:T
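
A minimal userspace sketch of the two-node alternation described above. The names cc_node_sketch, cc_percpu_sketch and cc_pick_node, and the CPU count, are hypothetical and are not the patch's actual definitions. It only shows how a CPU can avoid handing out the node it used for its previous request; enqueueing and combining are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for cc_node: only the two status flags. */
struct cc_node_sketch {
	bool wait;
	bool completed;
};

/* Each CPU owns two nodes and remembers which one to hand out next. */
struct cc_percpu_sketch {
	struct cc_node_sketch node[2];
	unsigned int idx;
};

static struct cc_percpu_sketch cc_percpu[NR_CPUS_SKETCH];

/*
 * Return the node this CPU should enqueue for its next request.
 * The index flips on every call, so the node used by the previous
 * request (which a successor may still be spinning on) is not reused
 * by the very next request.
 */
static struct cc_node_sketch *cc_pick_node(unsigned int cpu)
{
	struct cc_percpu_sketch *pc = &cc_percpu[cpu];
	struct cc_node_sketch *n = &pc->node[pc->idx];

	pc->idx ^= 1;
	n->wait = true;		/* a newly enqueued node starts as waiting */
	n->completed = false;
	return n;
}
```

With this layout, two back-to-back requests from the same CPU get A_0 and then A_1, so the wait flag that B spins on is not overwritten by A's second enqueue.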

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

Testing reported a deadlock. The situation is shown
below.

Node(0, 1) {
	req = 00000000d0495726,
	params = 000000002f36f5ac,
	wait = 0, completed = 1,
	refcount = 0,
	Next (2, 0)
	Prev (0, 0)
}

Node(2, 0) {
	req = 00000000d0495726,
	params = 000000002f36f5ac,
	wait = 1, completed = 0,
	refcount = 0,
	Next (2, 1)
	Prev (0, 1)
}

Node(0, 1)'s request has been handled, so its (wait,
completed) status is (0, 1). But its next node,
Node(2, 0), still has wait = 1. The combiner thread
should have set Node(2, 0)'s wait to 0. The previous logic
set wait = 0 only when DECODE_CPU(pending_cpu) != NR_CPUS.

But there can be a race between the combiner thread
and a normal thread. The combiner thread
checks node->req first, then checks node->next.
So the following situation can occur:

Node(0, 1)			Node(2, 0)
				prev->req = req
if(pending->req)
...
DECODE_CPU(pending->next)
				prev->next = this_cpu

To fix this, the combiner thread checks node->next first.
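
The ordering above can be sketched in userspace C11 as follows. This is an illustration under assumptions, not the patch's code: node_sketch, NO_NEXT (standing in for the NR_CPUS marker), publish_request, combine and count_req are hypothetical names, and the use of release/acquire on next is this sketch's choice. The enqueuing side writes req before next, so a combiner that reads next first and sees a successor is guaranteed to see the request as well; if next is not linked yet, the combiner stops at that node and clears its wait flag so the waiter can take over.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_NEXT (-1)	/* hypothetical "no successor yet" marker */

struct node_sketch {
	void (*req)(void *);	/* request to run in the critical section */
	void *params;
	atomic_bool wait;
	atomic_bool completed;
	atomic_int next;	/* index of the successor node, or NO_NEXT */
};

static void node_init(struct node_sketch *n)
{
	n->req = NULL;
	n->params = NULL;
	atomic_init(&n->wait, true);
	atomic_init(&n->completed, false);
	atomic_init(&n->next, NO_NEXT);
}

/* Example request: bump a counter passed through params. */
static void count_req(void *p)
{
	++*(int *)p;
}

/* Normal thread: write the request first, then link the successor. */
static void publish_request(struct node_sketch *prev, void (*req)(void *),
			    void *params, int self)
{
	prev->req = req;
	prev->params = params;
	atomic_store_explicit(&prev->next, self, memory_order_release);
}

/*
 * Combiner: read next BEFORE looking at req. A node whose successor is
 * not linked yet is left unprocessed; its wait flag is cleared so that
 * the thread spinning on it can continue as the next combiner.
 * Returns the index of the node where combining stopped.
 */
static int combine(struct node_sketch *nodes, int cur)
{
	for (;;) {
		struct node_sketch *n = &nodes[cur];
		int next = atomic_load_explicit(&n->next, memory_order_acquire);

		if (next == NO_NEXT)
			break;
		n->req(n->params);
		atomic_store(&n->completed, true);
		atomic_store(&n->wait, false);
		cur = next;
	}
	atomic_store(&nodes[cur].wait, false);
	return cur;
}
```

In this sketch a node is never marked completed while its successor link is missing, so the successor cannot be left spinning with wait = 1 as in the dump above.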

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

Previously, the test thread used jiffies to measure the elapsed
time. But its resolution is low, so all the results
were zero or one. So use sched_clock instead.
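
A small arithmetic sketch of why a tick counter reports only zero or one for short operations. SKETCH_HZ = 250 is an assumed tick rate (the commit does not state the kernel's HZ), and the helper names are hypothetical. Times are plain nanosecond values, similar in spirit to what a nanosecond clock such as sched_clock would provide.

```c
#include <stdint.h>

#define SKETCH_HZ 250ULL			/* assumed tick rate */
#define NSEC_PER_TICK (1000000000ULL / SKETCH_HZ)	/* 4,000,000 ns */

/* Convert a nanosecond timestamp to a tick count, as a jiffies-like counter would. */
static uint64_t ns_to_ticks(uint64_t ns)
{
	return ns / NSEC_PER_TICK;
}

/* Elapsed time as seen through the tick counter. */
static uint64_t elapsed_ticks(uint64_t start_ns, uint64_t end_ns)
{
	return ns_to_ticks(end_ns) - ns_to_ticks(start_ns);
}

/* Elapsed time as seen through a nanosecond clock. */
static uint64_t elapsed_ns(uint64_t start_ns, uint64_t end_ns)
{
	return end_ns - start_ns;
}
```

For a 50 us interval, elapsed_ticks() yields 0 when both timestamps fall in the same tick and 1 when they straddle a tick boundary, while elapsed_ns() yields 50000 in both cases.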

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

To preserve the order between reading node->next and writing
node->wait and node->completed, smp_mb should be used
instead of smp_mb(). So fix it.

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

The spinlock-based benchmark can be run with
"echo 2 > trigger".

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

This script provides parsing and plotting of measurement
results.

To optimize, enable the debug code only when DEBUG is defined.

Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>

To reduce inter-node traffic, add a delay when acquiring the global
lock fails. The delay time is a value taken from experimental results.
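
A userspace sketch of the back-off idea, assuming a simple test-and-set global lock. BACKOFF_SPINS is a hypothetical placeholder (the patch uses an experimentally chosen value that is not given here), and global_lock_acquire, global_lock_release and backoff_delay are hypothetical names. After a failed attempt the caller spins locally for a while instead of immediately retrying on the shared lock word.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BACKOFF_SPINS 64	/* hypothetical delay length */

static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Busy-wait locally without touching the shared lock word. */
static void backoff_delay(void)
{
	for (volatile int i = 0; i < BACKOFF_SPINS; i++)
		;
}

/*
 * Try to take the global lock up to max_attempts times.
 * Each failed attempt is followed by a delay; *delays counts them.
 * Returns true if the lock was acquired.
 */
static bool global_lock_acquire(int max_attempts, int *delays)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		if (!atomic_flag_test_and_set_explicit(&global_lock,
						       memory_order_acquire))
			return true;
		backoff_delay();
		(*delays)++;
	}
	return false;
}

static void global_lock_release(void)
{
	atomic_flag_clear_explicit(&global_lock, memory_order_release);
}
```

The delay trades a little latency on the losing side for fewer remote accesses to the lock word while another node holds the lock.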

Signed-off-by: wonhyuk yang <vvghjk1234@gmail.com>

Add a new benchmark that measures the time of list operations.