
Commit 63257cc

Add loadtesting framework
On-behalf-of: SAP <simon.bein@sap.com>
Signed-off-by: Simon Bein <simontheleg@gmail.com>
1 parent 301a8f7 commit 63257cc

20 files changed

Lines changed: 1246 additions & 108 deletions

Makefile

Lines changed: 1 addition & 1 deletion
@@ -410,7 +410,7 @@ test: ## Run tests
 	for MOD in $$(git ls-files '**/go.mod' | sed 's,/go.mod,,'); do \
 		if [ "$$MOD" != "." ]; then \
 			echo "Testing $$MOD module..."; \
-			(cd $$MOD && $(GO_TEST) -race $(COUNT_ARG) -coverprofile=coverage.txt -covermode=atomic $(TEST_ARGS) $(WHAT)); \
+			(cd $$MOD && $(GO_TEST) -race $(COUNT_ARG) -coverprofile=coverage.txt -covermode=atomic $(TEST_ARGS) $$(go list "$(WHAT)" | grep -v 'test/load/testing')); \
 		fi; \
 	done

test/load/Proposal.md

Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
# Load Testing

## 10000 workspaces reference architecture

Assumptions & Requirements:

* We want to test how a kcp installation with 10000 workspaces behaves under synthetic workloads
* We do not want to test how easily kcp handles adding 10000 workspaces at once
* We don't want to run etcd on machines which are hosting kcp
* We will be using kcp-operator to set up a sharded kcp instance
* The minimum number of replicas for any component is 3
* The loadtests are infrastructure provider agnostic. This will allow us and community members to
  experiment with different infrastructure sizes
* We treat the rootshard like we would any other shard; it will be filled with regular workspaces,
  so shard1 = rootshard
* As the kcp-operator currently has no support for a dedicated cache server, we have decided to still
  work with the default model of having an embedded cache in the rootshard (even if it overloads the
  rootshard). Specifically this means that when we have 3 shards, we create 1 rootshard and 2 shards
* Results are stored in permanent storage so we can use them for comparison later

### Architecture

[drawing of the general layout](./architecture.excalidraw)

### Node calculation

All node calculations are based on the number of workspaces and use the following recommended constants:

* max_workspaces_per_shard = 3500
* min_replicas = 3
* kcp_server_buffer = 512MB
* #kcp_cache_nodes = 1
* #aux_nodes = 1
* #frontproxy_nodes = 3
* mem_per_workspace = 5MB

---

1. We calculate the number of shards:

   ```txt
   #shards = round_up(#workspaces / max_workspaces_per_shard)
   ```

1. Now we can calculate the number of etcd nodes:

   ```txt
   #etcd_nodes = #shards * min_replicas
   ```

1. We can calculate the number of shard nodes and their memory size in relation to the number of workspaces:

   ```txt
   #shard_nodes = #shards * min_replicas
   #actual_workspaces_per_shard = #workspaces / #shards
   kcp_server_node_mem = kcp_server_buffer + (#actual_workspaces_per_shard * mem_per_workspace)
   ```

The total number of required nodes is calculated as follows:

```txt
#total_nodes = #frontproxy_nodes + #kcp_cache_nodes + #etcd_nodes + #shard_nodes + #aux_nodes
```
#### Example for 10000 workspaces

```txt
#shards = round_up(10000 / 3500) = round_up(2.86) = 3
#etcd_nodes = 3 * 3 = 9
#shard_nodes = 3 * 3 = 9
#actual_workspaces_per_shard = 10000 / 3 = 3333
kcp_server_node_mem = 512 + (3333 * 5) = 17177MB
#total_nodes = 3 + 1 + 9 + 9 + 1 = 23
```
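To make the arithmetic easy to replay for other workspace counts, here is a small Go sketch of the same calculation. It is illustrative only; the constants mirror the list above, and the function and variable names are ours, not part of the framework.

```go
package main

import (
	"fmt"
	"math"
)

// Recommended constants from the list above.
const (
	maxWorkspacesPerShard = 3500
	minReplicas           = 3
	kcpServerBufferMB     = 512
	cacheNodes            = 1
	auxNodes              = 1
	frontproxyNodes       = 3
	memPerWorkspaceMB     = 5
)

// nodePlan computes shard, etcd, and total node counts plus the
// per-server memory requirement for a given number of workspaces.
func nodePlan(workspaces int) (shards, etcdNodes, shardNodes, totalNodes, serverMemMB int) {
	shards = int(math.Ceil(float64(workspaces) / maxWorkspacesPerShard))
	etcdNodes = shards * minReplicas
	shardNodes = shards * minReplicas
	perShard := workspaces / shards
	serverMemMB = kcpServerBufferMB + perShard*memPerWorkspaceMB
	totalNodes = frontproxyNodes + cacheNodes + etcdNodes + shardNodes + auxNodes
	return
}

func main() {
	shards, etcd, shardN, total, mem := nodePlan(10000)
	fmt.Printf("shards=%d etcd=%d shard_nodes=%d total=%d mem_per_server=%dMB\n",
		shards, etcd, shardN, total, mem)
	// Output: shards=3 etcd=9 shard_nodes=9 total=23 mem_per_server=17177MB
}
```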
### Testing Protocol

Mantra: We want to test how a kcp installation with 10000 workspaces behaves under simulated, regular
activity. We don't want to test how easily we can add 10000 workspaces at once.

#### Procedure

1. Create 10000 workspaces, APIExports, etc. and patiently wait for all of them to become ready
   (see the sketch after this list)
2. Simulate real-world activity by simulating end-users who use custom kubeconfigs to create APIBindings
   and then CRUD on their custom API objects
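As an illustration of step 1, here is a minimal Go sketch that creates empty workspaces through kcp's tenancy API and polls until each becomes ready. It assumes a kubeconfig whose server URL already points at the parent workspace (e.g. `https://<front-proxy>/clusters/root`); the kubeconfig path and workspace names are hypothetical, the count is reduced, and error handling is simplified.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// Workspaces are cluster-scoped objects served by kcp's tenancy API.
var workspaceGVR = schema.GroupVersionResource{
	Group: "tenancy.kcp.io", Version: "v1alpha1", Resource: "workspaces",
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "load.kubeconfig") // hypothetical path
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()
	for i := 0; i < 10; i++ { // scale up to 10000 in the real test
		name := fmt.Sprintf("load-%04d", i)
		ws := &unstructured.Unstructured{Object: map[string]interface{}{
			"apiVersion": "tenancy.kcp.io/v1alpha1",
			"kind":       "Workspace",
			"metadata":   map[string]interface{}{"name": name},
		}}
		if _, err := client.Resource(workspaceGVR).Create(ctx, ws, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
		// Poll until the workspace reports phase Ready.
		for {
			got, err := client.Resource(workspaceGVR).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				panic(err)
			}
			phase, _, _ := unstructured.NestedString(got.Object, "status", "phase")
			if phase == "Ready" {
				break
			}
			time.Sleep(time.Second)
		}
	}
}
```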
86+
87+
##### Level 1 - 10000 empty workspaces
88+
89+
We are just going to put 10000 empty workspaces into a kcp installation. We will have a nesting level
90+
setting so we can try out if nesting has any impact (it should not). This test case
91+
is extremely deterministic and has should spread workspaces relatively equally across shards.
92+
We mainly use this as a base consumption measurement and to verify nesting has no performance impact.
93+
94+
##### Level 2 - Basic CRUD
95+
96+
Every workspace has a type and we are going to do a basic parallel CRUD workflow which we will simulate
97+
using simple Kubernetes Jobs. The workflow is done on basic objects from a single provider using a
98+
singular APIExport.
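A minimal sketch of what one such Job's workload could look like: one create-read-update-delete cycle through the dynamic client. The `widgets` resource and its `example.kcp.io` group are hypothetical stand-ins for whatever objects the provider actually exports.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// widgetGVR is a hypothetical resource bound into the workspace via an APIBinding.
var widgetGVR = schema.GroupVersionResource{
	Group: "example.kcp.io", Version: "v1", Resource: "widgets",
}

// crudOnce runs one CRUD cycle inside the target workspace. The client is
// expected to be built from a kubeconfig pointing at that workspace.
func crudOnce(ctx context.Context, client dynamic.Interface) error {
	res := client.Resource(widgetGVR).Namespace("default")

	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "example.kcp.io/v1",
		"kind":       "Widget",
		"metadata":   map[string]interface{}{"name": "load-widget"},
		"spec":       map[string]interface{}{"size": int64(1)},
	}}

	created, err := res.Create(ctx, obj, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	got, err := res.Get(ctx, created.GetName(), metav1.GetOptions{})
	if err != nil {
		return err
	}
	if err := unstructured.SetNestedField(got.Object, int64(2), "spec", "size"); err != nil {
		return err
	}
	if _, err := res.Update(ctx, got, metav1.UpdateOptions{}); err != nil {
		return err
	}
	return res.Delete(ctx, got.GetName(), metav1.DeleteOptions{})
}
```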
##### Level 3 - Multiple Providers

We are multiplexing the level 2 example to use multiple providers.

##### Outlook

We want to keep the initial version of the tests simple and deterministic. As a result, the following
topics have been discussed but were considered out of scope for the first three levels:

* direct user interaction via simulated users
* custom workspacetypes with initializers and finalizers
* integrating the init-agent
* nested workspaces living on different shards
* having a chaos monkey randomly killing shards

### Scraping of Metrics

We plan on using a plain Prometheus instance to scrape all of the kcp instances. At a high level we plan to
monitor:

* CPU + Mem on all components
* Number of Goroutines over time
* Request response times on the front-proxy (probably percentiles). This could alternatively be
  measured inside the testing suite (client-side; see the sketch below)
* Disk IO and size on both etcd and rootshard
* Total number of workspaces (to compare expected with actual)
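For the client-side variant: the go.mod added in this commit pins github.com/montanaflynn/stats, so a percentile computation inside the suite could look roughly like this. The `measure` helper is ours, and the sleep stands in for a real request against the front-proxy.

```go
package main

import (
	"fmt"
	"time"

	"github.com/montanaflynn/stats"
)

// measure wraps a request and records its duration in seconds.
func measure(samples *[]float64, do func() error) error {
	start := time.Now()
	err := do()
	*samples = append(*samples, time.Since(start).Seconds())
	return err
}

func main() {
	var samples []float64
	for i := 0; i < 100; i++ {
		// Stand-in for a real request against the front-proxy.
		_ = measure(&samples, func() error { time.Sleep(2 * time.Millisecond); return nil })
	}
	for _, p := range []float64{50, 95, 99} {
		v, err := stats.Percentile(samples, p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("p%.0f: %.1fms\n", p, v*1000)
	}
}
```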

test/load/Readme.md

Lines changed: 22 additions & 107 deletions
@@ -1,125 +1,40 @@
# Load Testing

Load testing framework and loadtests for the kcp project.

## Architecture

Please refer to the [drawing of the general layout](./architecture.excalidraw).

## Setup

Installation scripts and manuals are provided in [setup/Readme](./setup/Readme.md).

## Usage

All test cases are organized in the `testing` folder. You can run the entire suite using:

```sh
go test ./testing/...
```

The tests will prompt you for any required variables and configs.

Alternatively you can run a subset of tests using standard `go test` syntax, e.g.:

```sh
go test ./testing/... -run ^TestExample
```

## Development

The load-testing framework itself is organized in the `pkg` folder. You can run its unit
tests directly using:

```sh
go test ./pkg/...
```

## Partitioning

You can partition your loadtest by providing it with a unique `start` number. Please be advised that this
multiplexes your test: any load you place will be multiplied by the number of partitions. Depending on the
test, adjust throughput values like QPS accordingly.

test/load/go.mod

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
module github.com/kcp-dev/kcp/test/load

go 1.24.0

require (
	github.com/montanaflynn/stats v0.7.1
	github.com/stretchr/testify v1.11.1
	k8s.io/apimachinery v0.34.2
	k8s.io/client-go v0.34.2
)

require (
	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
	github.com/fxamacker/cbor/v2 v2.9.0 // indirect
	github.com/go-logr/logr v1.4.3 // indirect
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/json-iterator/go v1.1.12 // indirect
	github.com/kr/pretty v0.3.1 // indirect
	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
	github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
	github.com/spf13/pflag v1.0.10 // indirect
	github.com/x448/float16 v0.8.4 // indirect
	go.yaml.in/yaml/v2 v2.4.3 // indirect
	golang.org/x/net v0.47.0 // indirect
	golang.org/x/oauth2 v0.30.0 // indirect
	golang.org/x/sys v0.38.0 // indirect
	golang.org/x/term v0.37.0 // indirect
	golang.org/x/text v0.31.0 // indirect
	golang.org/x/time v0.9.0 // indirect
	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
	gopkg.in/inf.v0 v0.9.1 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
	k8s.io/klog/v2 v2.140.0 // indirect
	k8s.io/kube-openapi v0.0.0-20250910181357-589584f1c912 // indirect
	k8s.io/utils v0.0.0-20260210185600-b8788abfbbc2 // indirect
	sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
	sigs.k8s.io/randfill v1.0.0 // indirect
	sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect
	sigs.k8s.io/yaml v1.6.0 // indirect
)
