2 changes: 1 addition & 1 deletion Makefile
@@ -410,7 +410,7 @@ test: ## Run tests
for MOD in $$(git ls-files '**/go.mod' | sed 's,/go.mod,,'); do \
if [ "$$MOD" != "." ]; then \
echo "Testing $$MOD module..."; \
(cd $$MOD && $(GO_TEST) -race $(COUNT_ARG) -coverprofile=coverage.txt -covermode=atomic $(TEST_ARGS) $(WHAT)); \
(cd $$MOD && $(GO_TEST) -race $(COUNT_ARG) -coverprofile=coverage.txt -covermode=atomic $(TEST_ARGS) $$(go list "$(WHAT)" | grep -v 'test/load/testing')); \
Member
Do we really need this? `test/load` is its own Go module, so I'd think it enters `test/load` and then runs `go test -race ...`.
It's just that this looks very hacky =/

Member Author

The problem is that `go test -race` would run the load tests themselves, which cannot (and should not) be executed as unit tests.
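The filter in the new Makefile line works because `go list` prints one import path per line and `grep -v` drops everything under `test/load/testing` before the list reaches `go test`. A standalone sketch of that pipeline (the package paths are made up for illustration):

```shell
# Stand-in for `go list "$(WHAT)"`: one import path per line.
# grep -v removes the load-test packages before they reach `go test`.
printf '%s\n' \
  'example.com/mod/pkg/metrics' \
  'example.com/mod/test/load/testing/level1' \
  'example.com/mod/pkg/runner' \
  | grep -v 'test/load/testing'
```

Only the `pkg/metrics` and `pkg/runner` lines survive the filter, so the load-test packages never reach the unit-test run.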

fi; \
done

125 changes: 125 additions & 0 deletions test/load/Proposal.md
@@ -0,0 +1,125 @@
# Load Testing

## 10000 workspaces reference architecture

Assumptions & Requirements:

* We want to test how a kcp installation with 10000 workspaces behaves on synthetic workloads
* We do not want to test how easily kcp handles adding 10000 workspaces at once
* We don't want to run etcd on machines which are hosting kcp
* We will be using kcp-operator to setup a sharded kcp instance
* The minimum amount of replicas for any component is 3
* The loadtests are infrastructure provider agnostic. This will allow us and community members to
experiment with different infrastructure sizes
* We treat the rootshard like we would any other shard. It will be filled with regular workspaces,
  so shard1 = rootshard
* As the kcp-operator currently has no support for a dedicated cache server, we have decided to stick
  with the default model of having an embedded cache in the rootshard (even if this adds load to the
  rootshard). Specifically, this means that with 3 shards, we create 1 rootshard and 2 regular shards
* Results are stored in some permanent storage so we can use them for comparison later

### Architecture

[drawing of the general layout](./architecture.excalidraw)

### Node calculation

All node calculations are based on the number of workspaces and use the following recommended constants:

* max_workspaces_per_shard = 3500
* min_replicas = 3
* kcp_server_buffer = 512MB
* #kcp_cache_nodes = 1
* #aux_nodes = 1
* #frontproxy_nodes = 3
* mem_per_workspace = 5MB

---

1. We calculate the number of shards

```txt
#shards = round_up(#workspaces / max_workspaces_per_shard)
```

1. Now we can calculate the number of etcd nodes.

```txt
#etcd_nodes = #shards * min_replicas
```

1. We can calculate the number of shard nodes and their size in relation to the number of workspaces

```txt
#shard_nodes = #shards * min_replicas
#actual_workspaces_per_shard = #workspaces / #shards
kcp_server_node_mem = kcp_server_buffer + (#actual_workspaces_per_shard * mem_per_workspace)
```

The total number of all required nodes is calculated as follows

```txt
#total_nodes = #frontproxy_nodes + #kcp_cache_nodes + #etcd_nodes + #kcp_server_nodes + #aux_nodes
```

#### Example for 10000 workspaces

```txt
#shards = 10000 / 3500 = 2.86 = 3
#etcd_nodes = 3 * 3 = 9
#shard_nodes = 3 * 3 = 9
#actual_workspaces_per_shard = 10000 / 3 = 3333
kcp_server_node_mem = 512 + (3333 * 5) = 17177MB
#total_nodes = 3 + 1 + 9 + 9 + 1 = 23
```
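The formulas above can be sketched as a small Go program. This is an illustrative calculation only; it assumes, as the total-nodes formula implies, that `#kcp_server_nodes` equals `#shard_nodes`:

```go
package main

import (
	"fmt"
	"math"
)

// Recommended constants from the proposal.
const (
	maxWorkspacesPerShard = 3500
	minReplicas           = 3
	kcpServerBufferMB     = 512
	kcpCacheNodes         = 1
	auxNodes              = 1
	frontproxyNodes       = 3
	memPerWorkspaceMB     = 5
)

func main() {
	workspaces := 10000

	// #shards = round_up(#workspaces / max_workspaces_per_shard)
	shards := int(math.Ceil(float64(workspaces) / maxWorkspacesPerShard))

	// etcd and shard node counts both scale with min_replicas.
	etcdNodes := shards * minReplicas
	shardNodes := shards * minReplicas

	// Per-shard workspace count and kcp server memory sizing.
	workspacesPerShard := workspaces / shards
	serverNodeMemMB := kcpServerBufferMB + workspacesPerShard*memPerWorkspaceMB

	// Assumes #kcp_server_nodes == #shard_nodes.
	totalNodes := frontproxyNodes + kcpCacheNodes + etcdNodes + shardNodes + auxNodes

	fmt.Println("shards:", shards)
	fmt.Println("etcd nodes:", etcdNodes)
	fmt.Println("shard nodes:", shardNodes)
	fmt.Println("workspaces per shard:", workspacesPerShard)
	fmt.Println("server node mem (MB):", serverNodeMemMB)
	fmt.Println("total nodes:", totalNodes)
}
```

For 10000 workspaces this reproduces the worked example: 3 shards, 9 etcd nodes, 9 shard nodes, 3333 workspaces per shard, and 17177 MB of memory per kcp server node.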

### Testing Protocol

Mantra: We want to test how a kcp installation with 10000 workspaces behaves under simulated, regular
activity. We don't want to test how easily we can add 10000 workspaces at once.

#### Procedure

1. Create 10000 workspaces, APIExports, etc. and patiently wait for all of them to become ready
2. Simulate real world activity by simulating end-users using custom kubeconfigs to create APIBindings
and then CRUD on their custom api-objects

##### Level 1 - 10000 empty workspaces

We are just going to put 10000 empty workspaces into a kcp installation. We will have a nesting level
setting so we can test whether nesting has any impact (it should not). This test case
is highly deterministic and should spread workspaces relatively equally across shards.
We mainly use this as a base consumption measurement and to verify nesting has no performance impact.

##### Level 2 - Basic CRUD

Every workspace has a type and we are going to do a basic parallel CRUD workflow which we will simulate
using simple Kubernetes Jobs. The workflow is done on basic objects from a single provider using a
singular APIExport.
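The shape of that per-workspace workflow can be sketched as a parallel create/read/update/delete loop. The in-memory store below is only a stand-in for the real API objects served through an APIExport, and all names are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// store is an in-memory stand-in for the API objects a single
// provider would serve through its APIExport.
type store struct {
	mu      sync.Mutex
	objects map[string]string
}

func (s *store) create(name, spec string) { s.mu.Lock(); defer s.mu.Unlock(); s.objects[name] = spec }
func (s *store) read(name string) string  { s.mu.Lock(); defer s.mu.Unlock(); return s.objects[name] }
func (s *store) update(name, spec string) { s.mu.Lock(); defer s.mu.Unlock(); s.objects[name] = spec }
func (s *store) remove(name string)       { s.mu.Lock(); defer s.mu.Unlock(); delete(s.objects, name) }

func main() {
	s := &store{objects: map[string]string{}}
	var wg sync.WaitGroup

	// One goroutine per simulated workspace, each running a full CRUD cycle,
	// mirroring the parallel Jobs described above.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(ws int) {
			defer wg.Done()
			name := fmt.Sprintf("ws-%d/widget", ws)
			s.create(name, "v1")
			_ = s.read(name)
			s.update(name, "v2")
			s.remove(name)
		}(i)
	}
	wg.Wait()
	fmt.Println("remaining objects:", len(s.objects))
}
```

In the real test each cycle would go through the kcp front-proxy against bound API objects; the sketch only illustrates the concurrency pattern.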

##### Level 3 - Multiple Providers

We are multiplexing the level 2 example to use multiple providers.

##### Outlook

We want to keep the initial version of the tests simple and deterministic. As a result, the following
topics have been discussed but are not part of the first three levels:

* direct user interaction via simulated users
* custom workspacetypes with initializers and finalizers
* integrating the init-agent
* nested workspaces living on different shards
* having a chaos monkey randomly killing shards

### Scraping of Metrics

We plan on using a plain Prometheus to scrape all of the kcp instances. On a higher level, we plan to
monitor:

* CPU + Mem on all components
* Number of Goroutines over time
* Request response times on the front-proxy (probably percentiles). Alternatively, this could be
  measured client-side inside the testing suite
* Disk IO and size on both etcd and rootshard
* Total number of workspaces (to compare expected with actual)
129 changes: 22 additions & 107 deletions test/load/Readme.md
# Load Testing

Load testing framework and loadtests for the kcp project.

## Architecture

Please refer to the [drawing of the general layout](./architecture.excalidraw).

## Setup

Installation scripts and manuals are provided in [setup/Readme](./setup/Readme.md).

## Usage

All test cases are organized in the `testing` folder. You can run the entire suite using:

```sh
go test ./testing/...
```

The tests will prompt you for any specific required variables and configs.

Alternatively, you can run a subset of tests using standard `go test` syntax, e.g.:

```sh
go test ./testing/... -run ^TestExample
```

## Development

The load-testing framework itself is organized in the `pkg` folder. You can run its unit
tests directly using:

```sh
go test ./pkg/...
```

## Partitioning

You can partition your loadtest by providing it with a unique `start` number. Please be advised that this multiplexes your test: any load you place will be multiplied by the number of partitions. Depending on the test, adjust throughput values like QPS accordingly.
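One way to think about partitioning is that each partition claims a disjoint workspace index range derived from its `start` number, while the aggregate QPS target is divided across partitions. The function and parameter names below are illustrative, not the framework's actual API:

```go
package main

import "fmt"

// partitionRange returns the workspace index range [first, last) that a
// partition identified by its unique start number operates on, plus the
// per-partition QPS needed to keep the aggregate load at targetQPS.
// All names here are hypothetical.
func partitionRange(start, perPartition, partitions int, targetQPS float64) (first, last int, qps float64) {
	first = start * perPartition
	last = first + perPartition
	qps = targetQPS / float64(partitions)
	return first, last, qps
}

func main() {
	// Partition with start=2, out of 4 partitions of 2500 workspaces each,
	// aiming for an aggregate target of 100 QPS.
	first, last, qps := partitionRange(2, 2500, 4, 100)
	fmt.Printf("workspaces [%d,%d), qps per partition %.1f\n", first, last, qps)
}
```

Without dividing QPS this way, running N partitions would multiply the load on the installation by N, which is exactly the pitfall the paragraph above warns about.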
41 changes: 41 additions & 0 deletions test/load/go.mod
@@ -0,0 +1,41 @@
module github.com/kcp-dev/kcp/test/load

go 1.25.0

require (
github.com/montanaflynn/stats v0.7.1
github.com/stretchr/testify v1.11.1
k8s.io/apimachinery v0.35.1
k8s.io/client-go v0.35.1
)

require (
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/fxamacker/cbor/v2 v2.9.0 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/kr/pretty v0.3.1 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/spf13/pflag v1.0.10 // indirect
github.com/x448/float16 v0.8.4 // indirect
go.yaml.in/yaml/v2 v2.4.3 // indirect
golang.org/x/net v0.47.0 // indirect
golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/term v0.37.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/time v0.9.0 // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/klog/v2 v2.140.0 // indirect
k8s.io/kube-openapi v0.0.0-20250910181357-589584f1c912 // indirect
k8s.io/utils v0.0.0-20260210185600-b8788abfbbc2 // indirect
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect
sigs.k8s.io/yaml v1.6.0 // indirect
)