Add loadtesting framework #3895
Merged
4 commits
@@ -0,0 +1,125 @@
# Load Testing

## 10000 workspaces reference architecture

Assumptions & Requirements:
* We want to test how a kcp installation with 10000 workspaces behaves under synthetic workloads
* We do not want to test how quickly kcp handles adding 10000 workspaces at once
* We don't want to run etcd on the machines that host kcp
* We will use the kcp-operator to set up a sharded kcp instance
* The minimum number of replicas for any component is 3
* The loadtests are infrastructure-provider agnostic. This allows us and community members to
  experiment with different infrastructure sizes
* We treat the rootshard like any other shard; it will be filled with regular workspaces,
  so shard1 = rootshard
* As the kcp-operator currently has no support for a dedicated cache server, we have decided to
  keep the default model of an embedded cache in the rootshard (even if it overloads the
  rootshard). Specifically, this means that with 3 shards we create 1 rootshard and 2 shards
* Results are stored in permanent storage so we can use them for comparison later
### Architecture

[drawing of the general layout](./architecture.excalidraw)

### Node calculation
All node calculations are based on the number of workspaces and use the following recommended constants:

* max_workspaces_per_shard = 3500
* min_replicas = 3
* kcp_server_buffer = 512MB
* #kcp_cache_nodes = 1
* #aux_nodes = 1
* #frontproxy_nodes = 3
* mem_per_workspace = 5MB
---

1. We calculate the number of shards:

   ```txt
   #shards = round_up(#workspaces / max_workspaces_per_shard)
   ```
1. Now we can calculate the number of etcd nodes:

   ```txt
   #etcd_nodes = #shards * min_replicas
   ```
1. We can calculate the number of shard nodes and their size in relation to the number of workspaces:

   ```txt
   #shard_nodes = #shards * min_replicas
   #actual_workspaces_per_shard = #workspaces / #shards
   kcp_server_node_mem = kcp_server_buffer + (#actual_workspaces_per_shard * mem_per_workspace)
   ```
The total number of required nodes is calculated as follows:

```txt
#total_nodes = #frontproxy_nodes + #kcp_cache_nodes + #etcd_nodes + #shard_nodes + #aux_nodes
```
#### Example for 10000 workspaces

```txt
#shards = round_up(10000 / 3500) = round_up(2.86) = 3
#etcd_nodes = 3 * 3 = 9
#shard_nodes = 3 * 3 = 9
#actual_workspaces_per_shard = 10000 / 3 = 3333
kcp_server_node_mem = 512 + (3333 * 5) = 17177MB
#total_nodes = 3 + 1 + 9 + 9 + 1 = 23
```
### Testing Protocol

Mantra: we want to test how a kcp installation with 10000 workspaces behaves under simulated, regular
activity. We don't want to test how easily we can add 10000 workspaces at once.

#### Procedure

1. Create 10000 workspaces, APIExports, etc. and patiently wait for all of them to become ready
2. Simulate real-world activity by simulating end users who use custom kubeconfigs to create APIBindings
   and then CRUD their custom API objects
##### Level 1 - 10000 empty workspaces

We simply put 10000 empty workspaces into a kcp installation. A nesting-level setting lets us
check whether nesting has any impact (it should not). This test case is extremely deterministic
and should spread workspaces relatively equally across the shards. We mainly use it as a baseline
consumption measurement and to verify that nesting has no performance impact.
##### Level 2 - Basic CRUD

Every workspace has a type, and we run a basic parallel CRUD workflow, simulated with simple
Kubernetes Jobs. The workflow operates on basic objects from a single provider using a
single APIExport.
##### Level 3 - Multiple Providers

We multiplex the Level 2 example to use multiple providers.
##### Outlook

We want to keep the initial version of the tests simple and deterministic. As a result, the following
topics have been discussed but are not part of the first three-level implementation:

* direct user interaction via simulated users
* custom workspacetypes with initializers and finalizers
* integrating the init-agent
* nested workspaces living on different shards
* a chaos monkey randomly killing shards
### Scraping of Metrics

We plan to use a plain Prometheus instance to scrape all of the kcp instances. On a higher level we plan to
monitor:

* CPU + memory on all components
* number of goroutines over time
* request response times on the front-proxy (probably percentiles); this could alternatively be
  measured inside the testing suite (clientside)
* disk IO and size on both etcd and the rootshard
* total number of workspaces (to compare expected with actual)
@@ -1,125 +1,40 @@

The previous content (shown in full above) is moved aside and the file becomes the framework's README:

# Load Testing

Load testing framework and loadtests for the kcp project.

## Architecture

Please refer to the [drawing of the general layout](./architecture.excalidraw).

## Setup

Installation scripts and manuals are provided in [setup/Readme](./setup/Readme.md).

## Usage

All test cases are organized in the `testing` folder. You can run the entire suite using:

```sh
go test ./testing/...
```

The tests will prompt you for any specific required variables and configs.

Alternatively you can run a subset of tests using standard `go test` syntax, e.g.:

```sh
go test ./testing/... -run ^TestExample
```

## Development

The load-testing framework itself is organized in the `pkg` folder. You can run its unit
tests directly using:

```sh
go test ./pkg/...
```

## Partitioning

You can partition your loadtest by providing it with a unique `start` number. Please be advised that this multiplexes your test: any load you place will be multiplied by the number of partitions. Depending on the test, adjust throughput values like QPS accordingly.
@@ -0,0 +1,41 @@

```
module github.com/kcp-dev/kcp/test/load

go 1.25.0

require (
	github.com/montanaflynn/stats v0.7.1
	github.com/stretchr/testify v1.11.1
	k8s.io/apimachinery v0.35.1
	k8s.io/client-go v0.35.1
)

require (
	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
	github.com/fxamacker/cbor/v2 v2.9.0 // indirect
	github.com/go-logr/logr v1.4.3 // indirect
	github.com/json-iterator/go v1.1.12 // indirect
	github.com/kr/pretty v0.3.1 // indirect
	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
	github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
	github.com/spf13/pflag v1.0.10 // indirect
	github.com/x448/float16 v0.8.4 // indirect
	go.yaml.in/yaml/v2 v2.4.3 // indirect
	golang.org/x/net v0.47.0 // indirect
	golang.org/x/oauth2 v0.30.0 // indirect
	golang.org/x/sys v0.38.0 // indirect
	golang.org/x/term v0.37.0 // indirect
	golang.org/x/text v0.31.0 // indirect
	golang.org/x/time v0.9.0 // indirect
	gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
	gopkg.in/inf.v0 v0.9.1 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
	k8s.io/klog/v2 v2.140.0 // indirect
	k8s.io/kube-openapi v0.0.0-20250910181357-589584f1c912 // indirect
	k8s.io/utils v0.0.0-20260210185600-b8788abfbbc2 // indirect
	sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 // indirect
	sigs.k8s.io/randfill v1.0.0 // indirect
	sigs.k8s.io/structured-merge-diff/v6 v6.3.0 // indirect
	sigs.k8s.io/yaml v1.6.0 // indirect
)
```
Do we really need this?
`test/load` is its own Go module, so I'd think it enters `test/load` and then runs `go test -race ...`. It's just that this looks very hacky =/
The problem is that `go test -race` would run the load tests themselves, which cannot (and should not) be executed for unit testing.