Add resource limits to all overlays with CI enforcement and documentation by tomncooper · Pull Request #12 · streamshub/developer-quickstart

tomncooper · 2026-04-10T14:04:49Z

Users currently have no way of knowing how much CPU and memory a given overlay requires before deploying it. This makes it difficult to right-size clusters, and deploying without sufficient resources leads to pods stuck in Pending with no clear explanation.

This PR addresses that by:

* Setting resource requests and limits on every component: all operator deployments and custom resources across the core and metrics overlays now have explicit CPU and memory specs, with requests equal to limits.
* Documenting per-overlay resource totals: each overlay's documentation page now declares the total CPU and memory required in its frontmatter (cpu_total, memory_total), rendered in a "Resource Requirements" section so users can check before installing.
* Adding CI scripts to enforce this going forward:
  * VerifyResourceLimits: walks the kustomize build output and CRD schemas to check that every container and every configured CR resource field has both requests and limits set. Optional CRD fields that aren't configured are skipped.
  * VerifyDocumentedResources: sums the actual resource requests from kustomize build and checks that the documented cpu_total and memory_total in the overlay's doc page are sufficient.

  Both scripts have unit tests, run in CI on every PR, and auto-discover overlays so new overlays are validated without manual workflow updates.
* Adding overlay contributor documentation: docs/overlays/developing.md explains the resource limit and documentation requirements so contributors know what CI will enforce.
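To illustrate the kind of check VerifyResourceLimits performs on containers, here is a minimal Python sketch. It is not the script from this PR: the function name `missing_resource_settings` and the dictionary layout (a Deployment manifest already parsed into Python dicts) are assumptions for illustration, and it only covers pod template containers, not CR resource fields discovered through CRD schemas.

```python
from typing import Any

# Resource keys every container is expected to declare (assumed for this sketch).
REQUIRED_KEYS = ("cpu", "memory")


def missing_resource_settings(manifest: dict[str, Any]) -> list[str]:
    """Return one message per missing requests/limits entry in a Deployment-like manifest."""
    problems: list[str] = []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        for section in ("requests", "limits"):
            values = resources.get(section) or {}
            for key in REQUIRED_KEYS:
                if key not in values:
                    problems.append(
                        f"{name}/{container.get('name')}: missing {section}.{key}"
                    )
    return problems


# Hypothetical manifest: requests are set, limits are not.
example = {
    "metadata": {"name": "example-operator"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "operator",
         "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}},
    ]}}},
}

for problem in missing_resource_settings(example):
    print(problem)
```

A CI job along these lines would fail the build whenever the returned list is non-empty.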

The PR is quite large as it combines the actual resource limit updates and the CI checking code, but they are in separate commits (the limits are in the first commit, with the CI and docs in the later commits), so they can be reviewed separately. If this is still too much, I can look at chopping out the docs and resource checks into individual PRs.

Set CPU and memory requests equal to limits for every operator deployment and custom resource across the core and metrics overlays.

* Patch Strimzi, Apicurio Registry, and Console operator deployments via kustomize strategic-merge patches
* Set resource specs on Kafka node pools and entity operator (topic + user operator) in the Kafka CR
* Set resource specs on Apicurio Registry app and UI containers
* Set resource specs on Console API and UI containers
* Set resource specs on Prometheus Operator deployment and Prometheus CR

Signed-off-by: Thomas Cooper <code@tomcooper.dev>

Add CI scripts that verify every container in an overlay has resource requests and limits, and that overlay documentation pages declare accurate resource totals. Add documentation for the core overlay and a guide for overlay contributors.

* Add VerifyResourceLimits script to check all containers and CR resource fields have requests and limits set
* Add VerifyDocumentedResources script to check documented cpu_total and memory_total match kustomize build output
* Add CrdSchemaUtils shared utility for CRD schema introspection
* Add unit tests for both verification scripts
* Move existing script tests into tests subdirectory
* Add script-tests.yaml workflow to run script unit tests in CI
* Add docs/overlays/core.md with install instructions, components table, and resource requirements
* Add docs/overlays/developing.md guide covering resource limit and documentation requirements for overlay contributors
* Add resource requirements frontmatter and section to metrics overlay docs
* Refactor validate.yaml to discover overlays dynamically instead of a hardcoded list
* Update README with new script descriptions and test instructions

Signed-off-by: Thomas Cooper <code@tomcooper.dev>
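The documented-totals check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the VerifyDocumentedResources script itself: the helper names are hypothetical, it assumes cpu_total and memory_total are written in Kubernetes quantity notation (for example `3` or `6Gi`), and it handles only a subset of the quantity suffixes.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes handled by this sketch.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' into bytes."""
    quantity = str(quantity).strip()
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


def documented_is_sufficient(requests, cpu_total: str, memory_total: str) -> bool:
    """Check that documented totals cover the summed (cpu, memory) requests."""
    cpu_sum = sum((parse_cpu(cpu) for cpu, _ in requests), Decimal(0))
    memory_sum = sum(parse_memory(memory) for _, memory in requests)
    return cpu_sum <= parse_cpu(cpu_total) and memory_sum <= parse_memory(memory_total)


# Hypothetical per-container requests, as (cpu, memory) pairs.
container_requests = [("250m", "512Mi"), ("500m", "1Gi")]
print(documented_is_sufficient(container_requests, "1", "2Gi"))
```

Comparing with "sufficient" rather than "equal" leaves room for the documentation to round totals up to whole cores or gigabytes.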

* Update overlay developer docs with more details
* Add helper script to show the resource limits set in a given overlay
* Set the docs preview script to use the same hugo-book version as the StreamsHub site

Signed-off-by: Thomas Cooper <code@tomcooper.dev>

Signed-off-by: Thomas Cooper <code@tomcooper.dev>

The requirement for requests == limits caused total CPU requests (3,650m for metrics overlay) to exceed what a 4-CPU minikube node can allocate after Kubernetes system overhead (~900m), leaving the console deployment stuck Pending with "Insufficient cpu".

Lower CPU requests while keeping limits unchanged so pods reserve less for scheduling but can still burst under load:

* Console operator: 500m → 100m request, 500m limit
* Kafka dual-role: 500m → 250m request, 500m limit
* Apicurio registry app: 500m → 250m request, 500m limit
* Console API: 500m → 250m request, 500m limit

Relax CI invariant from requests == limits to requests <= limits:

* Rename checkRequestsEqualsLimits → checkRequestsNotExceedLimits
* Update VerifyResourceLimits and VerifyDocumentedResources call sites
* Update unit tests to accept requests < limits, reject requests > limits
* Update documented cpu_total from 4 to 3 CPU cores for both overlays
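The scheduling arithmetic and the relaxed invariant can be illustrated with a short Python sketch. The function names here are hypothetical stand-ins (the PR's own check is checkRequestsNotExceedLimits), and the ~900m overhead is the approximate figure quoted in the commit message, treated as exact for the sake of the example.

```python
# Figures from the commit message, in millicores.
NODE_CPU_M = 4000          # 4-CPU minikube node
SYSTEM_OVERHEAD_M = 900    # approximate Kubernetes system overhead


def fits_on_node(total_requests_m: int,
                 node_cpu_m: int = NODE_CPU_M,
                 overhead_m: int = SYSTEM_OVERHEAD_M) -> bool:
    """True if the summed CPU requests fit in what the node can allocate."""
    return total_requests_m <= node_cpu_m - overhead_m


def requests_not_exceed_limits(request_m: int, limit_m: int) -> bool:
    """Relaxed invariant: a request may be lower than its limit, never higher."""
    return request_m <= limit_m


# 3,650m of requests against 4000m - 900m = 3100m allocatable: does not fit.
print(fits_on_node(3650))

# The lowered (request, limit) pairs listed above all satisfy the invariant.
lowered = {
    "Console operator": (100, 500),
    "Kafka dual-role": (250, 500),
    "Apicurio registry app": (250, 500),
    "Console API": (250, 500),
}
print(all(requests_not_exceed_limits(r, l) for r, l in lowered.values()))
```

Because the scheduler reserves capacity based on requests, lowering requests frees schedulable CPU while the unchanged limits still cap how far each pod can burst.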

tomncooper · 2026-04-10T15:45:10Z

I had to reduce the requests for all resources to get them below the 4-CPU limit of the standard GH Actions runner. We may hit a wall with more complicated overlays and may need a custom testing solution.

tomncooper added 2 commits April 10, 2026 14:52

tomncooper requested review from Frawless and kornys April 10, 2026 14:05

tomncooper added 2 commits April 10, 2026 15:34

Increase smoke test timeouts and minikube resources

d3b553e

Signed-off-by: Thomas Cooper <code@tomcooper.dev>

tomncooper force-pushed the resource-limits branch from 04c6c17 to d3b553e Compare April 10, 2026 14:54

tomncooper wants to merge 5 commits into main from resource-limits
