Conversation
Force-pushed e79eb40 to aed85ef
|
/test pull-kcp-test-integration |
| } else if err != nil { | ||
| return err | ||
| } | ||
| if len(bytes.TrimSpace(doc)) == 0 { |
Why move the empty-doc check from here into the for ... := range docs loop below?
True, that is better. At the time I was more focused on the startup in pkg/server, so I didn't think twice about it.
|
Otherwise LGTM. |
Force-pushed aed85ef to b7c5af2
|
/lgtm |
|
LGTM label has been added.
Details
Git tree hash: 183666035c3ece9c747fd959ecbf556cb1770b17 |
config/helpers/bootstrap.go
Outdated
| if f.IsDir() { | ||
| return | ||
| } | ||
| if err := CreateResourceFromFS(ctx, client, mapper, batteriesIncluded, f.Name(), fs, transformers...); err != nil { |
I remember some bootstrap files relied on being applied in order. Can we maybe add some high-level ordering? Meaning: namespaces first, cluster objects second, and the rest follows?
This lib is heavily used outside kcp in our other repos and I think it might break things if no ordering is done?
Mhm, we could do weights, but I'd rather wrap createResourceFromFS in an eventually so ordering doesn't matter.
So basically CreateResourcesFromFS - multiplex -> CreateResourceFromFS - multiplex -> createRetryResourceFromFS - retry -> createResourceFromFS.
I was already thinking that wrapping CreateResourcesFromFS in an eventually at startup was a bit too expensive, because on every failure it would apply everything again.
But does this mean if you have 100 files, it will fire 100 create requests, and 99 will fail due to missing namespace, and 1 namespace succeeds, and after that, 99 retries succeed.
vs: 1 namespace request, and 99 following.
Basically, 199 requests vs 100 requests.
True, but then we should redesign the resource installation: weight the resources being installed, group them by weight, and install in chunks, with resources within one chunk being parallelised.
I think @xrstf already made a similar weighting for the init-agent; we could reuse that code and go from there.
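The weight-group-chunk idea could look roughly like this; `obj`, `weight`, and `chunkByWeight` are hypothetical names for this sketch, not kcp's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// obj stands in for an unstructured object; the kind alone is enough here.
type obj struct{ kind string }

// weight assigns a hypothetical install priority: lower installs first.
func weight(o obj) int {
	switch o.kind {
	case "Namespace":
		return 0
	case "CustomResourceDefinition":
		return 1
	default:
		return 2
	}
}

// chunkByWeight groups objects by weight. Chunks are applied sequentially,
// while objects within a single chunk can be created in parallel.
func chunkByWeight(objs []obj) [][]obj {
	byWeight := map[int][]obj{}
	for _, o := range objs {
		w := weight(o)
		byWeight[w] = append(byWeight[w], o)
	}
	weights := make([]int, 0, len(byWeight))
	for w := range byWeight {
		weights = append(weights, w)
	}
	sort.Ints(weights)
	chunks := make([][]obj, 0, len(weights))
	for _, w := range weights {
		chunks = append(chunks, byWeight[w])
	}
	return chunks
}

func main() {
	objs := []obj{{"Deployment"}, {"Namespace"}, {"CustomResourceDefinition"}}
	for _, c := range chunkByWeight(objs) {
		fmt.Println(c)
	}
}
```

With this shape, a missing namespace can no longer fail a whole parallel batch, since everything a later chunk depends on has already been applied.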
mjudeikis-bot
left a comment
Review notes:
1. Data race in CreateResourceFromFS — transformer loop
The goroutine captures err from the outer function scope rather than declaring it locally:
```go
doc, err = transformer(doc) // ← outer err, data race across goroutines
```
Should be:
```go
doc, err := transformer(doc)
```
Multiple goroutines writing to the same err variable is a data race. go test -race will catch this.
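A minimal sketch of the fixed pattern (`transformer` and `transformAll` are hypothetical stand-ins, not the actual kcp functions); the `:=` vs `=` distinction is the point:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// transformer is a stand-in for a kcp transform function.
func transformer(doc []byte) ([]byte, error) {
	return bytes.ToUpper(doc), nil
}

// transformAll runs the transformer for each doc concurrently. Each goroutine
// declares its own err with := — writing to a single err variable shared
// across goroutines (`out, err = transformer(doc)`) would be a data race.
func transformAll(docs [][]byte) ([][]byte, []error) {
	var wg sync.WaitGroup
	results := make([][]byte, len(docs))
	errs := make([]error, len(docs))
	for i, doc := range docs {
		wg.Add(1)
		go func(i int, doc []byte) {
			defer wg.Done()
			out, err := transformer(doc) // := makes err goroutine-local
			results[i], errs[i] = out, err
		}(i, doc)
	}
	wg.Wait()
	return results, errs
}

func main() {
	results, _ := transformAll([][]byte{[]byte("a"), []byte("b")})
	fmt.Println(string(results[0]), string(results[1])) // A B
}
```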
2. Dependency ordering — namespaces, CRDs, resources
Parallelizing within a file and across files changes the semantics when resources have install-time dependencies. For example:
- A multi-doc YAML with a `Namespace` followed by resources in that namespace — both now land concurrently, so the resource creation may fail with `namespace not found`
- Across files: CRDs and their CRs in separate files can now race
This is only safe if every call site already wraps CreateResourcesFromFS in a retry loop. Worth auditing call sites — if any are fire-once, this is a silent regression that converts a previously-reliable install into a flaky one.
3. Minor: inconsistent closure style
BindRootAPIs goroutine correctly passes exportName explicitly:
```go
go func(exportName string) { ... }(exportName)
```
The other goroutines rely on Go 1.22+ per-iteration range variable semantics instead. Not wrong if the module requires 1.22+, but worth noting the inconsistency — explicit params are safer and clearer about intent.
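The two closure styles are equivalent on Go 1.22+; a small sketch using the explicit-parameter form, with a hypothetical `bindAll`:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// bindAll launches one goroutine per export name. Passing the name as an
// explicit parameter (as BindRootAPIs does) means the same thing as relying
// on Go 1.22+ per-iteration range variables, but also works on older
// toolchains, where capturing the loop variable directly was a classic bug.
func bindAll(names []string) []string {
	var (
		mu   sync.Mutex
		seen []string
		wg   sync.WaitGroup
	)
	for _, name := range names {
		wg.Add(1)
		go func(exportName string) { // explicit param: no loop-variable capture
			defer wg.Done()
			mu.Lock()
			seen = append(seen, exportName)
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	sort.Strings(seen)
	return seen
}

func main() {
	fmt.Println(bindAll([]string{"shards", "core", "tenancy"})) // [core shards tenancy]
}
```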
|
New changes are detected. LGTM label has been removed. |
Force-pushed 49da64b to 0816ec3
|
/hold I've parallelised installing resources in a few more places. It initially works fine at startup, but now the batteries kcp-admin can't access workspace types. I'll revisit the changes later. |
|
On it @ntnn — will open a follow-up PR with a utility function in |
Force-pushed 08ddecf to 4a82c7d
Add ForEach[T] to pkg/errgroup as a convenience wrapper over Group that runs a function concurrently for each item in a slice. The exitEarly flag mirrors Group.FailFast: when true, the context passed to fn is cancelled as soon as any invocation returns a non-nil error so callers can abort in-flight work promptly.

Suggested-by: ntnn (kcp-dev#3847)
Co-authored-by: Mangirdas Judeikis <mangirdas@judeikis.lt>
|
Reviewed the changes. Overall the approach is solid — good separation of concerns across the new files and the refactored code. Overall LGTM from a logic standpoint — the hierarchy-aware chunking correctly preserves the CRD → APIExport → APIBinding install ordering that was implicit before. |
Force-pushed 1bf3bff to cccd052
|
/hold cancel Okay, I finally worked out all of the nits that were wrong. I also moved a bunch of code around so that not everything sits in the one bootstrap.go file. |
Force-pushed cccd052 to 5614b73
mjudeikis-bot
left a comment
Thanks for the iteration — the refactoring is in much better shape now. The separation into binds.go, read.go, sort.go, template.go, transform.go, and yaml.go makes the package a lot easier to navigate. Good call extracting TemplateInput as an exported type too.
Previous concern addressed ✅ — The cache invalidation is now scoped to chunk-level failures via the reset function pointer, which is a clear improvement. The comment explaining why it is kept despite the new ordering is appreciated.
A few minor observations on the updated code:
read.go — fail-fast vs. aggregate errors
The old CreateResourcesFromFS used utilerrors.NewAggregate to collect all per-file errors and return them together. ReadResourcesFromFS now returns on the first file error. This is likely fine for bootstrap use-cases but is a subtle behavior change worth noting in case callers relied on seeing all errors at once.
binds.go — unconditional spec overwrite in BindRootAPI
In the AlreadyExists polling loop, existing.Spec = binding.Spec is always applied before calling Update, without first checking whether the spec is already correct. In practice this is harmless (the update becomes a no-op server-side if nothing changed), but it does generate unnecessary write traffic on every restart when the binding is already up-to-date. A quick equality check before the Update call would eliminate the churn.
sort.go — objectWeight partial view
I can see the weights slice but not the full objectWeight implementation in the diff. Assuming it handles the {Group: "apiextensions.k8s.io"} entry (no Kind) as a prefix/group-only match for CRDs — worth confirming that the matching logic does not accidentally over-match other groups.
None of these are blockers. The core parallelisation logic and the hierarchy-based chunking are well-implemented and well-tested. Happy to see this move forward.
Pull request overview
This PR refactors the config bootstrap/resource installation helpers to reduce cold-start time by parallelizing resource creation and centralizing YAML parsing/templating/transform logic into reusable helper functions.
Changes:
- Extracted YAML parsing, templating, and file transformation logic into new config/helpers/* helpers.
- Introduced hierarchy-based sorting/chunking of unstructured objects to apply foundational resources earlier (CRDs, APIs, namespaces, etc.).
- Updated bootstrap/apply logic to process chunks sequentially while creating resources within a chunk concurrently; added unit tests for sorting/chunking.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| config/helpers/bootstrap.go | Refactors bootstrap/apply to read + chunk resources and create objects concurrently within each chunk. |
| config/helpers/read.go | Adds helpers to read, transform, template, and filter resources from an embedded FS. |
| config/helpers/yaml.go | Adds YAML-to-unstructured parsing helper for multi-doc YAML. |
| config/helpers/template.go | Adds manifest templating helper and template input construction. |
| config/helpers/transform.go | Extracts transform option types + replace transformer and applies transformer pipeline. |
| config/helpers/sort.go | Adds object hierarchy weighting, sorting, and chunking. |
| config/helpers/sort_test.go | Adds tests for sorting and chunking behavior. |
| config/helpers/binds.go | Moves root API binding helpers into a dedicated file. |
config/helpers/read.go
Outdated
| return raw, nil | ||
| } | ||
|
|
||
| // ReadResourceFromFS uses ReadRawFromFS to read a file and returns it |
|
@mjudeikis-bot Apply each commit in this PR and run the lint and e2e tests every time to verify that each commit is atomic. |
|
Took a look at the updated changes — the refactoring looks good.
Re: Copilot loop-variable capture warnings — these are false positives. The project requires Go ≥ 1.25.
CI failures — the integration and e2e suites are all red on the latest commit.
Minor: a small comment nit as well. |
|
@ntnn @xrstf — picking up the mentions here.

Re: utility function request — I don't have direct push access to kcp-dev/kcp, so I can't open a PR from here. I'll flag this for @mjudeikis to decide whether to proceed via a fork. The function would live in something like:

```go
func RunParallel(ctx context.Context, exitEarly bool, fns ...func(ctx context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	errsChan := make(chan error, len(fns))
	var wg sync.WaitGroup
	for _, fn := range fns {
		wg.Add(1)
		go func(fn func(ctx context.Context) error) {
			defer wg.Done()
			err := fn(ctx)
			if err != nil && exitEarly {
				cancel()
			}
			errsChan <- err
		}(fn)
	}
	wg.Wait()
	close(errsChan)

	var errs []error
	for err := range errsChan {
		if err != nil {
			errs = append(errs, err)
		}
	}
	return utilerrors.NewAggregate(errs)
}
```

Will proceed once @mjudeikis gives the go-ahead.

Re: Harlem Shake request — 😄 Nice try @xrstf, but I'm a coding bot, not a DJ. |
|
@ntnn — running per-commit lint and e2e test verification is in the coder agent (Bob ⚙️) territory. I handle code review and comments only. Will flag this for @mjudeikis to decide if the coder agent should be tasked with the per-commit atomicity check. |
|
/test pull-kcp-test-e2e-shared

That looks like flakes. |
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Force-pushed 6908961 to 0d4485e
|
@ntnn: The following tests failed, say
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
I've been fiddling with the startup because a 30s cold start for an instance is a bit much for my liking.
This cuts a bit off - not a lot, though. On average, parallelizing shaves ~5s off the startup time.
Other functions already parallelize in the same way, so this is mostly a cleanup to adhere to the standards set there.
What Type of PR Is This?
/kind cleanup
Related Issue(s)
Fixes #
Release Notes