docs(plan): design for live-deployment example validation CI matrix #726

Closed
intel352 wants to merge 2 commits into main from fix/remote-plugin-strict-proto

Conversation

@intel352
Contributor

Design doc for the live-deploy CI matrix deferred from the 2026-05-19 QoL sweep. Closes #723. Execution is operator-gated (staging accounts + OIDC trust).

intel352 and others added 2 commits May 18, 2026 09:35
Remote steps must not send host-only HTTP metadata or engine-internal config keys through strict protobuf plugin contracts.
Files a design doc for the live-deploy CI matrix deferred from the
2026-05-19 multi-repo QoL sweep. Schema-level validation is insufficient
to promote a plugin to 'verified'; this design adds a weekly OIDC-driven
GitHub Actions matrix that exercises each IaC plugin's
examples/minimal/config.yaml against staging cloud accounts, auto-promotes
on green, demotes on 2 consecutive REDs.
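
The promotion rule described above (promote on green, demote on 2 consecutive REDs) can be read as a small state machine. The following is a minimal sketch only, not code from this PR: the `Status`, `PluginState`, and `Next` names are hypothetical, and it assumes that a demotion returns the plugin to `experimental` and that any green run resets the RED streak.

```go
package main

import "fmt"

// Status is a plugin's marketplace tier. The two tier names come from the
// design text; the type itself is hypothetical.
type Status string

const (
	Experimental Status = "experimental"
	Verified     Status = "verified"
)

// PluginState tracks the tier and the current streak of failed weekly runs.
type PluginState struct {
	Status          Status
	ConsecutiveReds int
}

// Next applies one weekly matrix result. A green run promotes and resets
// the streak (assumption); the second consecutive red run demotes.
func (s PluginState) Next(green bool) PluginState {
	if green {
		return PluginState{Status: Verified, ConsecutiveReds: 0}
	}
	s.ConsecutiveReds++
	if s.ConsecutiveReds >= 2 {
		s.Status = Experimental
	}
	return s
}

func main() {
	state := PluginState{Status: Experimental}
	// Five simulated weekly results: green, red, green, red, red.
	for week, green := range []bool{true, false, true, false, false} {
		state = state.Next(green)
		fmt.Printf("week %d: green=%t -> %s (reds=%d)\n",
			week+1, green, state.Status, state.ConsecutiveReds)
	}
}
```

In this sketch a single red run leaves a verified plugin untouched, so one flaky staging deployment does not cause a demotion; only the second red in a row does.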

Execution is gated on operator provisioning staging accounts + GitHub
OIDC trust per provider. Document this as the next concrete step.

Companion to workflow#725 (marketplace-verify subcommand). Closes #723.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 19, 2026 18:47
@intel352
Contributor Author

Botched: committed under _worktrees/ subdir of main repo on wrong branch. Redoing on correct worktree.

Contributor

Copilot AI left a comment

Pull request overview

Adds a design document for a live-deployment CI validation matrix (issue #723), and includes a small RemoteStep robustness change to prevent request encoding failures when PipelineContext metadata contains non-Struct-serializable values.

Changes:

  • Filter RemoteStep metadata before encoding to google.protobuf.Struct to avoid failures on unrepresentable metadata values.
  • Strip engine-internal "_*" keys from STRICT_PROTO RemoteStep typed config before protojson encoding.
  • Add a design doc describing the proposed operator-gated live-deploy validation CI matrix and promotion/demotion flow.
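
The second bullet (stripping engine-internal "_*" keys before typed encoding) could look roughly like the sketch below. This is an illustration under assumptions, not the PR's code: `stripInternalKeys` and the `_engine_ref` key are hypothetical names, and the sketch only filters top-level keys because the summary does not say whether nested maps are handled. The motivation is that strict protobuf JSON decoding rejects unknown fields by default, so any host-side key that is not in the plugin's message type would fail the request.

```go
package main

import (
	"fmt"
	"strings"
)

// stripInternalKeys returns a shallow copy of cfg without keys that start
// with "_". Hypothetical helper: the PR's real function name is not shown.
func stripInternalKeys(cfg map[string]any) map[string]any {
	if cfg == nil {
		return nil
	}
	out := make(map[string]any, len(cfg))
	for k, v := range cfg {
		if strings.HasPrefix(k, "_") {
			continue // engine-internal key, never sent to the plugin
		}
		out[k] = v
	}
	return out
}

func main() {
	cfg := map[string]any{
		"url":         "https://example.invalid/hook",
		"_engine_ref": "internal-handle", // hypothetical engine-internal key
	}
	fmt.Println(stripInternalKeys(cfg)) // map[url:https://example.invalid/hook]
}
```

Returning a copy rather than deleting in place keeps the engine's own view of the config intact, which matters if the same map is reused after the remote call.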

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| plugin/external/remote_step.go | Filters metadata for Struct encoding; strips internal config keys prior to STRICT_PROTO typed encoding. |
| plugin/external/remote_step_test.go | Adds tests covering metadata filtering and STRICT_PROTO internal-key stripping. |
| _worktrees/live-deploy-design-1779216240/docs/plans/2026-05-19-live-deploy-validation-design.md | New design doc for the live-deploy validation CI matrix (but currently located under a likely unintended worktree directory). |

Comment on lines 111 to 116
 	if err != nil {
 		return nil, fmt.Errorf("remote step %q (handle %s) encode trigger_data as Struct: %w", s.name, s.handleID, err)
 	}
-	metadata, err := mapToStruct(pc.Metadata)
+	metadata, err := mapToStruct(remotePluginMetadata(pc.Metadata))
 	if err != nil {
 		return nil, fmt.Errorf("remote step %q (handle %s) encode metadata as Struct: %w", s.name, s.handleID, err)
Comment on lines +180 to +191
func remotePluginMetadata(metadata map[string]any) map[string]any {
	if metadata == nil {
		return nil
	}
	filtered := make(map[string]any, len(metadata))
	for key, value := range metadata {
		if _, err := structpb.NewValue(value); err != nil {
			continue
		}
		filtered[key] = value
	}
	return filtered
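
To show what kind of entries this filter drops, here is a stdlib-only sketch. It is not the PR's code: `representable` is a hypothetical stand-in for the `structpb.NewValue` check and accepts only a subset of what the real function accepts (the real one handles more numeric types and also validates strings), and `filterMetadata` simply mirrors the shape of the function above.

```go
package main

import (
	"fmt"
	"sort"
)

// representable is a simplified stand-in (assumption) for the
// structpb.NewValue check: it accepts JSON-like values and recurses
// into slices and maps.
func representable(v any) bool {
	switch x := v.(type) {
	case nil, bool, int, int64, float64, string:
		return true
	case []any:
		for _, e := range x {
			if !representable(e) {
				return false
			}
		}
		return true
	case map[string]any:
		for _, e := range x {
			if !representable(e) {
				return false
			}
		}
		return true
	default:
		return false
	}
}

// filterMetadata mirrors remotePluginMetadata, using the stand-in check.
func filterMetadata(metadata map[string]any) map[string]any {
	if metadata == nil {
		return nil
	}
	filtered := make(map[string]any, len(metadata))
	for k, v := range metadata {
		if representable(v) {
			filtered[k] = v
		}
	}
	return filtered
}

func main() {
	meta := map[string]any{
		"request_id": "abc-123",
		"attempt":    2,
		"writer":     struct{ Flush func() }{},   // host-only value, not JSON-like
		"tags":       []any{"a", make(chan int)}, // nested unsupported element
	}
	kept := filterMetadata(meta)
	keys := make([]string, 0, len(kept))
	for k := range kept {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [attempt request_id]
}
```

Note that a value is dropped as a whole when any nested element is unsupported, so in this sketch the entire `tags` slice disappears rather than just the channel inside it.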
Comment on lines +1 to +5
# Live-Deployment Example Validation — Design

**Date:** 2026-05-19
**Trigger:** The 2026-05-19 multi-repo QoL sweep validated plugin examples at SCHEMA level (`wfctl validate --skip-unknown-types`) but never ran them end-to-end against real cloud accounts. Promotion from `experimental` to `verified` remains a manual decision tied to GoCodeAlone-internal usage.
**Mode:** Design only (operator must provision CI secrets before execution).

@github-actions

⏱ Benchmark Results

No significant performance regressions detected.

benchstat comparison (baseline → PR)
## benchstat: baseline → PR
baseline-bench.txt:274: parsing iteration count: invalid syntax
baseline-bench.txt:349214: parsing iteration count: invalid syntax
baseline-bench.txt:637566: parsing iteration count: invalid syntax
baseline-bench.txt:918025: parsing iteration count: invalid syntax
baseline-bench.txt:1219277: parsing iteration count: invalid syntax
baseline-bench.txt:1514166: parsing iteration count: invalid syntax
benchmark-results.txt:274: parsing iteration count: invalid syntax
benchmark-results.txt:263018: parsing iteration count: invalid syntax
benchmark-results.txt:560523: parsing iteration count: invalid syntax
benchmark-results.txt:809743: parsing iteration count: invalid syntax
benchmark-results.txt:1336814: parsing iteration count: invalid syntax
benchmark-results.txt:1633056: parsing iteration count: invalid syntax
goos: linux
goarch: amd64
pkg: github.com/GoCodeAlone/workflow/dynamic
cpu: AMD EPYC 7763 64-Core Processor                
                            │ baseline-bench.txt │       benchmark-results.txt        │
                            │       sec/op       │    sec/op     vs base              │
InterpreterCreation-4              10.361m ± 70%   8.237m ± 62%       ~ (p=0.394 n=6)
ComponentLoad-4                     3.659m ±  1%   3.622m ±  5%       ~ (p=0.394 n=6)
ComponentExecute-4                  1.964µ ±  1%   1.955µ ±  1%       ~ (p=0.426 n=6)
PoolContention/workers-1-4          1.092µ ±  3%   1.089µ ±  4%       ~ (p=0.667 n=6)
PoolContention/workers-2-4          1.080µ ±  1%   1.107µ ±  1%  +2.55% (p=0.004 n=6)
PoolContention/workers-4-4          1.093µ ±  1%   1.109µ ±  1%  +1.42% (p=0.002 n=6)
PoolContention/workers-8-4          1.091µ ±  2%   1.102µ ±  1%  +1.01% (p=0.048 n=6)
PoolContention/workers-16-4         1.094µ ±  4%   1.129µ ±  2%       ~ (p=0.063 n=6)
ComponentLifecycle-4                3.649m ±  1%   3.778m ±  1%  +3.53% (p=0.002 n=6)
SourceValidation-4                  2.334µ ±  1%   2.361µ ±  1%  +1.14% (p=0.017 n=6)
RegistryConcurrent-4                796.9n ±  4%   792.7n ±  4%       ~ (p=0.909 n=6)
LoaderLoadFromString-4              3.653m ±  0%   3.697m ±  2%       ~ (p=0.065 n=6)
geomean                             19.40µ         19.21µ        -0.95%

                            │ baseline-bench.txt │        benchmark-results.txt         │
                            │        B/op        │     B/op      vs base                │
InterpreterCreation-4               2.027Mi ± 0%   2.027Mi ± 0%       ~ (p=1.000 n=6)
ComponentLoad-4                     2.180Mi ± 0%   2.180Mi ± 0%       ~ (p=0.310 n=6)
ComponentExecute-4                  1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-1-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-2-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-4-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-8-4          1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-16-4         1.203Ki ± 0%   1.203Ki ± 0%       ~ (p=1.000 n=6) ¹
ComponentLifecycle-4                2.183Mi ± 0%   2.183Mi ± 0%       ~ (p=0.188 n=6)
SourceValidation-4                  1.984Ki ± 0%   1.984Ki ± 0%       ~ (p=1.000 n=6) ¹
RegistryConcurrent-4                1.133Ki ± 0%   1.133Ki ± 0%       ~ (p=1.000 n=6) ¹
LoaderLoadFromString-4              2.182Mi ± 0%   2.182Mi ± 0%       ~ (p=0.420 n=6)
geomean                             15.25Ki        15.25Ki       -0.00%
¹ all samples are equal

                            │ baseline-bench.txt │        benchmark-results.txt        │
                            │     allocs/op      │  allocs/op   vs base                │
InterpreterCreation-4                15.68k ± 0%   15.68k ± 0%       ~ (p=1.000 n=6)
ComponentLoad-4                      18.02k ± 0%   18.02k ± 0%       ~ (p=1.000 n=6)
ComponentExecute-4                    25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-1-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-2-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-4-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-8-4            25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
PoolContention/workers-16-4           25.00 ± 0%    25.00 ± 0%       ~ (p=1.000 n=6) ¹
ComponentLifecycle-4                 18.07k ± 0%   18.07k ± 0%       ~ (p=1.000 n=6) ¹
SourceValidation-4                    32.00 ± 0%    32.00 ± 0%       ~ (p=1.000 n=6) ¹
RegistryConcurrent-4                  2.000 ± 0%    2.000 ± 0%       ~ (p=1.000 n=6) ¹
LoaderLoadFromString-4               18.06k ± 0%   18.06k ± 0%       ~ (p=1.000 n=6) ¹
geomean                               183.3         183.3       +0.00%
¹ all samples are equal

pkg: github.com/GoCodeAlone/workflow/middleware
                                  │ baseline-bench.txt │       benchmark-results.txt       │
                                  │       sec/op       │   sec/op     vs base              │
CircuitBreakerDetection-4                  287.9n ± 0%   287.6n ± 7%       ~ (p=0.848 n=6)
CircuitBreakerExecution_Success-4          21.43n ± 1%   21.39n ± 1%       ~ (p=0.567 n=6)
CircuitBreakerExecution_Failure-4          66.22n ± 0%   66.27n ± 0%       ~ (p=0.255 n=6)
geomean                                    74.21n        74.14n       -0.09%

                                  │ baseline-bench.txt │       benchmark-results.txt        │
                                  │        B/op        │    B/op     vs base                │
CircuitBreakerDetection-4                 144.0 ± 0%     144.0 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Success-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Failure-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                              ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                  │ baseline-bench.txt │       benchmark-results.txt        │
                                  │     allocs/op      │ allocs/op   vs base                │
CircuitBreakerDetection-4                 1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Success-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
CircuitBreakerExecution_Failure-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                              ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/module
                                 │ baseline-bench.txt │       benchmark-results.txt        │
                                 │       sec/op       │    sec/op     vs base              │
IaCStateBackend_InProcess-4              309.2n ± 30%   310.1n ±  4%       ~ (p=0.818 n=6)
IaCStateBackend_GRPC-4                   9.735m ±  2%   9.758m ± 23%       ~ (p=0.310 n=6)
JQTransform_Simple-4                     680.2n ± 44%   719.4n ± 32%       ~ (p=0.485 n=6)
JQTransform_ObjectConstruction-4         1.655µ ±  1%   1.552µ ±  2%  -6.20% (p=0.002 n=6)
JQTransform_ArraySelect-4                3.640µ ±  1%   3.582µ ±  3%       ~ (p=0.056 n=6)
JQTransform_Complex-4                    40.38µ ±  4%   40.60µ ±  1%       ~ (p=0.394 n=6)
JQTransform_Throughput-4                 2.048µ ±  2%   1.906µ ±  2%  -6.94% (p=0.002 n=6)
SSEPublishDelivery-4                     63.62n ±  0%   63.62n ±  1%       ~ (p=0.558 n=6)
geomean                                  3.995µ         3.952µ        -1.06%

                                 │ baseline-bench.txt │         benchmark-results.txt         │
                                 │        B/op        │     B/op       vs base                │
IaCStateBackend_InProcess-4              416.0 ± 0%       416.0 ±  0%       ~ (p=1.000 n=6) ¹
IaCStateBackend_GRPC-4                 5.906Mi ± 9%     5.817Mi ± 12%       ~ (p=0.589 n=6)
JQTransform_Simple-4                   1.273Ki ± 0%     1.273Ki ±  0%       ~ (p=1.000 n=6) ¹
JQTransform_ObjectConstruction-4       1.773Ki ± 0%     1.773Ki ±  0%       ~ (p=1.000 n=6) ¹
JQTransform_ArraySelect-4              2.625Ki ± 0%     2.625Ki ±  0%       ~ (p=1.000 n=6) ¹
JQTransform_Complex-4                  16.22Ki ± 0%     16.22Ki ±  0%       ~ (p=1.000 n=6) ¹
JQTransform_Throughput-4               1.984Ki ± 0%     1.984Ki ±  0%       ~ (p=1.000 n=6) ¹
SSEPublishDelivery-4                     0.000 ± 0%       0.000 ±  0%       ~ (p=1.000 n=6) ¹
geomean                                             ²                  -0.19%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                 │ baseline-bench.txt │        benchmark-results.txt        │
                                 │     allocs/op      │  allocs/op   vs base                │
IaCStateBackend_InProcess-4              2.000 ± 0%      2.000 ± 0%       ~ (p=1.000 n=6) ¹
IaCStateBackend_GRPC-4                  6.836k ± 0%     6.841k ± 0%       ~ (p=0.461 n=6)
JQTransform_Simple-4                     10.00 ± 0%      10.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ObjectConstruction-4         15.00 ± 0%      15.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_ArraySelect-4                30.00 ± 0%      30.00 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Complex-4                    324.0 ± 0%      324.0 ± 0%       ~ (p=1.000 n=6) ¹
JQTransform_Throughput-4                 17.00 ± 0%      17.00 ± 0%       ~ (p=1.000 n=6) ¹
SSEPublishDelivery-4                     0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                             ²                +0.01%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/schema
                                    │ baseline-bench.txt │       benchmark-results.txt        │
                                    │       sec/op       │    sec/op     vs base              │
SchemaValidation_Simple-4                    1.116µ ± 7%   1.103µ ±  6%       ~ (p=0.589 n=6)
SchemaValidation_AllFields-4                 1.686µ ± 5%   1.669µ ± 21%       ~ (p=0.310 n=6)
SchemaValidation_FormatValidation-4          1.578µ ± 1%   1.582µ ±  2%       ~ (p=0.329 n=6)
SchemaValidation_ManySchemas-4               1.873µ ± 4%   1.834µ ±  2%       ~ (p=0.071 n=6)
geomean                                      1.535µ        1.520µ        -1.00%

                                    │ baseline-bench.txt │       benchmark-results.txt        │
                                    │        B/op        │    B/op     vs base                │
SchemaValidation_Simple-4                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_AllFields-4                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_FormatValidation-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_ManySchemas-4              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                                ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                    │ baseline-bench.txt │       benchmark-results.txt        │
                                    │     allocs/op      │ allocs/op   vs base                │
SchemaValidation_Simple-4                   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_AllFields-4                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_FormatValidation-4         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
SchemaValidation_ManySchemas-4              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                                ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

pkg: github.com/GoCodeAlone/workflow/store
                                   │ baseline-bench.txt │       benchmark-results.txt        │
                                   │       sec/op       │    sec/op     vs base              │
EventStoreAppend_InMemory-4                1.280µ ± 13%   1.120µ ± 22%       ~ (p=0.132 n=6)
EventStoreAppend_SQLite-4                  1.360m ±  2%   1.389m ±  6%  +2.12% (p=0.041 n=6)
GetTimeline_InMemory/events-10-4           14.87µ ±  4%   14.42µ ±  4%       ~ (p=0.240 n=6)
GetTimeline_InMemory/events-50-4           73.88µ ± 13%   80.08µ ±  4%       ~ (p=0.589 n=6)
GetTimeline_InMemory/events-100-4          133.7µ ±  1%   131.0µ ±  2%  -2.01% (p=0.026 n=6)
GetTimeline_InMemory/events-500-4          680.7µ ±  2%   661.7µ ±  1%  -2.80% (p=0.002 n=6)
GetTimeline_InMemory/events-1000-4         1.402m ±  2%   1.356m ±  1%  -3.28% (p=0.002 n=6)
GetTimeline_SQLite/events-10-4             113.6µ ±  2%   109.8µ ±  1%  -3.33% (p=0.002 n=6)
GetTimeline_SQLite/events-50-4             264.1µ ±  5%   259.5µ ±  2%  -1.76% (p=0.026 n=6)
GetTimeline_SQLite/events-100-4            452.8µ ±  2%   441.1µ ±  1%  -2.60% (p=0.002 n=6)
GetTimeline_SQLite/events-500-4            1.990m ±  3%   1.880m ±  2%  -5.55% (p=0.002 n=6)
GetTimeline_SQLite/events-1000-4           3.872m ±  2%   3.608m ±  1%  -6.81% (p=0.002 n=6)
geomean                                    233.0µ         226.3µ        -2.87%

                                   │ baseline-bench.txt │        benchmark-results.txt         │
                                   │        B/op        │     B/op      vs base                │
EventStoreAppend_InMemory-4                  790.5 ± 8%     794.5 ± 4%       ~ (p=0.974 n=6)
EventStoreAppend_SQLite-4                  1.983Ki ± 2%   1.986Ki ± 1%       ~ (p=0.316 n=6)
GetTimeline_InMemory/events-10-4           7.953Ki ± 0%   7.953Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-50-4           46.62Ki ± 0%   46.62Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-100-4          94.48Ki ± 0%   94.48Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-500-4          472.8Ki ± 0%   472.8Ki ± 0%       ~ (p=1.000 n=6)
GetTimeline_InMemory/events-1000-4         944.3Ki ± 0%   944.3Ki ± 0%       ~ (p=0.567 n=6)
GetTimeline_SQLite/events-10-4             16.74Ki ± 0%   16.74Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-50-4             87.14Ki ± 0%   87.14Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-100-4            175.4Ki ± 0%   175.4Ki ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-500-4            846.1Ki ± 0%   846.1Ki ± 0%       ~ (p=1.000 n=6)
GetTimeline_SQLite/events-1000-4           1.639Mi ± 0%   1.639Mi ± 0%       ~ (p=0.113 n=6)
geomean                                    67.35Ki        67.38Ki       +0.05%
¹ all samples are equal

                                   │ baseline-bench.txt │        benchmark-results.txt        │
                                   │     allocs/op      │  allocs/op   vs base                │
EventStoreAppend_InMemory-4                  7.000 ± 0%    7.000 ± 0%       ~ (p=1.000 n=6) ¹
EventStoreAppend_SQLite-4                    53.00 ± 0%    53.00 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-10-4             125.0 ± 0%    125.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-50-4             653.0 ± 0%    653.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-100-4           1.306k ± 0%   1.306k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-500-4           6.514k ± 0%   6.514k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_InMemory/events-1000-4          13.02k ± 0%   13.02k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-10-4               382.0 ± 0%    382.0 ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-50-4              1.852k ± 0%   1.852k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-100-4             3.681k ± 0%   3.681k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-500-4             18.54k ± 0%   18.54k ± 0%       ~ (p=1.000 n=6) ¹
GetTimeline_SQLite/events-1000-4            37.29k ± 0%   37.29k ± 0%       ~ (p=1.000 n=6) ¹
geomean                                     1.162k        1.162k       +0.00%
¹ all samples are equal

Benchmarks run with `go test -bench=. -benchmem -count=6`.
Regressions ≥ 20% are flagged. Results compared via benchstat.
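
For readers checking the table by hand, the 20% rule can be sketched as a relative-change calculation. This is an assumption about how the flag works, not the CI job's actual script: `deltaPercent`, `flagged`, and `regressionThreshold` are hypothetical names, and the real job may also take the p-value into account.

```go
package main

import "fmt"

// regressionThreshold is the 20% cutoff quoted in the benchmark comment.
const regressionThreshold = 20.0

// deltaPercent returns the relative change from base to pr, in percent.
// Positive values mean the PR is slower.
func deltaPercent(base, pr float64) float64 {
	return (pr - base) / base * 100
}

// flagged reports whether a slowdown meets the threshold (assumed rule).
func flagged(base, pr float64) bool {
	return deltaPercent(base, pr) >= regressionThreshold
}

func main() {
	// Medians in milliseconds, copied from the ComponentLifecycle-4 row.
	base, pr := 3.649, 3.778
	fmt.Printf("delta=%+.2f%% flagged=%t\n", deltaPercent(base, pr), flagged(base, pr))
}
```

Because the table prints rounded medians, a delta recomputed this way can differ from the benchstat column in the last digit. The largest statistically significant slowdown in the tables is ComponentLifecycle-4 at +3.53%, far below the threshold, which is consistent with the "No significant performance regressions detected" summary.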

@codecov

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| plugin/external/remote_step.go | 81.81% | 1 Missing and 1 partial ⚠️ |

Development

Successfully merging this pull request may close these issues.

ci: live-deployment example validation matrix