
@miyunari miyunari commented Jan 25, 2026

Provide optional continuous model validation supported on k8s 1.28 or newer.

The continuousValidation entry currently only supports checking the model
on a given interval.
In the future we may want to extend it to support things like:

  • onCRDChange
  • onSigChange
  • ...

Example:

apiVersion: ml.sigstore.dev/v1alpha1
kind: ModelValidation
metadata:
  name: continuous-signed-test
spec:
  config:
    publicKeyConfig:
      keyPath: /keys/test_public_key.pub
  model:
    path: /data
    signaturePath: /data/model.sig
  imagePullPolicy: Always
  continuousValidation:
    enabled: true
    interval: "5m"

Signed-off-by: Nina Bongartz <pnink@web.de>
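
The interval-based checking described above could look roughly like the loop below. This is a minimal sketch, not the code from this PR: `runEvery` and `demoRuns` are hypothetical names, and the validation step is a stand-in callback.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls validate once immediately and then once per interval until
// ctx is cancelled. It returns the number of completed runs.
// Hypothetical helper for illustration; not taken from the PR.
func runEvery(ctx context.Context, interval time.Duration, validate func() error) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	runs := 0
	for {
		if err := validate(); err != nil {
			fmt.Println("validation failed:", err)
		}
		runs++

		select {
		case <-ctx.Done():
			return runs
		case <-ticker.C:
			// Interval elapsed: loop around and validate again.
		}
	}
}

// demoRuns drives runEvery with a short interval for a fixed time window.
func demoRuns() int {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	return runEvery(ctx, 10*time.Millisecond, func() error { return nil })
}

func main() {
	fmt.Println("validation runs:", demoRuns())
}
```

Cancelling the context is what stops the loop, so the same context could also carry pod shutdown signals.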

@miyunari miyunari changed the title poc: continuous model validation Support continuous model validation Feb 3, 2026
}

func runValidation(args []string, logger logr.Logger) error {
	cmd := exec.Command("/usr/local/bin/model_signing", args...)
Member Author

In the future it might be better to use the Go library instead of shelling out to the CLI.

@miyunari miyunari force-pushed the continues_check branch 4 times, most recently from de401ee to fb05249 Compare February 7, 2026 12:08
Nina Bongartz added 2 commits February 7, 2026 13:12
Provide optional continuous model validation supported on k8s 1.28 or newer.

The continuousValidation entry currently only supports checking the model
on a given interval.
In the future we may want to extend it to support things like:
- onCRDChange
- onSigChange
- ...

Example:
```yaml
apiVersion: ml.sigstore.dev/v1alpha1
kind: ModelValidation
metadata:
  name: continuous-signed-test
spec:
  config:
    publicKeyConfig:
      keyPath: /keys/test_public_key.pub
  model:
    path: /data
    signaturePath: /data/model.sig
  imagePullPolicy: Always
  continuousValidation:
    enabled: true
    interval: "5m"
```

Signed-off-by: Nina Bongartz <pnink@web.de>
E2E tests were failing with Init:ErrImagePull because the operator
expected ghcr.io/sigstore/model-validation-agent:v0.1.0 but e2e
built ghcr.io/sigstore/model-validation-operator-agent:v0.0.1.

Fix by building the agent image with the correct name that matches
the hardcoded constant in internal/constants/images.go.

Also set imagePullPolicy: IfNotPresent in test templates to ensure
e2e tests use locally loaded images instead of trying to pull from
the registry.

Signed-off-by: Nina Bongartz <pnink@web.de>
@miyunari miyunari force-pushed the continues_check branch 2 times, most recently from 30ee8ab to a3a9e13 Compare February 7, 2026 12:45
The e2e test "should successfully validate with public key signature"
was failing with permission denied errors when the validation-agent
tried to read model files.

Root cause:
- The validation-agent runs as UID 65532 (non-root) per Dockerfile.agent
- The model-data-setup DaemonSet runs as root and copies files to hostPath
- Files copied by root were not readable by the non-root agent process

Fix:
- Added chmod -R a+rX to make files readable by all users
- The 'a+rX' mode makes files readable by everyone and directories searchable
- Capital X adds execute only to directories (and to files that already have an execute bit), not to plain regular files

This allows the validation-agent to successfully read model files
during validation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Nina Bongartz <pnink@web.de>
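
For readers unfamiliar with the `a+rX` semantics used in the fix above, the same permission change can be sketched in Go. This is only an illustration of what `chmod -R a+rX` does; the actual fix in the commit is the shell command, and `addReadAndSearch` and `demoPerms` are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// addReadAndSearch mirrors "chmod -R a+rX root": everything becomes readable
// by all users, and execute is added only to directories and to files that
// already have at least one execute bit set.
func addReadAndSearch(root string) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if info.Mode()&fs.ModeSymlink != 0 {
			return nil // os.Chmod would follow the link, so skip symlinks
		}
		perm := info.Mode().Perm() | 0o444
		if info.IsDir() || perm&0o111 != 0 {
			perm |= 0o111
		}
		return os.Chmod(path, perm)
	})
}

// demoPerms applies the change to a temporary tree that starts out
// owner-only and returns the resulting directory and file modes.
func demoPerms() (string, string, error) {
	root, err := os.MkdirTemp("", "model-data")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	file := filepath.Join(root, "model.bin")
	if err := os.WriteFile(file, []byte("weights"), 0o600); err != nil {
		return "", "", err
	}
	// Set exact starting modes so the result does not depend on the umask.
	if err := os.Chmod(root, 0o700); err != nil {
		return "", "", err
	}
	if err := os.Chmod(file, 0o600); err != nil {
		return "", "", err
	}
	if err := addReadAndSearch(root); err != nil {
		return "", "", err
	}
	dirInfo, err := os.Stat(root)
	if err != nil {
		return "", "", err
	}
	fileInfo, err := os.Stat(file)
	if err != nil {
		return "", "", err
	}
	return dirInfo.Mode().Perm().String(), fileInfo.Mode().Perm().String(), nil
}

func main() {
	dirPerm, filePerm, err := demoPerms()
	fmt.Println(dirPerm, filePerm, err)
}
```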
@miyunari miyunari requested review from SequeI and knrc February 7, 2026 13:00
@miyunari miyunari marked this pull request as ready for review February 7, 2026 13:00

# AGENT_IMG defines the image:tag used for the validation agent.
# Use the same agent image name as hardcoded in internal/constants/images.go
AGENT_IMG ?= ghcr.io/sigstore/model-validation-agent:v0.1.0
Contributor

As mentioned in the comment, we probably shouldn't hardcode the same image version in two different places. It could cause image drift in the future and adds unnecessary maintenance.

Contributor

It can be added as a build flag
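
One way the build-flag suggestion could work in Go is the linker's `-X` option, which overrides a package-level string variable at build time. The sketch below is an assumption about how it might be wired up; the variable name `agentImage`, the `main` package path, and the helper `resolveAgentImage` are illustrative and not taken from this repository.

```go
package main

import "fmt"

// agentImage is the default agent image reference. It is a variable rather
// than a constant so the linker can override it at build time, for example:
//
//	go build -ldflags "-X main.agentImage=$(AGENT_IMG)" .
//
// The variable name and package path here are illustrative assumptions.
var agentImage = "ghcr.io/sigstore/model-validation-agent:v0.1.0"

// resolveAgentImage returns the image reference compiled into the binary.
func resolveAgentImage() string {
	return agentImage
}

func main() {
	fmt.Println(resolveAgentImage())
}
```

With this layout the Makefile's `AGENT_IMG` would be the single source of truth, and the value in the Go source only acts as a fallback default.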


if err := server.ListenAndServe(); err != nil {
	logger.Error(err, "Health server failed")
	os.Exit(1)
Contributor

We should have some sort of graceful shutdown for the health server. Otherwise, if the main app runs into a problem, the server will keep running until the process is terminated.

Contributor

+1, we should be able to pass in a context and have that trigger the shutdown.
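
A context-driven shutdown along the lines suggested here might look like the following. It is a minimal sketch using only the standard library: `serveHealth` and `demoShutdown` are hypothetical names, the `/healthz` path and the timeouts are assumptions, and the server takes a listener so it can be exercised on an ephemeral port.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveHealth serves /healthz on ln until ctx is cancelled, then shuts the
// server down gracefully. It returns nil on a clean shutdown.
func serveHealth(ctx context.Context, ln net.Listener) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	server := &http.Server{Handler: mux, ReadHeaderTimeout: 5 * time.Second}

	errCh := make(chan error, 1)
	go func() { errCh <- server.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server failed on its own
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := server.Shutdown(shutdownCtx); err != nil {
		return err
	}
	// Serve returns http.ErrServerClosed after Shutdown; treat it as success.
	if err := <-errCh; err != nil && !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demoShutdown starts the server on an ephemeral port and cancels it.
func demoShutdown() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	go func() { done <- serveHealth(ctx, ln) }()
	cancel()
	return <-done
}

func main() {
	fmt.Println("shutdown error:", demoShutdown())
}
```

Returning the error instead of calling `os.Exit(1)` inside the goroutine lets the caller decide how to react and keeps deferred cleanup intact.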

// Interval defines how often to re-validate the model (e.g., "5m", "1h").
// Only used when Enabled is true.
// +kubebuilder:default="5m"
// +kubebuilder:validation:Pattern=`^([0-9]+(\.[0-9]+)?(s|m|h))+$`
Contributor

This regex allows values like 1s, and re-validating that often could cause massive CPU usage. We should enforce a minimum interval such as 1m or 5m to prevent a DoS scenario.
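
Since the pattern alone cannot express a lower bound, one option is to enforce it when the interval is parsed. The sketch below assumes a one-minute floor; `minInterval` and `parseInterval` are hypothetical names, and the actual minimum would be a project decision.

```go
package main

import (
	"fmt"
	"time"
)

// minInterval is an assumed lower bound; the review suggests 1m or 5m.
const minInterval = time.Minute

// parseInterval parses a duration string such as "5m" or "1h30m" and rejects
// values below minInterval.
func parseInterval(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid interval %q: %w", s, err)
	}
	if d < minInterval {
		return 0, fmt.Errorf("interval %s is below the minimum of %s", d, minInterval)
	}
	return d, nil
}

func main() {
	for _, s := range []string{"1s", "5m", "1h30m"} {
		d, err := parseInterval(s)
		fmt.Println(s, d, err)
	}
}
```

A check like this could run in the admission webhook or at agent startup, so a too-small value is rejected rather than silently honored.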


container := corev1.Container{
	Name:  constants.ModelValidationInitContainerName,
	Image: constants.ModelValidationAgentImage,
Contributor

If I understand correctly, we now use ModelValidationAgentImage instead of ModelTransparencyCliImage even when continuous validation is not enabled?

	imagePullPolicy = mv.Spec.ImagePullPolicy
}

container := corev1.Container{
Contributor

We should add resource limits to the sidecar container to prevent unbounded resource usage.

if pp.Annotations == nil {
	pp.Annotations = make(map[string]string)
}
pp.Annotations[constants.ContinuousValidationAnnotationKey] = "true"
Contributor

I don't see this annotation being used anywhere. Is it purely for tracking, rather than being used programmatically to achieve something?

SequeI commented Feb 9, 2026

Looks good so far, left some questions/review on the PR :)

}

// ContinuousValidation defines the configuration for continuous model validation.
// When enabled, the validation container runs as a native sidecar (requires Kubernetes 1.28+)
Contributor

FYI, native sidecars are only enabled by default from 1.29, when the feature became beta. On 1.28 the SidecarContainers feature gate needs to be enabled explicitly.
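
If the operator wanted to surface this distinction, a small version check could classify the cluster before injecting the sidecar. This is a hedged sketch: `sidecarSupport` is a hypothetical helper, it assumes a Kubernetes 1.x cluster, and it takes the minor version as a string because some distributions report it with a suffix such as "28+".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sidecarSupport classifies native sidecar availability for a Kubernetes 1.x
// minor version string. Hypothetical helper for illustration only.
func sidecarSupport(minor string) (string, error) {
	// Some distributions report the minor version with a suffix, e.g. "28+".
	n, err := strconv.Atoi(strings.TrimRight(minor, "+"))
	if err != nil {
		return "", fmt.Errorf("invalid minor version %q: %w", minor, err)
	}
	switch {
	case n < 28:
		return "unsupported", nil
	case n == 28:
		// Alpha in 1.28: the SidecarContainers feature gate must be enabled.
		return "feature-gate-required", nil
	default:
		// Beta and enabled by default from 1.29 onwards.
		return "enabled-by-default", nil
	}
}

func main() {
	for _, minor := range []string{"27", "28+", "29"} {
		status, err := sidecarSupport(minor)
		fmt.Println(minor, status, err)
	}
}
```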
