fix: improve UX for version resolution error messages and doctor diagnostics #503

@sonupreetam

Summary

While reviewing #479 (pinned version resolution fix), end-to-end testing of complyctl get and complyctl doctor across several version/registry configurations revealed UX issues in error messages and diagnostic output. Each issue either predates #479 or is only partially addressed by it.

Issues

1. doctor warns about staleness on deliberately pinned versions

Scenario: User configures url: registry.io/policies/nist@v1.0.0 (pinned). Registry has both v1.0.0 and latest.

Current output:

⏭️ policy/nist-pinned: cached v1.0.0, available latest — run complyctl get to update

Problem: The user deliberately pinned to v1.0.0. Suggesting they "update" contradicts their intent and creates unnecessary noise. Running get would be a no-op since the pinned version is already cached.

Expected: Doctor should respect the pin and suppress the staleness warning, or adjust the message to acknowledge the pin (e.g., ✅ policy/nist-pinned: v1.0.0 (pinned — latest available: v1.0.0)).
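
One way to make doctor respect a pin is sketched below. This is a minimal illustration, not complyctl's actual code: splitPin and shouldWarnStale are hypothetical helpers, and the sketch assumes the pin is always written as an @version suffix on the configured url.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPin separates a policy URL such as "registry.io/policies/nist@v1.0.0"
// into its repository part and the pinned version. pinned is false when the
// URL carries no "@version" suffix. (Hypothetical helper.)
func splitPin(url string) (repo, version string, pinned bool) {
	i := strings.LastIndex(url, "@")
	if i < 0 {
		return url, "", false
	}
	if i == len(url)-1 {
		// A trailing "@" with nothing after it is treated as unpinned.
		return url[:i], "", false
	}
	return url[:i], url[i+1:], true
}

// shouldWarnStale reports whether doctor should suggest running
// "complyctl get". A pinned policy is never stale relative to another tag;
// a mismatch between the pin and the cache is a separate check.
func shouldWarnStale(url, cached, available string) bool {
	_, _, pinned := splitPin(url)
	if pinned {
		return false
	}
	return cached != available
}

func main() {
	fmt.Println(shouldWarnStale("registry.io/policies/nist@v1.0.0", "v1.0.0", "latest")) // false
	fmt.Println(shouldWarnStale("registry.io/policies/nist", "v1.0.0", "v1.1.0"))        // true
}
```

With this split, the staleness check only ever fires for unpinned policies, so the "run complyctl get to update" hint cannot contradict a deliberate pin.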

2. "registry unreachable" when only the latest tag is missing

Scenario: User configures url: registry.io/policies/test (no version). Registry is online but has no latest tag (only versioned tags like v1.0.0).

Current output:

⏭️ registry/http://registry.io: unreachable: ...OCI version resolution failed for .../policies/test:latest: not found

Problem: The registry is reachable — the tag just doesn't exist. "unreachable" is a misdiagnosis. The user has no guidance on how to fix this.

Expected: Something like ⏭️ policy/test: latest tag not found — pin a specific version with @v1.0.0
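
The distinction between "tag missing" and "registry down" could be carried by sentinel errors and checked with errors.Is, roughly as follows. ErrTagNotFound, ErrRegistryUnreachable, and diagnose are assumed names for illustration; how complyctl actually classifies registry failures is not shown in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the resolver would wrap one of these with %w.
var (
	ErrTagNotFound         = errors.New("tag not found")
	ErrRegistryUnreachable = errors.New("registry unreachable")
)

// diagnose maps a version resolution error to a doctor line for one policy.
func diagnose(policy, tag string, err error) string {
	switch {
	case err == nil:
		return fmt.Sprintf("✅ policy/%s: %s resolved", policy, tag)
	case errors.Is(err, ErrTagNotFound):
		// The registry answered, so this is a policy problem, not a network one.
		return fmt.Sprintf("⏭️ policy/%s: %s tag not found, pin a specific version with @<version>", policy, tag)
	case errors.Is(err, ErrRegistryUnreachable):
		return fmt.Sprintf("⏭️ policy/%s: registry unreachable", policy)
	default:
		return fmt.Sprintf("⏭️ policy/%s: %v", policy, err)
	}
}

func main() {
	// Wrapping with %w keeps the sentinel visible to errors.Is.
	err := fmt.Errorf("resolve policies/test:latest: %w", ErrTagNotFound)
	fmt.Println(diagnose("test", "latest", err))
}
```

Only a genuine connection failure would then be reported as "unreachable"; a 404-style response for the tag produces actionable guidance instead.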

3. Error messages have excessive nesting and internal detail leakage

Scenario: complyctl get fails for a missing version.

Current output:

Error: failed to sync policy policies/no-latest: policy policies/no-latest: registry unreachable: failed to fetch version for policies/no-latest: OCI version resolution failed for localhost:8766/policies/no-latest:latest: localhost:8766/policies/no-latest:latest: not found (cached data may still be available) (cached policies: [policies/cis-fedora-l1-workstation ...])

Problem: Five layers of fmt.Errorf wrapping. The policy name appears five times (three times on its own, twice inside the full OCI reference). Internal details (full OCI ref format, unrelated cached policy names) are exposed. Users must read through all of this to find "not found."

Expected: Flatten to a user-friendly message, e.g.: Error: policy no-latest: version "latest" not found in registry localhost:8766
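
A possible way to flatten the message is a typed error that carries the user-facing fields once, with intermediate layers wrapping it via %w and the top level extracting it with errors.As. VersionNotFoundError and userMessage are hypothetical names; this is a sketch of the pattern, not the current internal/cache/sync.go code.

```go
package main

import (
	"errors"
	"fmt"
)

// VersionNotFoundError holds what the user needs to see, exactly once.
// (Hypothetical type for illustration.)
type VersionNotFoundError struct {
	Policy   string
	Version  string
	Registry string
	Err      error // underlying cause, kept for debugging and errors.Is
}

func (e *VersionNotFoundError) Error() string {
	return fmt.Sprintf("policy %s: version %q not found in registry %s",
		e.Policy, e.Version, e.Registry)
}

func (e *VersionNotFoundError) Unwrap() error { return e.Err }

// userMessage prints the flattened form when the chain contains a
// VersionNotFoundError and falls back to the full chain otherwise.
func userMessage(err error) string {
	var nf *VersionNotFoundError
	if errors.As(err, &nf) {
		return "Error: " + nf.Error()
	}
	return "Error: " + err.Error()
}

func main() {
	cause := errors.New("localhost:8766/policies/no-latest:latest: not found")
	err := fmt.Errorf("failed to sync policy: %w", &VersionNotFoundError{
		Policy: "no-latest", Version: "latest", Registry: "localhost:8766", Err: cause,
	})
	fmt.Println(userMessage(err))
	// prints: Error: policy no-latest: version "latest" not found in registry localhost:8766
}
```

The wrapped chain stays intact for logs or a verbose flag, while the default output shows a single line.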

4. doctor doesn't surface configured-vs-cached version mismatch

Scenario: User configures @v9.9.9 (doesn't exist). A previous get cached v1.0.0.

Current output:

⏭️ policy/nist-wrong: cached v1.0.0, available latest — run complyctl get to update

Problem: Doctor compares cached version against latest, ignoring that the user configured v9.9.9. Running get as suggested will fail because v9.9.9 doesn't exist. Doctor should detect the mismatch between configured version and cached version.
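
The mismatch check itself needs no registry access: comparing the configured pin against the cached version is enough to flag the problem. The helper below is a hypothetical sketch (checkConfigured is not an existing complyctl function); verifying that the configured version exists in the registry would be an additional step.

```go
package main

import "fmt"

// checkConfigured compares the version pinned in complytime.yaml with the
// version found in the local cache. An empty configured string means the
// policy is not pinned, so this check does not apply.
func checkConfigured(name, configured, cached string) (ok bool, msg string) {
	switch {
	case configured == "":
		return true, ""
	case cached == "":
		return false, fmt.Sprintf("policy/%s: configured %s is not cached", name, configured)
	case cached != configured:
		return false, fmt.Sprintf("policy/%s: configured %s but cached %s", name, configured, cached)
	default:
		return true, fmt.Sprintf("policy/%s: %s (pinned)", name, cached)
	}
}

func main() {
	_, msg := checkConfigured("nist-wrong", "v9.9.9", "v1.0.0")
	fmt.Println(msg) // policy/nist-wrong: configured v9.9.9 but cached v1.0.0
}
```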

5. doctor panics with no complytime.yaml

Scenario: Run complyctl doctor in a directory with no complytime.yaml.

Current output:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x40 pc=...]

goroutine 1 [running]:
github.com/complytime/complyctl/internal/doctor.CheckCollector(0x...)
        internal/doctor/doctor.go:678 +0x20

Problem: runDoctor passes a nil cfg to doctor.Run when config loading fails. CheckCollector dereferences it without a nil check.

Expected: Doctor should report "❌ config: complytime.yaml not found — run complyctl init", then either continue with the checks that do not need a config or exit cleanly.
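
A nil guard along these lines would turn the panic into a regular diagnostic. Config, Result, checkConfig, checkCollector, and run are simplified stand-ins invented for this sketch; the real signatures in internal/doctor are not shown in this issue.

```go
package main

import "fmt"

// Config and Result are simplified stand-ins for the real doctor types.
type Config struct {
	Policies []string
}

type Result struct {
	Name    string
	OK      bool
	Message string
}

// checkConfig reports a missing config instead of letting later checks
// dereference a nil pointer.
func checkConfig(cfg *Config) Result {
	if cfg == nil {
		return Result{Name: "config", Message: "complytime.yaml not found, run complyctl init"}
	}
	return Result{Name: "config", OK: true,
		Message: fmt.Sprintf("%d policies configured", len(cfg.Policies))}
}

// checkCollector stands in for any check that needs a non-nil config.
func checkCollector(cfg *Config) Result {
	return Result{Name: "collector", OK: true, Message: "ok"}
}

// run executes the checks; config-dependent checks are skipped when cfg is nil.
func run(cfg *Config) []Result {
	results := []Result{checkConfig(cfg)}
	if cfg == nil {
		return results
	}
	return append(results, checkCollector(cfg))
}

func main() {
	for _, r := range run(nil) {
		fmt.Printf("%s: ok=%v %s\n", r.Name, r.OK, r.Message)
	}
}
```

Guarding once in the runner keeps each individual check free of repeated nil tests, though a defensive check inside each function that receives cfg is also an option.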

Reproduction

All scenarios were tested locally with the mock OCI registry (cmd/mock-oci-registry) and a custom no-latest-tag registry. Steps:

  1. make build
  2. Start mock registry: make mock-registry-background
  3. Create complytime.yaml with the scenario config (see descriptions above)
  4. Run complyctl get and complyctl doctor

Suggested Approach

  • Issues 1-4 can be addressed together in a single PR focused on internal/doctor/doctor.go and internal/cache/sync.go error messaging
  • Issue 5 (nil panic) is a separate bug fix — a nil guard in CheckCollector and any other check that receives cfg

Found during review of #479.


Labels: bug (Something isn't working)
