Skip to content

Recover hosted grove IDs from workspace markers#79

Merged
ptone merged 2 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/hosted-grove-marker-discovery
Apr 11, 2026
Merged

Recover hosted grove IDs from workspace markers#79
ptone merged 2 commits intoGoogleCloudPlatform:mainfrom
mfreeman451:fix/hosted-grove-marker-discovery

Conversation

@mfreeman451
Copy link
Copy Markdown
Contributor

@mfreeman451 mfreeman451 commented Apr 8, 2026

Summary

  • recover hosted grove IDs from workspace marker metadata when external grove config does not have settings.yaml
  • preserve the hosted grove identity used by broker reconciliation and heartbeat paths
  • add focused grove discovery tests for the workspace-marker fallback

Validation

  • go test ./pkg/config -run 'TestDiscoverGroves_(GitGroveWithExternalConfigUsesWorkspaceMarkerGroveID|GroveConfigNoScionNoAgents)$'

Copy link
Copy Markdown
Contributor Author

@mfreeman451 mfreeman451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mfreeman451 mfreeman451 marked this pull request as ready for review April 8, 2026 20:46
@mfreeman451
Copy link
Copy Markdown
Contributor Author

Tracking issue: #90

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 10, 2026

PR #79 Review: Recover hosted grove IDs from workspace markers

Executive Summary

This PR adds a fallback mechanism to recover grove IDs from workspace marker files when the external grove config lacks a settings.yaml. The production code change is small, well-scoped, and logically correct. However, the new test fails in any environment where SCION_GROVE_ID is set because LoadSettings picks up the env var via koanf, preventing the fallback path from being reached.

Critical Issues

1. Test fails when SCION_GROVE_ID environment variable is set (BLOCKING)

File: pkg/config/grove_discovery_test.go, new test TestDiscoverGroves_GitGroveWithExternalConfigUsesWorkspaceMarkerGroveID

Problem: The test does not unset SCION_GROVE_ID from the environment. LoadSettingsKoanf (called within groveInfoFromGitExternalWithConfig) loads env vars with the SCION_ prefix, mapping SCION_GROVE_ID to grove_id. This means settings.GroveID is non-empty after LoadSettings returns, and the new fallback code (if gi.GroveID == "") is never reached.

Observed failure:

grove_discovery_test.go:550: GroveID = "1f34781e-3d59-46ff-bd45-14618ceb39d6", want "3c619ec9-517e-4321-8c6a-4757f6a95607"

The value 1f34781e-... comes from SCION_GROVE_ID in the host environment, not from any test fixture.

Suggested Fix:

func TestDiscoverGroves_GitGroveWithExternalConfigUsesWorkspaceMarkerGroveID(t *testing.T) {
	tmpHome := t.TempDir()
	origHome := os.Getenv("HOME")
	if err := os.Setenv("HOME", tmpHome); err != nil {
		t.Fatalf("Setenv HOME failed: %v", err)
	}
	defer func() {
		if err := os.Setenv("HOME", origHome); err != nil {
			t.Fatalf("restore HOME failed: %v", err)
		}
	}()

+	// Unset SCION_GROVE_ID so LoadSettings doesn't pick it up from the
+	// environment and mask the workspace-marker fallback under test.
+	if v, ok := os.LookupEnv("SCION_GROVE_ID"); ok {
+		os.Unsetenv("SCION_GROVE_ID")
+		defer os.Setenv("SCION_GROVE_ID", v)
+	}

Alternatively, use t.Setenv("SCION_GROVE_ID", "") (Go 1.17+) which automatically restores the original value when the test completes. Note that t.Setenv also calls t.Parallel() prevention, which is appropriate here since the test already mutates global state (HOME).

Observations

1. readWorkspaceMarkerForSlug calls os.UserHomeDir() redundantly

groveInfoFromGitExternalWithConfig is called from DiscoverGroves, which already resolved home via os.UserHomeDir(). Passing home as a parameter to readWorkspaceMarkerForSlug would avoid the redundant syscall and make the function more testable (no hidden dependency on $HOME).

// Suggested signature:
func readWorkspaceMarkerForSlug(home, slug string) (*GroveMarker, string, error) {
	workspacePath := filepath.Join(home, GlobalDir, "groves", slug)
	// ...
}

This would require groveInfoFromGitExternalWithConfig to also accept home or for the caller to pass it through. Low priority but improves consistency.

2. WorkspacePath only set on fallback path

In groveInfoFromGitExternalWithConfig, gi.WorkspacePath is only populated when the workspace marker fallback is used. When LoadSettings succeeds and provides the grove ID, WorkspacePath remains empty. This is consistent with the pre-existing behavior (it was always empty in this function), but if downstream consumers rely on WorkspacePath being populated when GroveID is set, this could be a latent issue. Worth verifying intent.

3. Test structure follows existing patterns

The test correctly mirrors the directory layout conventions (grove-configs/<slug>__<shortid>/.scion/agents/<name>/home) and reuses WriteWorkspaceMarker for fixture setup. Good consistency with neighboring tests.

Positive Feedback

  • The fallback logic is well-guarded: it only fires when gi.GroveID == "", preventing it from overwriting a valid settings-based grove ID.
  • readWorkspaceMarkerForSlug properly returns the workspacePath alongside the marker, allowing the caller to set both GroveID and WorkspacePath atomically.
  • ReadGroveMarker already validates that GroveID and GroveSlug are non-empty, so the fallback will not silently produce a GroveInfo with an empty grove ID from a malformed marker file.

Final Verdict

Needs rework. The production code is sound, but the test has a blocking environment leak that causes it to fail when SCION_GROVE_ID is set (which is the case in any scion-managed agent environment, including CI). Unset the env var in the test setup and this is ready to merge.

@ptone
Copy link
Copy Markdown
Member

ptone commented Apr 10, 2026

This would be something that prevents CI from running inside an agent container, something we do regularly

@mfreeman451 mfreeman451 force-pushed the fix/hosted-grove-marker-discovery branch from e754693 to 48ea977 Compare April 11, 2026 05:15
@mfreeman451
Copy link
Copy Markdown
Contributor Author

Addressed the CI fragility called out in review.

Changes:

  • updated the test to neutralize ambient SCION_GROVE_ID instead of inheriting the surrounding environment
  • verified the test path with SCION_GROVE_ID explicitly set so it does not accidentally depend on container/runtime env

Current head: a4deaa0d

@ptone ptone merged commit 341e0c0 into GoogleCloudPlatform:main Apr 11, 2026
1 check passed
scion-gteam bot pushed a commit to ptone/scion that referenced this pull request Apr 12, 2026
* Recover hosted grove IDs from workspace markers

* Isolate hosted grove marker test from SCION_GROVE_ID
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants