16 changes: 13 additions & 3 deletions README.md
@@ -1,6 +1,7 @@
# rhdh-fullsend

Custom fullsend sandbox images and deployment documentation for the RHDH
team's agent infrastructure.

## Why this repo exists

@@ -14,6 +15,16 @@ workaround (a `host_files`-mounted shell script) is fragile.
This repo builds a single image that extends `fullsend-code:latest` with
corepack and yarn pre-activated.

## Documentation

| Doc | What it covers |
|-----|---------------|
| [Local Setup](docs/local-setup.md) | Podman VM, OpenShell gateway, GCP credentials, running agents locally |
| [Repo Onboarding](docs/repo-onboarding.md) | Installing fullsend on a new RHDH repo (standard and manual methods) |
| [GCP Infrastructure](docs/gcp-infrastructure.md) | GCP project, WIF providers, IAM, service accounts |
| [Sandbox Networking](docs/sandbox-networking.md) | DNS inside OpenShell sandboxes — why it fails, workarounds |
| [Known Issues](docs/known-issues.md) | Active friction points, workarounds, upstream tracking |

## Image

```
@@ -52,8 +63,7 @@ This replaces the `sandbox-yarn-setup.sh` + `host_files` workaround.

## Local agent runs

See [Local Setup](docs/local-setup.md) for running agents on macOS.

## Local build

233 changes: 233 additions & 0 deletions docs/gcp-infrastructure.md
@@ -0,0 +1,233 @@
# GCP Infrastructure

GCP project, Workload Identity Federation, IAM, and service account reference
for the RHDH fullsend setup.

## Project context

| Field | Value |
|-------|-------|
| GCP project ID | `rhdh-sidekick-167988` |
| GCP project number | `189673402608` |
| Vertex AI region | `us-east5` |
| WIF pool | `fullsend-pool` (ACTIVE) |
| IAM admin group | `rhdh-sidekick@redhat.com` |

The project lives under `IT Public Cloud > Sandbox > Customers` in the GCP
org hierarchy. The admin group has `iam.workloadIdentityPoolAdmin`,
`iam.serviceAccountAdmin`, and `iam.serviceAccountKeyAdmin` — sufficient to
self-provision WIF providers and service accounts without fullsend team
involvement.

**Conditional IAM restriction:** The `projectIamAdmin` role on this project
is restricted to granting only `roles/aiplatform.user`:

```
expression: api.getAttribute('iam.googleapis.com/modifiedGrantsByRole',
[]).hasOnly(['roles/aiplatform.user'])
```

This means you cannot grant yourself additional roles or enable APIs. All
changes beyond `aiplatform.user` must go through IT (ServiceNow ticket).

## WIF providers

Each repo gets its own WIF provider, scoped via `attribute-condition` to
that specific repository.

### Current providers

| Provider | Repo scope | State |
|----------|-----------|-------|
| `gh-redhat-developer-rhdh-agentic` | `redhat-developer/rhdh-agentic` | ACTIVE |
| `gh-redhat-developer-rhdh-plugins` | `redhat-developer/rhdh-plugins` | ACTIVE |
| `gh-rhdeveloper-plugin-export` | `redhat-developer/rhdh-plugin-export-overlays` | ACTIVE |

### Creating a new WIF provider

```bash
PROVIDER_NAME="gh-redhat-developer-<repo>" # max 32 chars
PROVIDER_PATH="projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool/providers/${PROVIDER_NAME}"

gcloud iam workload-identity-pools providers create-oidc "$PROVIDER_NAME" \
--location=global \
--workload-identity-pool=fullsend-pool \
--project=rhdh-sidekick-167988 \
--issuer-uri=https://token.actions.githubusercontent.com \
--allowed-audiences="fullsend-mint,https://iam.googleapis.com/${PROVIDER_PATH}" \
--attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.repository=assertion.repository,attribute.repository_owner=assertion.repository_owner" \
--attribute-condition="assertion.repository == '<org>/<repo>'"
```
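Provider IDs are easy to get wrong: `gh-redhat-developer-` already uses 20 of the 32 available characters, which is presumably why the `rhdh-plugin-export-overlays` provider is named `gh-rhdeveloper-plugin-export`. A minimal pre-flight sketch, assuming Google's documented rules for provider IDs (4 to 32 characters, only lowercase letters, digits and `-`, and the `gcp-` prefix reserved); the function name `provider_path` is hypothetical:

```shell
# Hypothetical helper: validate a WIF provider ID and print its full resource path.
POOL_PATH="projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool"

provider_path() {
  local name=$1
  # Assumed constraint: provider IDs are 4-32 characters long.
  if [ "${#name}" -lt 4 ] || [ "${#name}" -gt 32 ]; then
    echo "provider ID must be 4-32 characters (got ${#name}): $name" >&2
    return 1
  fi
  case $name in
    # Assumed constraint: the gcp- prefix is reserved by Google.
    gcp-*) echo "the gcp- prefix is reserved: $name" >&2; return 1 ;;
    # Assumed constraint: only lowercase letters, digits and hyphens.
    *[![:lower:][:digit:]-]*) echo "only a-z, 0-9 and '-' are allowed: $name" >&2; return 1 ;;
  esac
  printf '%s/providers/%s\n' "$POOL_PATH" "$name"
}
```

For example, `PROVIDER_PATH=$(provider_path "$PROVIDER_NAME") || exit 1` fails before `create-oidc` runs instead of after.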

### Dual-audience requirement

Two audiences are required in `--allowed-audiences`:

| Audience | Used by | Step |
|----------|---------|------|
| `fullsend-mint` | Mint token exchange | GitHub OIDC → fullsend session token |
| `https://iam.googleapis.com/projects/189673402608/.../providers/<name>` | `google-github-actions/auth` | GCP credential setup for Vertex AI |

Omitting the second audience causes an `audience mismatch` error at the
"Setup GCP" step in the workflow. The `fullsend admin install` CLI sets
both automatically; manual provider creation must include both.
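To check an existing provider before a workflow run, you can compare its configured audiences against both required values. A sketch, assuming `providers describe` exposes the list as `oidc.allowedAudiences` (the field name in the IAM REST resource) and that `value()` joins list items with `;` (the helper also accepts `,` in case it does not); `has_both_audiences` is a hypothetical name:

```shell
# Hypothetical helper: succeed only if a WIF provider allows both audiences
# fullsend needs (mint token exchange + google-github-actions/auth).
has_both_audiences() {
  local name=$1 raw list
  raw=$(gcloud iam workload-identity-pools providers describe "$name" \
    --location=global \
    --workload-identity-pool=fullsend-pool \
    --project=rhdh-sidekick-167988 \
    --format="value(oidc.allowedAudiences)") || return 1
  # One audience per line, whichever separator gcloud used.
  list=$(printf '%s\n' "$raw" | tr ';,' '\n\n')
  local iam_aud="https://iam.googleapis.com/projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool/providers/${name}"
  printf '%s\n' "$list" | grep -qxF "fullsend-mint" ||
    { echo "missing audience: fullsend-mint" >&2; return 1; }
  printf '%s\n' "$list" | grep -qxF "$iam_aud" ||
    { echo "missing audience: $iam_aud" >&2; return 1; }
}
```

If it reports a missing audience, the `update-oidc` command under "audience mismatch" in IAM troubleshooting adds it.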

### IAM binding

The existing `aiplatform.user` binding covers all `redhat-developer` repos
via the `attribute.repository_owner` principal set:

```
principalSet://iam.googleapis.com/projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool/attribute.repository_owner/redhat-developer
```

No per-repo IAM binding is needed after the initial setup.
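This works because each provider's attribute mapping copies `assertion.repository_owner` into `attribute.repository_owner`, so every token from a `redhat-developer` repo matches the same principal set. Note that a repo still needs its own provider, because each provider's `attribute-condition` admits only its own repository. A small sketch (hypothetical helper names) that builds the URI and checks whether an `<org>/<repo>` falls under the org-level binding:

```shell
# Hypothetical helpers illustrating the org-level principal set.
WIF_POOL="projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool"

# Print the principalSet URI matching every token whose repository_owner is $1.
owner_principal_set() {
  printf 'principalSet://iam.googleapis.com/%s/attribute.repository_owner/%s\n' "$WIF_POOL" "$1"
}

# Succeed if "<org>/<repo>" is covered by the existing redhat-developer binding.
covered_by_org_binding() {
  local owner=${1%%/*}
  [ "$owner" = "redhat-developer" ]
}
```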

## Service accounts

For local agent runs (not CI). See also
[Local Setup — GCP Credentials](local-setup.md#step-3-gcp-credentials).

### Creating a service account

```bash
gcloud iam service-accounts create fullsend-local \
--display-name="Fullsend local agent runner" \
--project=rhdh-sidekick-167988
```

There is a propagation delay of a few seconds before the SA can be used in
IAM bindings.
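If the binding command runs immediately after `create`, it can fail while the new SA propagates. A generic retry wrapper is one way to absorb that delay; the helper name and the attempt count and sleep interval in the usage line are arbitrary assumptions, not documented values:

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failures. Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example, prefix the binding command below with `retry 5 5 gcloud projects add-iam-policy-binding ...`.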

### Granting Vertex AI access

```bash
gcloud projects add-iam-policy-binding rhdh-sidekick-167988 \
--member="serviceAccount:fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com" \
--role="roles/aiplatform.user" \
--condition=None
```

`--condition=None` is required because the project has conditional IAM
bindings. Without it, `gcloud` prompts interactively.

### Creating a JSON key

```bash
gcloud iam service-accounts keys create \
~/.config/fullsend/fullsend-local-credentials.json \
--iam-account=fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com

chmod 600 ~/.config/fullsend/fullsend-local-credentials.json
```
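Because the key grants Vertex AI access, it is worth confirming that the `chmod` stuck, for example after copying the file between machines. A sketch with a hypothetical function name; it uses `find -perm` with an exact mode because `stat` flags differ between macOS and Linux:

```shell
# Hypothetical check: succeed only if the key file exists and has mode 600 exactly.
key_file_private() {
  [ -f "$1" ] && [ -n "$(find "$1" -perm 0600)" ]
}
```

For example: `key_file_private ~/.config/fullsend/fullsend-local-credentials.json || echo "fix permissions"`.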

The key file contains a private key. Do not commit it to git or share it via
Slack. If the key is compromised, delete it:

```bash
KEY_ID=$(python3 -c "import json,sys; print(json.load(sys.stdin)['private_key_id'])" \
< ~/.config/fullsend/fullsend-local-credentials.json)
gcloud iam service-accounts keys delete "$KEY_ID" \
--iam-account=fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com
```
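To confirm the deletion took effect, you can check that the key ID no longer appears among the SA's user-managed keys. A sketch, assuming `keys list --format="value(name)"` prints each key's resource name (ending in `/keys/<KEY_ID>`); `key_still_listed` is a hypothetical name:

```shell
# Hypothetical check: succeed if KEY_ID is still listed for the SA.
key_still_listed() {
  local key_id=$1
  gcloud iam service-accounts keys list \
    --iam-account=fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com \
    --managed-by=user \
    --format="value(name)" | grep -qE -- "(^|/)${key_id}\$"
}
```

For example: `key_still_listed "$KEY_ID" && echo "key still active"`.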

### Per-person service accounts

For individual usage tracking, create per-person SAs:

```bash
NAME="fullsend-local-<name>" # kebab-case, max 30 chars

gcloud iam service-accounts create "$NAME" \
--display-name="Fullsend local – <Name>" \
--project=rhdh-sidekick-167988

gcloud projects add-iam-policy-binding rhdh-sidekick-167988 \
--member="serviceAccount:${NAME}@rhdh-sidekick-167988.iam.gserviceaccount.com" \
--role="roles/aiplatform.user" \
--condition=None

gcloud iam service-accounts keys create "/tmp/${NAME}-credentials.json" \
--iam-account="${NAME}@rhdh-sidekick-167988.iam.gserviceaccount.com"
```
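Service account IDs are stricter than display names. As we understand Google's documented rule, they must be 6 to 30 characters, start with a lowercase letter, contain only lowercase letters, digits and hyphens, and not end with a hyphen. With the `fullsend-local-` prefix taking 15 characters, only 15 remain for the name. A hypothetical helper that derives a kebab-case ID from a person's name and rejects anything invalid:

```shell
# Hypothetical helper: turn "Jane Doe" into "fullsend-local-jane-doe",
# failing if the result is not a valid service account ID.
sa_id_for() {
  local slug id
  # Lowercase, collapse every run of other characters into one hyphen, trim edge hyphens.
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:lower:][:digit:]' '-' | sed 's/^-//; s/-$//')
  id="fullsend-local-${slug}"
  # Assumed rule: at most 30 chars, starts with a letter, ends with a letter or digit.
  if [ "${#id}" -gt 30 ] || ! printf '%s\n' "$id" | grep -Eq '^[a-z][-a-z0-9]*[a-z0-9]$'; then
    echo "invalid service account ID: $id" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

For example, `NAME=$(sa_id_for "Jane Doe") || exit 1` before running the commands above.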

Share the key file securely (Bitwarden, 1Password — never Slack or email)
and delete the local copy.

### Key rotation

Create a new key before deleting the old one to avoid downtime:

```bash
gcloud iam service-accounts keys create \
~/.config/fullsend/fullsend-local-credentials-new.json \
--iam-account=fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com

# Test with the new key, then:
OLD_KEY_ID=$(python3 -c "import json,sys; print(json.load(sys.stdin)['private_key_id'])" \
< ~/.config/fullsend/fullsend-local-credentials.json)
gcloud iam service-accounts keys delete "$OLD_KEY_ID" \
--iam-account=fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com

mv ~/.config/fullsend/fullsend-local-credentials-new.json \
~/.config/fullsend/fullsend-local-credentials.json
```
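The one irreversible step in rotation is `keys delete`. A guard sketch, assuming the standard JSON key layout: it extracts both key IDs with `sed` (swapped in for the `python3` one-liner so no interpreter is needed) and succeeds only when both IDs are present and different, which rules out deleting the key you are about to keep. `key_id` and `safe_to_delete_old` are hypothetical names:

```shell
# Hypothetical helper: print the private_key_id field of a service account key file.
key_id() {
  sed -n 's/.*"private_key_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed only if both key files have a key ID and the IDs differ.
safe_to_delete_old() {
  local old new
  old=$(key_id "$1") || return 1
  new=$(key_id "$2") || return 1
  [ -n "$old" ] && [ -n "$new" ] && [ "$old" != "$new" ]
}
```

For example, run the `keys delete` step only if `safe_to_delete_old` succeeds for the current and `-new` credential files.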

## IAM troubleshooting

### "Permission 'aiplatform.endpoints.predict' denied"

This error means the WIF principal has no `roles/aiplatform.user` binding. List the principal-set bindings to verify:

```bash
gcloud projects get-iam-policy rhdh-sidekick-167988 \
--flatten="bindings[].members" \
--filter="bindings.members:principalSet" \
--format="table(bindings.role, bindings.members)"
```

If the binding is missing, add it using the org-level principal set (covers
all repos under `redhat-developer`):

```bash
gcloud projects add-iam-policy-binding rhdh-sidekick-167988 \
--role="roles/aiplatform.user" \
--member="principalSet://iam.googleapis.com/projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool/attribute.repository_owner/redhat-developer" \
--condition=None
```
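To confirm the fix from a script (for example right after `fullsend admin install`), the same check can be reduced to an exit status. A sketch, assuming that after flattening, `--format="value(bindings.members)"` prints one member per line; the function name is hypothetical:

```shell
# Hypothetical check: succeed if the org-level WIF principal set holds roles/aiplatform.user.
WIF_OWNER_PRINCIPAL="principalSet://iam.googleapis.com/projects/189673402608/locations/global/workloadIdentityPools/fullsend-pool/attribute.repository_owner/redhat-developer"

has_wif_vertex_binding() {
  gcloud projects get-iam-policy rhdh-sidekick-167988 \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/aiplatform.user" \
    --format="value(bindings.members)" | grep -qxF "$WIF_OWNER_PRINCIPAL"
}
```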

### Installer claims success but binding is missing

The `fullsend admin install` CLI may report "granted roles/aiplatform.user"
even when the conditional `projectIamAdmin` role silently blocks the grant.
Always verify with `gcloud projects get-iam-policy` after install. IAM
propagation can take up to 7 minutes.

### "audience mismatch" at Setup GCP step

The WIF provider was created with only one allowed audience. Update it to
include both:

```bash
gcloud iam workload-identity-pools providers update-oidc <provider-name> \
--location=global \
--workload-identity-pool=fullsend-pool \
--project=rhdh-sidekick-167988 \
--allowed-audiences="fullsend-mint,https://iam.googleapis.com/<wif-provider-path>"
```

### Monitoring Vertex AI usage

Via GCP Console: Vertex AI → Model Garden → Usage page. Filter by service
account for per-SA token consumption.

Via CLI:

```bash
gcloud logging read \
'resource.type="aiplatform.googleapis.com/Endpoint" AND
protoPayload.authenticationInfo.principalEmail="fullsend-local@rhdh-sidekick-167988.iam.gserviceaccount.com"' \
--project=rhdh-sidekick-167988 \
--limit=10 \
--format="table(timestamp, protoPayload.request.model, protoPayload.response.usageMetadata)"
```
83 changes: 83 additions & 0 deletions docs/known-issues.md
@@ -0,0 +1,83 @@
# Known Issues

Active friction points, workarounds, and upstream tracking for the RHDH
fullsend setup. Last updated: 2026-06-09.

## Sandbox and tooling

| Issue | Impact | Workaround | Status |
|-------|--------|------------|--------|
| DNS broken inside sandboxes | `yarn install`, `pip install`, `git clone` fail with `getaddrinfo EAI_AGAIN` | Explicit `httpProxy`/`httpsProxy` in `.yarnrc.yml` pointing to the L7 proxy | By design — see [Sandbox Networking](sandbox-networking.md) |
| `yarn install` takes 10-15 min in sandbox | Monorepo overhead for large workspaces | Custom image with yarn pre-installed eliminates bootstrap; consider pre-installing deps | Open |
| Git hooks (husky) need yarn in PATH | Hooks run in subprocesses without the agent's PATH | Custom image with `/usr/local/bin/yarn` wrapper — see [rhdh-fullsend-code image](../README.md) | Solved |
| Sandbox creation timeout (60s) for large images | Code agent uses `fullsend-code:latest`, which is larger than the triage sandbox image | Upstream fix exists (pre-pull + retry + 120s timeout) but is not in the `@v0` tag. Set `FULLSEND_SANDBOX_READY_TIMEOUT=180` as an env var. | Fixed upstream, pending `@v0` release |
| `/etc/resolv.conf` points to unreachable nameserver | Tools timeout instead of failing fast | None — consider filing OpenShell issue | Open |

## Agent behavior

| Issue | Impact | Workaround | Status |
|-------|--------|------------|--------|
| Triage doesn't auto-trigger on `issues/opened` | Must use `/fs-triage` slash command | Post `/fs-triage` as issue comment | By design — dispatcher only handles `issues/labeled` |
| Coder doesn't auto-trigger from triage | Triage labels `triaged`, not `ready-to-code` | Post `/fs-code` manually after triage | By design |
| Fix only triggers from bot reviews | Human `changes_requested` reviews don't trigger fix agent | Post `/fs-fix` manually | By design |
| Retro dropped by concurrency group collision | Retro job gets cancelled by other dispatch jobs | Post `/fs-retro` manually in a quiet window | Open |
| Custom agent stages not supported in per-repo mode | Cannot register custom `/fs-*` slash commands | Extend existing agents with custom skills instead of building standalone agents | Architectural limitation |

## Monorepo-specific

| Issue | Impact | Workaround | Status |
|-------|--------|------------|--------|
| No workspace awareness | Agent sees full repo context, not just the workspace a PR touches | `paths` filter on `pull_request_target` for workspace-level triggering | Partial — shim-level only |
| Routing skill: label priority | Agent guesses workspace from title/body instead of trusting `workspace/*` label | Improve routing skill to prioritize existing labels | Open |
| Routing skill not in triage harness | Triage has no workspace awareness — can misroute issues | Add routing skill to triage harness | Open |
| `workspace/*` labels not automated | Must manually create labels when adding workspaces | Automate label creation when a new workspace is added | Open |

## Observability

| Issue | Impact | Workaround | Status |
|-------|--------|------------|--------|
| Agent transcript not visible inline in GHA logs | Must download artifact separately | `gh run download <run-id> --name fullsend-code` | Open |
| No summary in GHA step output | Hard to see what the agent did at a glance | Consider post-script step extracting key actions from transcript | Open |

## Upstream harness drift

Customized harness and policy files are **copies** of upstream (baseline
2026-06-05). When upstream changes, our copies need manual sync:

| File | Repo |
|------|------|
| `harness/code.yaml` | rhdh-plugins |
| `harness/fix.yaml` | rhdh-plugins |
| `policies/code.yaml` | rhdh-plugins |
| `policies/fix.yaml` | rhdh-plugins |
| `agents/code.md` | rhdh-plugins |
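A minimal drift check, assuming you keep a snapshot of the upstream files as they were at the 2026-06-05 baseline next to a checkout of current upstream, both using the relative paths in the table. The directory layout and function name are hypothetical, and the real upstream paths may differ. It compares upstream-then with upstream-now rather than with our customized copies, which are expected to differ:

```shell
# Files copied from upstream and customized (hypothetical relative layout).
DRIFT_FILES="harness/code.yaml harness/fix.yaml policies/code.yaml policies/fix.yaml agents/code.md"

# Print each file whose upstream version changed since the baseline snapshot.
# Returns 1 if anything changed (or is missing), 0 otherwise.
check_drift() {
  local baseline_dir=$1 upstream_dir=$2 f status=0
  for f in $DRIFT_FILES; do
    if ! cmp -s "$baseline_dir/$f" "$upstream_dir/$f"; then
      echo "needs sync: $f"
      status=1
    fi
  done
  return "$status"
}
```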

## Upstream feature requests

| Issue | Description | Status |
|-------|-------------|--------|
| [fullsend#1937](https://github.com/fullsend-ai/fullsend/issues/1937) | Native `working_dir` field in harness schema | Filed |
| `repo.yarnpkg.com` missing from upstream policies | Any JS monorepo using corepack + yarn hits this | Not yet filed |
| `sandbox_init_script` in harness schema | Pre-agent env setup without relying on `.env.d` or skills | Not yet filed |
| [OpenShell#1107](https://github.com/NVIDIA/OpenShell/issues/1107) | `/etc/hosts` injection for policy-allowed hostnames | Open, assigned |

## `@v0` tag regression risk

Commit `709d8af0` (2026-05-15) fixed per-repo retro/prioritize routing by
removing the `retro|prioritize → fullsend` stage-to-role mapping. However,
PR #1187 (`005ac0a1`, 2026-05-19) re-introduced the old mapping on `main`.
The `@v0` tag predates this regression, so per-repo mode is currently safe.

**Risk:** If `@v0` advances past PR #1187, per-repo retro and prioritize
will silently break for all consumer orgs whose config lists
`retro`/`prioritize` instead of `fullsend`.
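Whether the risk has materialized can be checked with `git merge-base --is-ancestor` in a clone of the fullsend repo: if `005ac0a1` is an ancestor of the tag, the tag includes the regression. A sketch, assuming the action ref `@v0` corresponds to a git tag named `v0`; the helper name is hypothetical:

```shell
# Hypothetical helper: print "at-risk" if COMMIT is contained in REF, "safe" if not.
# Defaults: REF=v0, COMMIT=005ac0a1 (the commit that re-introduced the old mapping).
v0_status() {
  local repo=$1 ref=${2:-v0} commit=${3:-005ac0a1}
  if git -C "$repo" merge-base --is-ancestor "$commit" "$ref"; then
    echo "at-risk"
  else
    local rc=$?
    # Exit status 1 means "not an ancestor"; anything else is an error.
    if [ "$rc" -eq 1 ]; then
      echo "safe"
    else
      echo "unknown: is the clone up to date?" >&2
      return 2
    fi
  fi
}
```

Since `v0` moves, refresh the clone's tags first with `git fetch --tags --force`.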

## Public repo security

Fullsend's `issue_comment` trigger routes to agents without checking
`author_association`. Any external user posting `/fs-review` on a public
repo's PR triggers Vertex AI inference on the repo owner's GCP project.

**Mitigation:** Add an `author_association` check to the dispatch job in
the shim workflow. Applied in rhdh-plugins and rhdh-plugin-export-overlays.
See [Repo Onboarding — Method 2](repo-onboarding.md#method-2-manual-install-customized-shim).
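The shim's actual check lives in the workflow. As a sketch of the decision it makes, here is a shell version assuming the association is handed to a step through an environment variable (for example populated from `github.event.comment.author_association`) and that only `OWNER`, `MEMBER` and `COLLABORATOR` are trusted. That allow-list is an assumption, not necessarily what rhdh-plugins uses:

```shell
# Hypothetical check: succeed only for the repo owner, org members and collaborators.
# GitHub's other values (CONTRIBUTOR, FIRST_TIME_CONTRIBUTOR, FIRST_TIMER,
# MANNEQUIN, NONE) are treated as untrusted.
is_trusted_commenter() {
  case $1 in
    OWNER|MEMBER|COLLABORATOR) return 0 ;;
    *) return 1 ;;
  esac
}
```

In a dispatch step, a failed `is_trusted_commenter "$AUTHOR_ASSOCIATION"` (hypothetical variable name) would be the signal to skip routing the `/fs-*` command.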
20 changes: 3 additions & 17 deletions docs/local-setup.md
@@ -100,23 +100,9 @@ service account key for the `rhdh-sidekick-167988` project with the
**If your team lead provides the key file:** save it to
`~/.config/fullsend/fullsend-local-credentials.json` and `chmod 600` it.

**If you need to create the SA yourself:** see
[GCP Infrastructure — Service Accounts](gcp-infrastructure.md#service-accounts)
for the full `gcloud` commands (create SA, grant role, generate key, rotate).

## Step 4: Create env files
