@nishu-builder
Contributor

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@nishu-builder nishu-builder marked this pull request as ready for review January 16, 2026 08:12
Contributor Author

nishu-builder commented Jan 16, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • add-to-merge-queue - adds this PR to the back of the merge queue
  • add-to-merge-queue-as-hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@nishu-builder nishu-builder force-pushed the nishad/job-runner-spec-updates branch from 58c7118 to 8d52d84 Compare January 16, 2026 22:04
Contributor

@rhysh rhysh left a comment

Some comments, non-blocking.

from observatory.
- Run evaluation jobs in a dedicated AWS account separate from primary infrastructure
- Jobs don't submit their own results or pull inputs from Observatory
- Hot pool of pre-warmed nodes with per-job pod teardown (~5-10s startup target, one node per pod)
Contributor

The hot pool is a strategy; the goals look like (1) limiting the impact of a container escape by recycling nodes often, and (2) fast startup.


**Dispatcher** (primary account, part of Observatory)

- Creates k8s jobs in eval cluster via cross-account kubeconfig
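For illustration only, the Job object the Dispatcher submits might look roughly like the dict built below. This is a sketch under assumptions: the function name, the `eval-` name prefix, the container name, and the `backoffLimit` choice are hypothetical and not taken from the spec. The only detail drawn from the spec is that the worker receives its job spec location through the `JOB_SPEC_URI` env var.

```python
def build_job_manifest(job_id: str, job_spec_url: str, image: str) -> dict:
    """Return a batch/v1 Job manifest for one evaluation job (illustrative only)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"eval-{job_id}"},  # hypothetical naming scheme
        "spec": {
            "backoffLimit": 0,  # assumption: no in-cluster retries
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "worker",  # hypothetical container name
                            "image": image,
                            # The worker reads its presigned job spec URL from here.
                            "env": [{"name": "JOB_SPEC_URI", "value": job_spec_url}],
                        }
                    ],
                }
            },
        },
    }
```

A dict like this could be passed as the `body` to the Kubernetes Python client's `BatchV1Api.create_namespaced_job`, assuming that client is what the Dispatcher uses; the spec does not say.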
Contributor

"... based on aws eks get-token"


- Reads job spec from presigned GET URL (env var `JOB_SPEC_URI`)
- Downloads policies from presigned GET URLs
- Runs pure episode runner
Contributor

"... in the same pod / container"

- Downloads policies from presigned GET URLs
- Runs pure episode runner
- Writes results/replay to presigned PUT URLs
- No Observatory access, no AWS credentials, no network to primary account
Contributor

Do we make any attempt to restrict Internet access? (If not, we'll also have to contend with "the policy is actually a DoS bot" type things.)
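As a rough sketch of the worker steps quoted in this thread (read the job spec from `JOB_SPEC_URI`, download policies, run the episode runner, write results and replay), something like the following could work. The spec field names (`policy_urls`, `results_url`, `replay_url`), the `run_episode` callable, and its return shape are all assumptions made for illustration; the spec under review does not define them. Egress restriction is not addressed here.

```python
import json
import urllib.request
from typing import Callable, Mapping


def http_get(url: str) -> bytes:
    """Fetch bytes from a (presigned) GET URL."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def http_put(url: str, body: bytes) -> None:
    """Upload bytes to a (presigned) PUT URL."""
    req = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()


def run_job(
    env: Mapping[str, str],
    run_episode: Callable[[dict, list], dict],
    get: Callable[[str], bytes] = http_get,
    put: Callable[[str, bytes], None] = http_put,
) -> None:
    # 1. Read the job spec from the presigned URL in JOB_SPEC_URI.
    spec = json.loads(get(env["JOB_SPEC_URI"]))
    # 2. Download each policy from its presigned GET URL (hypothetical field name).
    policies = [get(url) for url in spec["policy_urls"]]
    # 3. Run the episode runner; assumed to return {"results": ..., "replay": bytes}.
    outcome = run_episode(spec, policies)
    # 4. Write results and replay to presigned PUT URLs (hypothetical field names).
    put(spec["results_url"], json.dumps(outcome["results"]).encode("utf-8"))
    put(spec["replay_url"], outcome["replay"])
```

The `get` and `put` parameters are injectable so the flow can be exercised without network access; in the pod they would default to plain HTTP calls against the presigned URLs, with no AWS credentials involved.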


1. **Job Creation** (primary account)
- Observatory creates job row in Postgres (status=pending)
- Dispatcher generates presigned S3 URIs
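One possible shape for this step, sketched under assumptions: the Dispatcher assembles a job spec whose entries are presigned URLs, GET for policy inputs and PUT for outputs. The field names, the S3 key layout, and the `presign` callable are hypothetical; the spec only says that presigned S3 URIs are generated.

```python
from typing import Callable

# Hypothetical signature: (http_method, s3_key) -> presigned URL.
Presign = Callable[[str, str], str]


def build_job_spec(job_id: str, policy_keys: list[str], presign: Presign) -> dict:
    """Assemble a job spec made only of presigned URLs (illustrative field names)."""
    return {
        "job_id": job_id,
        # Inputs: the worker may only GET these.
        "policy_urls": [presign("GET", key) for key in policy_keys],
        # Outputs: the worker may only PUT these. The key layout is an assumption.
        "results_url": presign("PUT", f"jobs/{job_id}/results.json"),
        "replay_url": presign("PUT", f"jobs/{job_id}/replay"),
    }
```

If boto3 is used, `presign` could wrap an S3 client's `generate_presigned_url` with `"get_object"` or `"put_object"` and a `Params` dict holding `Bucket` and `Key`; the bucket name and expiry would be deployment choices not covered here.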
Contributor

"... for policy zips"

@graphite-app
Contributor

graphite-app bot commented Jan 17, 2026

Merge activity

  • Jan 17, 1:05 AM UTC: nishu-builder added this pull request to the Graphite merge queue.
  • Jan 17, 1:06 AM UTC: CI is running for this pull request on a draft pull request (#4950) due to your merge queue CI optimization settings.
  • Jan 17, 1:10 AM UTC: Merged by the Graphite merge queue via draft PR: #4950.

graphite-app bot pushed a commit that referenced this pull request Jan 17, 2026
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>