feat: add pull-based runner API (GitHub Actions runner style)#325

Open
as535364 wants to merge 4 commits into Normal-OJ:main from as535364:claude/github-actions-runner-format-X3dyo

Conversation

@as535364
Member

Add a new /runner API blueprint that allows sandbox instances to operate as pull-based runners instead of receiving pushed submissions. Runners poll for pending jobs, claim them, download code/testdata, and report results back — similar to GitHub Actions self-hosted runners.

Changes:

  • model/runner.py: New runner API with /jobs, /claim, /code, /testdata, /complete, and /heartbeat endpoints
  • mongo/engine.py: Add claimed_by/claimed_at fields to Submission
  • mongo/submission.py: Support RUNNER_MODE=pull env var in send()
  • app.py: Register runner_api blueprint at /runner

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR
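The poll → claim → download → report cycle described above can be sketched as a runner-side loop. This is a hypothetical illustration, not code from the PR: the endpoint paths follow the description, but the response shapes, the `submission_id` key, and the injected `http` callable (so the flow can be exercised without a live server) are assumptions.

```python
def run_once(http):
    """One iteration of a pull-based runner: poll, claim, fetch code, report.

    `http` is an injected callable (method, path, **kw) -> (status, body),
    standing in for a real HTTP client pointed at the /runner API.
    """
    status, jobs = http('GET', '/runner/jobs')
    if status != 200 or not jobs:
        return None  # nothing pending
    job_id = jobs[0]['submission_id']
    status, _ = http('POST', f'/runner/jobs/{job_id}/claim')
    if status != 200:
        return None  # e.g. 409: another runner claimed it first
    status, code_zip = http('GET', f'/runner/jobs/{job_id}/code')
    # ... run the judge on code_zip here, then report results back
    result = {'tasks': [], 'token': 'per-submission-token'}  # placeholder
    http('PUT', f'/runner/jobs/{job_id}/complete', json=result)
    return job_id
```

A real runner would wrap this in a loop with a sleep between polls and send periodic heartbeats while a job is held.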

claude added 3 commits March 23, 2026 12:31
Add a new /runner API blueprint that allows sandbox instances to operate
as pull-based runners instead of receiving pushed submissions. Runners
poll for pending jobs, claim them, download code/testdata, and report
results back — similar to GitHub Actions self-hosted runners.

Changes:
- model/runner.py: New runner API with /jobs, /claim, /code, /testdata,
  /complete, and /heartbeat endpoints
- mongo/engine.py: Add claimed_by/claimed_at fields to Submission
- mongo/submission.py: Support RUNNER_MODE=pull env var in send()
- app.py: Register runner_api blueprint at /runner

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR

21 tests covering all runner endpoints:
- Auth (token validation, 403 on missing/invalid token)
- GET /runner/jobs (pending jobs, claimed exclusion, expired claims)
- POST /runner/jobs/<id>/claim (success, 404, 409, field updates)
- GET /runner/jobs/<id>/code (download by claimer, 403 for others)
- POST /runner/heartbeat (extend claim, no-op without id, 403)
- PUT /runner/jobs/<id>/complete (missing body, missing fields,
  invalid token, full flow with result processing)

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR

Copilot AI left a comment


Pull request overview

Introduces a new pull-based “runner” API (GitHub Actions-style) so sandbox/runner instances can poll for pending submissions, claim jobs, download code/test data, and report results back.

Changes:

  • Add runner_api blueprint with job polling/claiming, artifact download, completion, and heartbeat endpoints.
  • Extend Submission with claimed_by / claimed_at fields to support pull-runner job leasing.
  • Add RUNNER_MODE=pull behavior to queue submissions for runner pickup instead of pushing to a sandbox.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/test_runner.py Adds end-to-end tests for the new runner API flows (auth, jobs, claim, code download, heartbeat, completion).
mongo/submission.py Adds pull-mode support in Submission.send() controlled by RUNNER_MODE.
mongo/engine.py Adds claimed_by / claimed_at fields to the Submission document.
model/runner.py Implements the new /runner API blueprint and endpoints.
model/__init__.py Exposes the new runner module via package imports/__all__.
app.py Registers runner_api under the /runner URL prefix.


model/runner.py Outdated
Comment on lines +101 to +112
```python
# Check if already claimed by an active runner
if (sub.claimed_by is not None and sub.claimed_at is not None
        and sub.claimed_at > claim_deadline):
    return HTTPError('already claimed by another runner', 409)

runner_name = _get_runner_name(request)

# Claim the job
sub.update(
    claimed_by=runner_name,
    claimed_at=now,
)
```

Copilot AI Mar 24, 2026


claim_job is subject to a race condition: it reads sub, checks whether it’s currently claimed, then calls sub.update(...). Two runners hitting this endpoint concurrently can both pass the check and both end up believing they claimed the job. This should be done as an atomic conditional update (e.g., Submission.objects(id=..., status=-1, (claimed_by=None) | (claimed_at<=deadline)).update_one(...) and verify exactly one document was modified) and return 409 otherwise.
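Setting the check-then-act race aside, the eligibility condition itself can be isolated as a pure predicate, which is also what the atomic query must encode. A minimal sketch; the `claimed_by`/`claimed_at` field names come from the PR, while the 5-minute lease length is an assumption:

```python
from datetime import datetime, timedelta

CLAIM_TTL = timedelta(minutes=5)  # assumed lease length, not from the PR

def is_claimable(claimed_by, claimed_at, now):
    """A job is claimable if it was never claimed, or if its lease expired."""
    if claimed_by is None or claimed_at is None:
        return True
    return claimed_at <= now - CLAIM_TTL
```

The atomic fix is to move this predicate into the filter of a single conditional update and treat "zero documents modified" as the 409 case.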

Comment on lines +18 to +21
```python
def _verify_runner_token(req):
    """Verify the runner token from request headers or params."""
    token = req.headers.get('X-Runner-Token') or req.args.get('token', '')
    config = Submission.config()
```

Copilot AI Mar 24, 2026


Runner authentication currently accepts the runner token via query string (req.args.get('token')). Tokens in URLs are easy to leak via logs, browser history, proxies, and referers. Prefer accepting the token only via X-Runner-Token header (or at least gate the query param behind an explicit debug/testing mode).
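A header-only variant of the check is straightforward. This is a sketch, not the repo's code: the headers dict and token list stand in for Flask's request headers and the configured sandbox tokens, and `secrets.compare_digest` keeps the comparison constant-time:

```python
import secrets

def verify_runner_token(headers, valid_tokens):
    """Accept the token only from the X-Runner-Token header, never the URL,
    so it cannot leak via access logs, proxies, or referers."""
    token = headers.get('X-Runner-Token', '')
    return any(secrets.compare_digest(token, t) for t in valid_tokens)
```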

Comment on lines +30 to +36
```python
    return req.headers.get('X-Runner-Name', 'unknown')


@runner_api.before_request
def check_runner_auth():
    if not _verify_runner_token(request):
        return HTTPError('invalid runner token', 403)
```

Copilot AI Mar 24, 2026


_get_runner_name defaults to 'unknown'. If a runner forgets to send X-Runner-Name, all such runners will share the same identity, allowing one to download/heartbeat/complete another’s job (since checks compare to claimed_by). Consider requiring X-Runner-Name to be present (400 if missing) and/or deriving a stable runner identity from the authenticated token instead of a user-supplied string.

Suggested change

```python
    name = req.headers.get('X-Runner-Name')
    # Treat missing or empty header as no runner name
    if not name:
        return None
    return name


@runner_api.before_request
def check_runner_auth():
    if not _verify_runner_token(request):
        return HTTPError('invalid runner token', 403)
    # Require a non-empty runner name to avoid shared default identities
    if not _get_runner_name(request):
        return HTTPError('missing runner name', 400)
```

Comment on lines +252 to +253

```python
try:
```

Copilot AI Mar 24, 2026


on_job_complete verifies only the per-submission token; it does not verify that the caller is the runner that claimed the job (claimed_by). That means any holder of the runner token + submission token can complete the job, even if they didn’t claim it. Add a check that sub.claimed_by == _get_runner_name(request) (and possibly that the claim hasn’t expired) before accepting results.

Suggested change

```python
# Ensure that the runner completing the job is the same runner
# that currently holds the claim on this submission.
try:
    sub = engine.Submission.objects.get(id=submission_id)
except engine.DoesNotExist:
    return HTTPError('submission not found', 404)
runner_name = _get_runner_name(request)
if sub.claimed_by != runner_name:
    return HTTPError('not claimed by this runner', 403)
try:
```

Comment on lines +237 to +245
```python
data = request.json
if data is None:
    return HTTPError('missing JSON body', 400)

tasks = data.get('tasks')
token = data.get('token')
if tasks is None or token is None:
    return HTTPError('missing tasks or token', 400)
```


Copilot AI Mar 24, 2026


This endpoint hand-rolls JSON parsing/validation (data = request.json, manual missing-field checks). Elsewhere in the codebase (e.g. model/submission.py), request validation uses the @Request.json(...) decorator which consistently handles missing/invalid content types and required fields. Consider switching this route to the shared request-validation helpers to keep behavior consistent and reduce duplication.

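The shape of such a required-fields decorator can be sketched generically. This is not the repo's actual @Request.json helper (its signature is not shown here); it is a hypothetical stand-in that takes the parsed body as an argument so the pattern is testable without Flask:

```python
from functools import wraps

def require_json(*fields):
    """Reject a missing body or missing required fields before the handler
    runs, so every route validates input the same way."""
    def deco(fn):
        @wraps(fn)
        def wrapper(data, *args, **kwargs):
            if data is None:
                return 'missing JSON body', 400
            missing = [f for f in fields if data.get(f) is None]
            if missing:
                return f"missing {' or '.join(missing)}", 400
            return fn(data, *args, **kwargs)
        return wrapper
    return deco

@require_json('tasks', 'token')
def on_complete(data):
    return 'ok', 200
```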
model/runner.py Outdated
```python
    return HTTPError('not claimed by this runner', 403)

problem = sub.problem
submission = Submission(submission_id)
```

Copilot AI Mar 24, 2026


submission = Submission(submission_id) is created but never used in get_job_testdata, which makes the handler harder to read and suggests a missing check. Remove the unused variable (or use it if there’s intended logic such as token/permission checks).

Suggested change: delete the unused line `submission = Submission(submission_id)`.

Comment on lines +18 to +80
```python
@pytest.fixture(autouse=True, scope='session')
def setup_minio():
    """Override session-scoped minio fixture to start minio via Docker."""
    container_name = 'test-minio-runner'
    subprocess.run(
        ['docker', 'rm', '-f', container_name],
        capture_output=True,
    )
    proc = subprocess.run(
        [
            'docker',
            'run',
            '-d',
            '--name',
            container_name,
            '-p',
            '19000:9000',
            '-e',
            'MINIO_ROOT_USER=minioadmin',
            '-e',
            'MINIO_ROOT_PASSWORD=minioadmin',
            'minio/minio:latest',
            'server',
            '/data',
            '--address',
            ':9000',
        ],
        capture_output=True,
        text=True,
    )
    assert proc.returncode == 0, f'Failed to start minio: {proc.stderr}'

    # Wait for minio to be ready
    for _ in range(30):
        try:
            import urllib.request
            urllib.request.urlopen('http://localhost:19000/minio/health/live',
                                   timeout=2)
            break
        except Exception:
            _time.sleep(1)
    else:
        raise TimeoutError('Minio did not start in time')

    mongo.config.MINIO_ACCESS_KEY = 'minioadmin'
    mongo.config.MINIO_SECRET_KEY = 'minioadmin'
    mongo.config.MINIO_HOST = 'localhost:19000'
    mongo.config.FLASK_DEBUG = True

    client = Minio(
        'localhost:19000',
        access_key='minioadmin',
        secret_key='minioadmin',
        secure=False,
    )
    bucket = mongo.config.MINIO_BUCKET
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    yield

    subprocess.run(['docker', 'rm', '-f', container_name], capture_output=True)
```



Copilot AI Mar 24, 2026


This test module overrides the session-scoped setup_minio fixture from tests/conftest.py and starts MinIO via docker run on a fixed host port (19000) using minio/minio:latest. This makes tests non-deterministic (latest tag), more brittle (port conflicts), and requires a Docker CLI/daemon even though the repo already uses testcontainers.minio.MinioContainer with a pinned version in conftest.py. Prefer reusing the existing session fixture instead of overriding it here.

Suggested change: delete this `setup_minio` fixture override entirely (the suggestion removes the whole block quoted above with no replacement).

model/runner.py Outdated
Comment on lines +54 to +64
```python
pending = engine.Submission.objects(
    status=-1,
    language__ne=3,
).order_by('last_send')

jobs = []
for sub in pending:
    # Skip if actively claimed by another runner
    if (sub.claimed_by is not None and sub.claimed_at is not None
            and sub.claimed_at > claim_deadline):
        continue
```

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_pending_jobs loads all pending submissions (status=-1) and then filters out actively-claimed jobs in Python. If the queue is large (or many jobs are claimed), this can scan a lot of documents per poll. Consider pushing the claim-expiry logic into the Mongo query (OR on claimed_by=None / claimed_at=None / claimed_at<=deadline) and apply .limit(10) at the query level.

Suggested change

```python
# The claim-expiry logic is pushed into the Mongo query to avoid
# scanning a large number of documents in Python.
pending = engine.Submission.objects(
    status=-1,
    language__ne=3,
).filter(
    __raw__={
        '$or': [
            {'claimed_by': None},
            {'claimed_at': None},
            {'claimed_at': {'$lte': claim_deadline}},
        ]
    }
).order_by('last_send').limit(10)
jobs = []
for sub in pending:
```

- Use atomic conditional update (update_one) for job claiming to prevent
  race conditions between concurrent runners
- Remove query string token fallback, accept only X-Runner-Token header
- Verify claimed_by on job completion to prevent unauthorized result submission
- Remove unused variable in testdata endpoint
- Push claim-expiry filter into MongoDB query instead of filtering in Python

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.



Comment on lines +435 to +437
```python
# Pull mode: just mark as pending, runners will pick it up
runner_mode = os.getenv('RUNNER_MODE', 'push')
if runner_mode == 'pull':
```

Copilot AI Mar 24, 2026


In pull mode, send() returns True without updating any persisted state, but the docstring says it “marks submission as pending for runners to pick up”. Since send() is invoked by rejudge() after setting status/last_send, this is a good place to also clear any existing runner claim fields (and/or ensure status=-1/last_send refreshed) so the job becomes immediately available to runners after rejudge.

Suggested change

```python
# Pull mode: mark as pending, runners will pick it up
runner_mode = os.getenv('RUNNER_MODE', 'push')
if runner_mode == 'pull':
    # refresh status / last_send so the job appears pending
    if hasattr(self, 'status'):
        # -1 is commonly used as "waiting/pending" in this codebase
        try:
            self.status = -1
        except Exception:
            # fail soft if status is read-only or incompatible
            pass
    if hasattr(self, 'last_send'):
        try:
            self.last_send = datetime.utcnow()
        except Exception:
            pass
    # clear any existing runner-claim fields so runners can re-claim
    for _attr in (
            'runner_id',
            'runner_claimed_at',
            'runner_lease_expires_at',
            'claimed_by',
            'claim_time',
    ):
        if hasattr(self, _attr):
            try:
                setattr(self, _attr, None)
            except Exception:
                # ignore if attribute is read-only
                continue
    # persist the updated state if a persistence method is available
    for _method_name in ('save', 'commit', 'update'):
        _method = getattr(self, _method_name, None)
        if callable(_method):
            try:
                _method()
            except Exception:
                # do not let persistence failures break send()
                pass
            break
```

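The state reset the reviewer describes can be sketched as a pure function. The `status`, `last_send`, `claimed_by`, and `claimed_at` field names come from the PR; the function name and the dict standing in for the Submission document are hypothetical:

```python
def reset_for_rejudge(sub, now):
    """Field updates a pull-mode rejudge should persist so the job re-enters
    the pending queue and any stale runner claim is released."""
    sub['status'] = -1        # -1 marks the job as pending for runners
    sub['last_send'] = now
    sub['claimed_by'] = None  # release any previous runner's claim
    sub['claimed_at'] = None
    return sub
```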
Comment on lines +18 to +25
```python
def _verify_runner_token(req):
    """Verify the runner token from request headers or params."""
    token = req.headers.get('X-Runner-Token', '')
    config = Submission.config()
    for sb in config.sandbox_instances:
        if secrets.compare_digest(token, sb.token):
            return True
    return False
```

Copilot AI Mar 24, 2026


Docstring says the runner token is read from “headers or params”, but the implementation only checks X-Runner-Token in headers. Either update the docstring or also accept a query param (being careful about token leakage in logs/URLs if you do).

```python
runner_name = _get_runner_name(request)
if sub.claimed_by != runner_name:
    return HTTPError('not claimed by this runner', 403)
```


Copilot AI Mar 24, 2026


Authorization for code/testdata/heartbeat/complete is effectively based on X-Runner-Name matching claimed_by. Any runner with a valid X-Runner-Token can spoof X-Runner-Name and access another runner’s claimed job. Consider binding the claim to a server-generated secret (e.g., store a claim token alongside claimed_by and require it on subsequent requests) rather than trusting a caller-controlled name header.

Suggested change

```python
# Additionally require the server-generated job token issued at claim time
job_token = request.args.get('token')
# If the token is missing or does not match the submission's token, deny access
if not job_token or getattr(sub, "token", None) != job_token:
    return HTTPError('invalid or missing job token', 403)
```

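Binding a claim to a server-generated secret can be sketched as follows. The function names and the dict standing in for persisted claim state are hypothetical; the point is that the runner receives an unguessable token at claim time and must present it on every later request, so a spoofed X-Runner-Name alone is not enough:

```python
import secrets

def issue_claim_token(store, submission_id):
    """On a successful claim, mint an unguessable token, persist it with the
    claim, and return it to the runner."""
    token = secrets.token_urlsafe(32)
    store[submission_id] = token
    return token

def check_claim_token(store, submission_id, presented):
    """Constant-time check that the caller holds the claim's secret."""
    stored = store.get(submission_id)
    return stored is not None and secrets.compare_digest(stored, presented)
```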
Comment on lines +92 to +99
```python
# Atomic conditional update to prevent race conditions
updated = engine.Submission.objects(
    Q(id=submission_id) & Q(status=-1)
    & (Q(claimed_by=None) | Q(claimed_at=None)
       | Q(claimed_at__lte=claim_deadline)), ).update_one(
           set__claimed_by=runner_name,
           set__claimed_at=now,
       )
```

Copilot AI Mar 24, 2026


claim_job() uses Q(id=submission_id) without handling invalid ObjectId formats. For malformed IDs, MongoEngine typically raises a ValidationError/InvalidId which would become a 500. Consider validating submission_id up front (or catching the relevant exception) and returning a 400.

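The up-front format check can be done without touching the database. A dependency-free sketch (when the bson package is available, `bson.ObjectId.is_valid` does the same job): an ObjectId's string form is exactly 24 hex characters, so a regex suffices to turn malformed IDs into a clean 400 instead of a 500:

```python
import re

OBJECT_ID_RE = re.compile(r'\A[0-9a-fA-F]{24}\Z')

def looks_like_object_id(s):
    """Cheap format check so handlers can reject malformed IDs early."""
    return isinstance(s, str) and bool(OBJECT_ID_RE.match(s))
```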
Comment on lines +161 to +177
```python
    try:
        resp = minio_client.client.get_object(
            minio_client.bucket,
            sub.code_minio_path,
        )
        data = resp.read()
    finally:
        if 'resp' in locals():
            resp.close()
            resp.release_conn()
elif sub.code is not None and sub.code.grid_id is not None:
    data = sub.code.read()
else:
    return HTTPError('code not found', 404)

return send_file(
    io.BytesIO(data),
```

Copilot AI Mar 24, 2026


This reads the entire code zip into memory (data = resp.read()) before sending it. For large submissions this can spike memory and slow down responses. Prefer streaming the MinIO/GridFS file-like object to the client (or using a generator/streaming response) so the server doesn’t buffer the whole object.

Suggested change

```python
    fileobj = minio_client.client.get_object(
        minio_client.bucket,
        sub.code_minio_path,
    )
elif sub.code is not None and sub.code.grid_id is not None:
    # Use the GridFS-like file object directly instead of reading into memory
    fileobj = sub.code
else:
    return HTTPError('code not found', 404)

return send_file(
    fileobj,
```

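The streaming approach the reviewer suggests can also be sketched as a chunking generator over the file-like object (a generic pattern, not the repo's code; io.BytesIO stands in for the MinIO/GridFS object here):

```python
import io

def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield fixed-size chunks so the server never buffers the whole zip;
    a Flask route can wrap this as Response(iter_chunks(obj)) to stream."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

One caveat with streaming from MinIO: the response object must stay open until the generator is exhausted, so the close/release_conn cleanup has to move into the generator's own finalization rather than a surrounding try/finally.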
Comment on lines +184 to +223
```python
@runner_api.route('/jobs/<submission_id>/testdata', methods=['GET'])
def get_job_testdata(submission_id: str):
    """
    Runner downloads the testdata zip for a claimed job's problem.
    """
    try:
        sub = engine.Submission.objects.get(id=submission_id)
    except engine.DoesNotExist:
        return HTTPError('submission not found', 404)

    runner_name = _get_runner_name(request)
    if sub.claimed_by != runner_name:
        return HTTPError('not claimed by this runner', 403)

    problem = sub.problem
    test_case = problem.test_case
    if test_case.case_zip_minio_path is not None:
        minio_client = MinioClient()
        try:
            resp = minio_client.client.get_object(
                minio_client.bucket,
                test_case.case_zip_minio_path,
            )
            data = resp.read()
        finally:
            if 'resp' in locals():
                resp.close()
                resp.release_conn()
    elif test_case.case_zip is not None and test_case.case_zip.grid_id is not None:
        data = test_case.case_zip.read()
    else:
        return HTTPError('testdata not found', 404)

    return send_file(
        io.BytesIO(data),
        mimetype='application/zip',
        as_attachment=True,
        download_name=f'testdata-{problem.problem_id}.zip',
    )
```


Copilot AI Mar 24, 2026


The /jobs/<submission_id>/testdata endpoint is new but isn’t covered by tests in tests/test_runner.py (there are tests for /jobs, /claim, /code, /heartbeat, /complete). Please add at least a happy-path test (claimer can download) and a 403 case (non-claimer denied), and ideally a 404 for missing submission/problem testdata.
