feat: add pull-based runner API (GitHub Actions runner style) #325
as535364 wants to merge 4 commits into Normal-OJ:main
Conversation
Add a new /runner API blueprint that allows sandbox instances to operate as pull-based runners instead of receiving pushed submissions. Runners poll for pending jobs, claim them, download code/testdata, and report results back, similar to GitHub Actions self-hosted runners.

Changes:
- model/runner.py: new runner API with /jobs, /claim, /code, /testdata, /complete, and /heartbeat endpoints
- mongo/engine.py: add claimed_by/claimed_at fields to Submission
- mongo/submission.py: support RUNNER_MODE=pull env var in send()
- app.py: register runner_api blueprint at /runner

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR
21 tests covering all runner endpoints:
- Auth (token validation, 403 on missing/invalid token)
- GET /runner/jobs (pending jobs, claimed exclusion, expired claims)
- POST /runner/jobs/&lt;id&gt;/claim (success, 404, 409, field updates)
- GET /runner/jobs/&lt;id&gt;/code (download by claimer, 403 for others)
- POST /runner/heartbeat (extend claim, no-op without id, 403)
- PUT /runner/jobs/&lt;id&gt;/complete (missing body, missing fields, invalid token, full flow with result processing)

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR
Pull request overview
Introduces a new pull-based “runner” API (GitHub Actions-style) so sandbox/runner instances can poll for pending submissions, claim jobs, download code/test data, and report results back.
Changes:
- Add `runner_api` blueprint with job polling/claiming, artifact download, completion, and heartbeat endpoints.
- Extend `Submission` with `claimed_by`/`claimed_at` fields to support pull-runner job leasing.
- Add `RUNNER_MODE=pull` behavior to queue submissions for runner pickup instead of pushing to a sandbox.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_runner.py | Adds end-to-end tests for the new runner API flows (auth, jobs, claim, code download, heartbeat, completion). |
| mongo/submission.py | Adds pull-mode support in Submission.send() controlled by RUNNER_MODE. |
| mongo/engine.py | Adds claimed_by / claimed_at fields to the Submission document. |
| model/runner.py | Implements the new /runner API blueprint and endpoints. |
| model/__init__.py | Exposes the new runner module via package imports/`__all__`. |
| app.py | Registers runner_api under the /runner URL prefix. |
model/runner.py (Outdated)
```python
    # Check if already claimed by an active runner
    if (sub.claimed_by is not None and sub.claimed_at is not None
            and sub.claimed_at > claim_deadline):
        return HTTPError('already claimed by another runner', 409)

    runner_name = _get_runner_name(request)

    # Claim the job
    sub.update(
        claimed_by=runner_name,
        claimed_at=now,
    )
```
claim_job is subject to a race condition: it reads sub, checks whether it’s currently claimed, then calls sub.update(...). Two runners hitting this endpoint concurrently can both pass the check and both end up believing they claimed the job. This should be done as an atomic conditional update (e.g., Submission.objects(id=..., status=-1, (claimed_by=None) | (claimed_at<=deadline)).update_one(...) and verify exactly one document was modified) and return 409 otherwise.
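The race the reviewer describes can be illustrated without MongoDB: the fix is to make the "is it free?" check and the claim write a single atomic step. A pure-Python sketch of the intended compare-and-set semantics (the in-memory `jobs` dict stands in for the collection, and the lock models what a conditional `update_one` gives you server-side):

```python
import threading
from datetime import datetime, timedelta

# In-memory stand-in for the submissions collection; the real fix would use
# a single conditional update_one() so MongoDB performs this check-and-set
# atomically on the server.
_lock = threading.Lock()

def try_claim(jobs: dict, job_id: str, runner: str,
              now: datetime, lease: timedelta) -> bool:
    """Atomically claim a job: succeeds only if the job is pending and
    either unclaimed or its previous claim has expired."""
    deadline = now - lease
    with _lock:  # models the atomicity of a conditional update_one()
        job = jobs.get(job_id)
        if job is None or job['status'] != -1:
            return False
        claimed_at = job.get('claimed_at')
        if (job.get('claimed_by') is not None and claimed_at is not None
                and claimed_at > deadline):
            return False  # still actively claimed -> caller returns 409
        job['claimed_by'] = runner
        job['claimed_at'] = now
        return True

now = datetime(2024, 1, 1, 12, 0)
jobs = {'abc': {'status': -1, 'claimed_by': None, 'claimed_at': None}}
first = try_claim(jobs, 'abc', 'runner-1', now, timedelta(minutes=5))
second = try_claim(jobs, 'abc', 'runner-2', now, timedelta(minutes=5))
```

Because check and write happen under one atomic step, exactly one of two concurrent claimers wins; the loser gets the 409 path.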
```python
def _verify_runner_token(req):
    """Verify the runner token from request headers or params."""
    token = req.headers.get('X-Runner-Token') or req.args.get('token', '')
    config = Submission.config()
```
Runner authentication currently accepts the runner token via query string (req.args.get('token')). Tokens in URLs are easy to leak via logs, browser history, proxies, and referers. Prefer accepting the token only via X-Runner-Token header (or at least gate the query param behind an explicit debug/testing mode).
```python
    return req.headers.get('X-Runner-Name', 'unknown')


@runner_api.before_request
def check_runner_auth():
    if not _verify_runner_token(request):
        return HTTPError('invalid runner token', 403)
```
_get_runner_name defaults to 'unknown'. If a runner forgets to send X-Runner-Name, all such runners will share the same identity, allowing one to download/heartbeat/complete another’s job (since checks compare to claimed_by). Consider requiring X-Runner-Name to be present (400 if missing) and/or deriving a stable runner identity from the authenticated token instead of a user-supplied string.
Suggested change:
```diff
-    return req.headers.get('X-Runner-Name', 'unknown')
+    name = req.headers.get('X-Runner-Name')
+    # Treat missing or empty header as no runner name
+    if not name:
+        return None
+    return name

 @runner_api.before_request
 def check_runner_auth():
     if not _verify_runner_token(request):
         return HTTPError('invalid runner token', 403)
+    # Require a non-empty runner name to avoid shared default identities
+    if not _get_runner_name(request):
+        return HTTPError('missing runner name', 400)
```
```python
    try:
```
on_job_complete verifies only the per-submission token; it does not verify that the caller is the runner that claimed the job (claimed_by). That means any holder of the runner token + submission token can complete the job, even if they didn’t claim it. Add a check that sub.claimed_by == _get_runner_name(request) (and possibly that the claim hasn’t expired) before accepting results.
Suggested change:
```diff
+    # Ensure that the runner completing the job is the same runner
+    # that currently holds the claim on this submission.
+    try:
+        sub = engine.Submission.objects.get(id=submission_id)
+    except engine.DoesNotExist:
+        return HTTPError('submission not found', 404)
+    runner_name = _get_runner_name(request)
+    if sub.claimed_by != runner_name:
+        return HTTPError('not claimed by this runner', 403)
     try:
```
```python
    data = request.json
    if data is None:
        return HTTPError('missing JSON body', 400)

    tasks = data.get('tasks')
    token = data.get('token')
    if tasks is None or token is None:
        return HTTPError('missing tasks or token', 400)
```
This endpoint hand-rolls JSON parsing/validation (data = request.json, manual missing-field checks). Elsewhere in the codebase (e.g. model/submission.py), request validation uses the @Request.json(...) decorator which consistently handles missing/invalid content types and required fields. Consider switching this route to the shared request-validation helpers to keep behavior consistent and reduce duplication.
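The exact signature of the codebase's `@Request.json(...)` helper isn't shown in this review, but the pattern it encapsulates can be sketched as a small required-fields decorator (the helper name `require_json_fields` and the tuple return shape are hypothetical; the real helper's behavior may differ):

```python
import functools

def require_json_fields(*fields):
    """Reject a request whose JSON body is missing or lacks any required
    field, then pass the fields to the handler as kwargs.
    (Sketch only -- the real @Request.json helper may behave differently.)"""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(payload, *args, **kwargs):
            if payload is None:
                return ('missing JSON body', 400)
            missing = [f for f in fields if payload.get(f) is None]
            if missing:
                return (f"missing {', '.join(missing)}", 400)
            return handler(*args,
                           **{**kwargs, **{f: payload[f] for f in fields}})
        return wrapper
    return decorator

@require_json_fields('tasks', 'token')
def on_job_complete(tasks, token):
    return ('ok', 200)

resp_ok = on_job_complete({'tasks': [], 'token': 't'})
resp_missing = on_job_complete({'tasks': []})
resp_none = on_job_complete(None)
```

Centralizing the checks this way keeps every endpoint's missing-body and missing-field responses consistent, which is the reviewer's point.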
model/runner.py (Outdated)
```python
        return HTTPError('not claimed by this runner', 403)

    problem = sub.problem
    submission = Submission(submission_id)
```
submission = Submission(submission_id) is created but never used in get_job_testdata, which makes the handler harder to read and suggests a missing check. Remove the unused variable (or use it if there’s intended logic such as token/permission checks).
Suggested change:
```diff
-    submission = Submission(submission_id)
```
```python
@pytest.fixture(autouse=True, scope='session')
def setup_minio():
    """Override session-scoped minio fixture to start minio via Docker."""
    container_name = 'test-minio-runner'
    subprocess.run(
        ['docker', 'rm', '-f', container_name],
        capture_output=True,
    )
    proc = subprocess.run(
        [
            'docker',
            'run',
            '-d',
            '--name',
            container_name,
            '-p',
            '19000:9000',
            '-e',
            'MINIO_ROOT_USER=minioadmin',
            '-e',
            'MINIO_ROOT_PASSWORD=minioadmin',
            'minio/minio:latest',
            'server',
            '/data',
            '--address',
            ':9000',
        ],
        capture_output=True,
        text=True,
    )
    assert proc.returncode == 0, f'Failed to start minio: {proc.stderr}'

    # Wait for minio to be ready
    for _ in range(30):
        try:
            import urllib.request
            urllib.request.urlopen('http://localhost:19000/minio/health/live',
                                   timeout=2)
            break
        except Exception:
            _time.sleep(1)
    else:
        raise TimeoutError('Minio did not start in time')

    mongo.config.MINIO_ACCESS_KEY = 'minioadmin'
    mongo.config.MINIO_SECRET_KEY = 'minioadmin'
    mongo.config.MINIO_HOST = 'localhost:19000'
    mongo.config.FLASK_DEBUG = True

    client = Minio(
        'localhost:19000',
        access_key='minioadmin',
        secret_key='minioadmin',
        secure=False,
    )
    bucket = mongo.config.MINIO_BUCKET
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    yield

    subprocess.run(['docker', 'rm', '-f', container_name], capture_output=True)
```
This test module overrides the session-scoped setup_minio fixture from tests/conftest.py and starts MinIO via docker run on a fixed host port (19000) using minio/minio:latest. This makes tests non-deterministic (latest tag), more brittle (port conflicts), and requires a Docker CLI/daemon even though the repo already uses testcontainers.minio.MinioContainer with a pinned version in conftest.py. Prefer reusing the existing session fixture instead of overriding it here.
Suggested change: delete the entire `setup_minio` override (the fixture shown above) and rely on the session-scoped fixture in tests/conftest.py.
model/runner.py (Outdated)
```python
    pending = engine.Submission.objects(
        status=-1,
        language__ne=3,
    ).order_by('last_send')

    jobs = []
    for sub in pending:
        # Skip if actively claimed by another runner
        if (sub.claimed_by is not None and sub.claimed_at is not None
                and sub.claimed_at > claim_deadline):
            continue
```
get_pending_jobs loads all pending submissions (status=-1) and then filters out actively-claimed jobs in Python. If the queue is large (or many jobs are claimed), this can scan a lot of documents per poll. Consider pushing the claim-expiry logic into the Mongo query (OR on claimed_by=None / claimed_at=None / claimed_at<=deadline) and apply .limit(10) at the query level.
Suggested change:
```diff
+    # The claim-expiry logic is pushed into the Mongo query to avoid
+    # scanning a large number of documents in Python.
     pending = engine.Submission.objects(
         status=-1,
         language__ne=3,
-    ).order_by('last_send')
+    ).filter(
+        __raw__={
+            '$or': [
+                {'claimed_by': None},
+                {'claimed_at': None},
+                {'claimed_at': {'$lte': claim_deadline}},
+            ]
+        }
+    ).order_by('last_send').limit(10)

     jobs = []
     for sub in pending:
-        # Skip if actively claimed by another runner
-        if (sub.claimed_by is not None and sub.claimed_at is not None
-                and sub.claimed_at > claim_deadline):
-            continue
```
- Use atomic conditional update (update_one) for job claiming to prevent race conditions between concurrent runners
- Remove query string token fallback, accept only X-Runner-Token header
- Verify claimed_by on job completion to prevent unauthorized result submission
- Remove unused variable in testdata endpoint
- Push claim-expiry filter into MongoDB query instead of filtering in Python

https://claude.ai/code/session_01CD3q84rDaDXMnAmQVh58qR
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
```python
    # Pull mode: just mark as pending, runners will pick it up
    runner_mode = os.getenv('RUNNER_MODE', 'push')
    if runner_mode == 'pull':
```
In pull mode, send() returns True without updating any persisted state, but the docstring says it “marks submission as pending for runners to pick up”. Since send() is invoked by rejudge() after setting status/last_send, this is a good place to also clear any existing runner claim fields (and/or ensure status=-1/last_send refreshed) so the job becomes immediately available to runners after rejudge.
Suggested change:
```diff
-    # Pull mode: just mark as pending, runners will pick it up
+    # Pull mode: mark as pending, runners will pick it up
     runner_mode = os.getenv('RUNNER_MODE', 'push')
     if runner_mode == 'pull':
+        # refresh status / last_send so the job appears pending
+        if hasattr(self, 'status'):
+            # -1 is commonly used as "waiting/pending" in this codebase
+            try:
+                self.status = -1
+            except Exception:
+                # fail soft if status is read-only or incompatible
+                pass
+        if hasattr(self, 'last_send'):
+            try:
+                self.last_send = datetime.utcnow()
+            except Exception:
+                pass
+        # clear any existing runner-claim fields so runners can re-claim
+        for _attr in (
+                'runner_id',
+                'runner_claimed_at',
+                'runner_lease_expires_at',
+                'claimed_by',
+                'claim_time',
+        ):
+            if hasattr(self, _attr):
+                try:
+                    setattr(self, _attr, None)
+                except Exception:
+                    # ignore if attribute is read-only
+                    continue
+        # persist the updated state if a persistence method is available
+        for _method_name in ('save', 'commit', 'update'):
+            _method = getattr(self, _method_name, None)
+            if callable(_method):
+                try:
+                    _method()
+                except Exception:
+                    # do not let persistence failures break send()
+                    pass
+                break
```
```python
def _verify_runner_token(req):
    """Verify the runner token from request headers or params."""
    token = req.headers.get('X-Runner-Token', '')
    config = Submission.config()
    for sb in config.sandbox_instances:
        if secrets.compare_digest(token, sb.token):
            return True
    return False
```
Docstring says the runner token is read from “headers or params”, but the implementation only checks X-Runner-Token in headers. Either update the docstring or also accept a query param (being careful about token leakage in logs/URLs if you do).
```python
    runner_name = _get_runner_name(request)
    if sub.claimed_by != runner_name:
        return HTTPError('not claimed by this runner', 403)
```
Authorization for code/testdata/heartbeat/complete is effectively based on X-Runner-Name matching claimed_by. Any runner with a valid X-Runner-Token can spoof X-Runner-Name and access another runner’s claimed job. Consider binding the claim to a server-generated secret (e.g., store a claim token alongside claimed_by and require it on subsequent requests) rather than trusting a caller-controlled name header.
Suggested change:
```diff
+    # Additionally require the server-generated job token issued at claim time
+    job_token = request.args.get('token')
+    # If the token is missing or does not match the submission's token, deny access
+    if not job_token or getattr(sub, "token", None) != job_token:
+        return HTTPError('invalid or missing job token', 403)
```
```python
    # Atomic conditional update to prevent race conditions
    updated = engine.Submission.objects(
        Q(id=submission_id) & Q(status=-1)
        & (Q(claimed_by=None) | Q(claimed_at=None)
           | Q(claimed_at__lte=claim_deadline)),
    ).update_one(
        set__claimed_by=runner_name,
        set__claimed_at=now,
    )
```
claim_job() uses Q(id=submission_id) without handling invalid ObjectId formats. For malformed IDs, MongoEngine typically raises a ValidationError/InvalidId which would become a 500. Consider validating submission_id up front (or catching the relevant exception) and returning a 400.
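A minimal pre-check for the malformed-ID case can be done with a regex, since the textual ObjectId form is exactly 24 hex digits (if pymongo is available, `bson.ObjectId.is_valid` is the more direct option):

```python
import re

_OBJECT_ID_RE = re.compile(r'\A[0-9a-fA-F]{24}\Z')

def is_valid_object_id(submission_id: str) -> bool:
    """Return True only for a well-formed 24-hex-digit ObjectId string,
    so malformed IDs can be turned into a 400 instead of a 500."""
    return bool(_OBJECT_ID_RE.match(submission_id))

checks = [
    is_valid_object_id('5f43bc1d8e3a7c0012345678'),  # well-formed
    is_valid_object_id('not-an-id'),
    is_valid_object_id(''),
]
```

Calling this at the top of `claim_job()` (and the other `<submission_id>` routes) lets the handler return HTTPError('invalid submission id', 400) before the query ever runs.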
```python
        try:
            resp = minio_client.client.get_object(
                minio_client.bucket,
                sub.code_minio_path,
            )
            data = resp.read()
        finally:
            if 'resp' in locals():
                resp.close()
                resp.release_conn()
    elif sub.code is not None and sub.code.grid_id is not None:
        data = sub.code.read()
    else:
        return HTTPError('code not found', 404)

    return send_file(
        io.BytesIO(data),
```
This reads the entire code zip into memory (data = resp.read()) before sending it. For large submissions this can spike memory and slow down responses. Prefer streaming the MinIO/GridFS file-like object to the client (or using a generator/streaming response) so the server doesn’t buffer the whole object.
Suggested change:
```diff
-        try:
-            resp = minio_client.client.get_object(
-                minio_client.bucket,
-                sub.code_minio_path,
-            )
-            data = resp.read()
-        finally:
-            if 'resp' in locals():
-                resp.close()
-                resp.release_conn()
+        fileobj = minio_client.client.get_object(
+            minio_client.bucket,
+            sub.code_minio_path,
+        )
     elif sub.code is not None and sub.code.grid_id is not None:
-        data = sub.code.read()
+        # Use the GridFS-like file object directly instead of reading into memory
+        fileobj = sub.code
     else:
         return HTTPError('code not found', 404)

     return send_file(
-        io.BytesIO(data),
+        fileobj,
```
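The streaming idea can also be sketched independently of MinIO/GridFS: Flask's Response accepts any iterator, so wrapping the file-like object in a chunking generator avoids buffering the whole zip (the helper name and chunk size below are arbitrary):

```python
import io

def iter_fileobj(fobj, chunk_size=64 * 1024):
    """Yield a file-like object in fixed-size chunks; pass the generator to
    flask.Response(..., mimetype='application/zip') instead of reading the
    entire body into memory first."""
    try:
        while True:
            chunk = fobj.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # For a MinIO response object, close()/release_conn() would go here.
        close = getattr(fobj, 'close', None)
        if close:
            close()

payload = b'x' * 200_000  # stand-in for a code zip
chunks = list(iter_fileobj(io.BytesIO(payload), chunk_size=64 * 1024))
```

Peak memory is then bounded by the chunk size rather than the artifact size, which matters most for large testdata archives.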
```python
@runner_api.route('/jobs/<submission_id>/testdata', methods=['GET'])
def get_job_testdata(submission_id: str):
    """
    Runner downloads the testdata zip for a claimed job's problem.
    """
    try:
        sub = engine.Submission.objects.get(id=submission_id)
    except engine.DoesNotExist:
        return HTTPError('submission not found', 404)

    runner_name = _get_runner_name(request)
    if sub.claimed_by != runner_name:
        return HTTPError('not claimed by this runner', 403)

    problem = sub.problem
    test_case = problem.test_case
    if test_case.case_zip_minio_path is not None:
        minio_client = MinioClient()
        try:
            resp = minio_client.client.get_object(
                minio_client.bucket,
                test_case.case_zip_minio_path,
            )
            data = resp.read()
        finally:
            if 'resp' in locals():
                resp.close()
                resp.release_conn()
    elif test_case.case_zip is not None and test_case.case_zip.grid_id is not None:
        data = test_case.case_zip.read()
    else:
        return HTTPError('testdata not found', 404)

    return send_file(
        io.BytesIO(data),
        mimetype='application/zip',
        as_attachment=True,
        download_name=f'testdata-{problem.problem_id}.zip',
    )
```
The /jobs/<submission_id>/testdata endpoint is new but isn’t covered by tests in tests/test_runner.py (there are tests for /jobs, /claim, /code, /heartbeat, /complete). Please add at least a happy-path test (claimer can download) and a 403 case (non-claimer denied), and ideally a 404 for missing submission/problem testdata.