fix: test fail on venv python3.12 due to 200MB being too small and la… #20

Merged
rdwj merged 5 commits into fips-agents:main from eformat:fix-tests-python-312
May 14, 2026

Conversation

@eformat (Contributor) commented May 6, 2026

Summary

Following the local development README, the tests immediately fail:

  • The 200MB RLIMIT_AS (virtual memory) limit is too tight for Python 3.12.
  • pytest needs SANDBOX_SKIP_LANDLOCK=1 set so that importing sandbox.app doesn't lock down the filesystem. The env var already exists for exactly this purpose.
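
A minimal sketch of the conftest.py fix (assuming it is just a session-wide environment default applied before sandbox.app is imported; the real file may do more):

import os

# setdefault rather than plain assignment: a developer who exports their own
# SANDBOX_SKIP_LANDLOCK value still wins over the test-runner default.
os.environ.setdefault("SANDBOX_SKIP_LANDLOCK", "1")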

@rdwj (Contributor) commented May 7, 2026

Thanks for digging in here — the Python 3.12 baseline-overhead issue is real, and the conftest.py fix is a clean way to handle the test-runner Landlock interaction (the setdefault is a nice touch since it lets developers override).

I want to flag one concern with the production default bump (200 → 512 MB) before we merge, and propose a small additional change.

Concern: cgroup vs. RLIMIT_AS inversion

The default chart sets the pod memory limit to 256Mi (chart/values.yaml:14-16), sized to be slightly above the old 200 MB RLIMIT_AS so subprocess RLIMIT_AS was always the binding constraint. With the new 512 MB default, the order inverts:

  • Old: subprocess RLIMIT_AS (200 MB) fires first → kills only the subprocess → pipeline.py:120 emits an oom_kill SecurityEvent from the MemoryError in stderr → caller gets a clean response with audit trail.
  • New: cgroup limit (~268 MB) fires first → kernel OOM-killer picks a victim in the pod (often the FastAPI parent) → whole pod restarts, dropping in-flight requests. No MemoryError reaches stderr, so no audit event is emitted — memory-exhaustion attacks become silent in the sandbox audit log.

This weakens the layered defense story: layer 3 (subprocess memory bound) becomes a no-op for the minimal profile in the default chart.
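
For readers less familiar with the mechanism: the per-subprocess bound is the classic setrlimit-between-fork-and-exec pattern, roughly like this (a sketch under that assumption, not the sandbox's actual executor code):

import resource
import subprocess

LIMIT = 512 * 1024 * 1024  # the new default RLIMIT_AS, in bytes

def _cap_address_space():
    # Runs in the child after fork, before exec, so only the sandboxed
    # subprocess is bounded; the FastAPI parent is untouched.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

proc = subprocess.run(
    ["python3", "-c", "x = bytearray(600 * 1024 * 1024)"],
    preexec_fn=_cap_address_space,
    capture_output=True,
    text=True,
)
# The oversized allocation raises MemoryError inside the child; the parent
# sees it on stderr and can emit the oom_kill SecurityEvent.
assert "MemoryError" in proc.stderr

This is why RLIMIT_AS must stay below the cgroup ceiling: the per-process bound fails a single allocation cleanly inside the child, while the cgroup limit is enforced by the kernel OOM-killer across the whole pod.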

Proposed change

I measured the FastAPI parent RSS locally (Python 3.11/macOS, profile loaded, idle and during a typical /execute call): 52 MB steady. Allowing ~30 MB headroom for Python 3.12/Linux drift puts the parent at ~80 MB. To keep RLIMIT_AS as the binding constraint:

  1. chart/values.yaml:14-16 — bump pod memory limit:

    limits:
      cpu: 500m
      memory: 704Mi   # 512 MB subprocess RLIMIT_AS + ~80 MB FastAPI parent + ~145 MB safety margin

    (640Mi works in theory but only leaves ~80 MB margin, which is tight under transient spikes from audit-event serialization, response encoding, etc.)

  2. Doc + docstring sync — three references still cite the old 200 MB ceiling:

    • sandbox/executor.py:373 — docstring says "Defaults to 200 MB"
    • docs/informed-attacker-runner.md:259 — "Memory: 200MB RLIMIT_AS (minimal profile)..."
    • docs/informed-attacker-runner.md:388 — "...memory exhaustion is capped at 200MB by RLIMIT_AS"

Optionally, splitting into "test-runner fix" and "raise minimal profile memory ceiling" commits would make the production behavior change easier to spot in the log, but not required.

Adjacent finding (not on you)

Containerfile:31 runs uvicorn with no --limit-concurrency. Multiple concurrent /execute requests can each spawn a subprocess, so N × 512 MB will blow past any reasonable pod limit. This was already true at 200 MB (4 × 200 = 800 > 256Mi), so the existing chart implicitly assumes the caller serializes. I'll open a separate issue.
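
If that issue lands, the fix could be a one-flag change along these lines (--limit-concurrency is a real uvicorn flag; the module path and the budget of 1 are illustrative, and sandbox.app:app assumes the ASGI object is named app):

# At most one /execute subprocess (512 MB RLIMIT_AS) in flight at a time;
# excess requests are rejected with HTTP 503 instead of OOM-ing the pod.
uvicorn sandbox.app:app --host 0.0.0.0 --port 8080 --limit-concurrency 1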

Follow-up (also not on you)

I'll open a second issue to track a Python-version-stable approach (measuring interpreter baseline VM at app startup and applying RLIMIT_AS as baseline + user_budget) so the same drift doesn't repeat on 3.13.
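
A sketch of that idea, assuming a Linux-only read of /proc/self/status (the follow-up issue may settle on a different mechanism):

def interpreter_baseline_bytes() -> int:
    """Current virtual memory of this interpreter, per /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):  # reported in kB
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmSize not found in /proc/self/status")

USER_BUDGET = 200 * 1024 * 1024  # the budget actually promised to user code

# Measured once at app startup and fed into the subprocess RLIMIT_AS, so a
# fatter 3.13 interpreter raises the ceiling instead of eating the budget.
RLIMIT_AS_BYTES = interpreter_baseline_bytes() + USER_BUDGET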

@eformat (Contributor, Author) commented May 7, 2026

All four changes done:

  1. chart/values.yaml:16 — pod memory limit 256Mi → 704Mi
  2. sandbox/executor.py:373 — docstring "200 MB" → "512 MB"
  3. docs/informed-attacker-runner.md:259 — "200MB RLIMIT_AS" → "512MB RLIMIT_AS"
  4. docs/informed-attacker-runner.md:388 — "capped at 200MB" → "capped at 512MB"

@eformat (Contributor, Author) commented May 7, 2026

Just testing in OpenShift as well to make sure it all looks good.

@eformat (Contributor, Author) commented May 7, 2026

● That confirms the fix works:

  • 600MB allocation exceeds the 512MB RLIMIT_AS → subprocess gets a clean MemoryError in stderr
  • The FastAPI parent is not killed by the cgroup OOM-killer (we got a proper JSON response back)
  • The pod stayed up (no restart)
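
The probe was along these lines (the request body shape is a guess; only the /execute route is confirmed in this thread):

curl -s -X POST "https://$ROUTE/execute" \
  -H 'Content-Type: application/json' \
  -d '{"code": "x = bytearray(600 * 1024 * 1024)"}'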

Let me verify the pod didn't restart.

● Bash(oc get pods -n code-sandbox -l app.kubernetes.io/name=code-sandbox 2>&1)
⎿  NAME                            READY   STATUS    RESTARTS   AGE
   code-sandbox-776bf7d8f4-7xslp   1/1     Running   0          57s
   code-sandbox-776bf7d8f4-lrzpr   1/1     Running   0          57s

● All green — 0 restarts. Here's what was validated:

  • Pod memory limit: 704Mi (was 256Mi)
  • Healthcheck: {"status":"ok"}
  • Code execution: clean output, exit_code 0
  • Memory exhaustion: 600MB allocation → subprocess MemoryError (RLIMIT_AS at 512MB fires first), pod stays up with 0 restarts and returns a proper JSON response with audit-trailable stderr

The cgroup vs RLIMIT_AS inversion is fixed.

Added new test for this as well:

test_default_rlimit_fires_before_cgroup
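
Based on the assertions described later in the thread, the test presumably looks roughly like this (the execute helper name and exact shape are guesses):

async def test_default_rlimit_fires_before_cgroup(self):
    """600 MB must trip the 512 MB subprocess RLIMIT_AS, not the pod cgroup."""
    code = textwrap.dedent("""\
        x = bytearray(600 * 1024 * 1024)
        print('ALLOCATED:', len(x))
    """)
    result = await self.execute(code)      # hypothetical helper name
    assert "MemoryError" in result.stderr  # the child died, not the pod
    assert "ALLOCATED:" not in result.stdout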

@rdwj (Contributor) commented May 7, 2026

Almost there. The OpenShift validation and the new test_default_rlimit_fires_before_cgroup are excellent. One arithmetic issue with test_memory_exhaustion after the bump:

The test asserts that the allocation should fail ("ALLOCATED:" not in result.stdout or result.exit_code != 0), and CI is showing it succeeds:

stdout='ALLOCATED: 419430400\n'
exit_code=0

That's correct behavior under the new defaults. Linux Python interpreter VM at startup is ~30–50 MB, plus a 400 MB bytearray = ~430–450 MB total VM, which is comfortably under the new 512 MB RLIMIT_AS. The allocation should succeed; the test asserting otherwise is the bug.
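
(Arithmetic check: 400 × 1024 × 1024 = 419430400 bytes, exactly the value CI printed, so the allocation itself is 400 MiB and the total stays comfortably under the 512 MB cap.)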

Two ways to fix:

Option A — bump the allocation above the limit so the test does its intended job:

async def test_memory_exhaustion(self):
    """Attempt to allocate 600MB in a memory-limited sandbox."""
    code = textwrap.dedent("""\
        x = bytearray(600 * 1024 * 1024)
        print('ALLOCATED:', len(x))
    """)

You already proved 600 MB triggers MemoryError in the new test, so this is known-good.

Option B — delete test_memory_exhaustion as redundant with the new test_default_rlimit_fires_before_cgroup. Your new test asserts a strictly stronger property ("MemoryError" in stderr) and exercises the same code path. Two tests proving the same thing isn't worth the maintenance.

Either works. I'd lean toward B since you've already replaced it with something better, but if you'd rather keep both as belt-and-suspenders that's fine too.

@eformat (Contributor, Author) commented May 14, 2026

● All 60 passed. Deleted test_memory_exhaustion — the new test_default_rlimit_fires_before_cgroup covers it with a stronger assertion.

@rdwj (Contributor) left a comment

All feedback addressed — Option B was the right call. 60/60 tests green, upstream merged, RLIMIT_AS inversion fixed and validated on OpenShift. Merging.
