fix: test fail on venv python3.12 due to 200MB being too small and la… #20

Merged
rdwj merged 5 commits into fips-agents:main from eformat:fix-tests-python-312
May 14, 2026

Conversation

@eformat (Contributor) commented May 6, 2026

Summary

Following the local development README, the tests immediately fail:

  • The 200MB RLIMIT_AS (virtual memory) limit is too tight for Python 3.12.
  • pytest needs SANDBOX_SKIP_LANDLOCK=1 set so that importing sandbox.app doesn't lock down the filesystem. The env var already exists for exactly this purpose.
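
A minimal sketch of the conftest.py fix (assuming it is just a session-wide environment default applied before sandbox.app is imported; the real file may do more):

import os

# setdefault rather than plain assignment: a developer who exports their own
# SANDBOX_SKIP_LANDLOCK value still wins over the test-runner default.
os.environ.setdefault("SANDBOX_SKIP_LANDLOCK", "1")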

@rdwj (Contributor) commented May 7, 2026

Thanks for digging in here — the Python 3.12 baseline-overhead issue is real, and the conftest.py fix is a clean way to handle the test-runner Landlock interaction (the setdefault is a nice touch since it lets developers override).

I want to flag one concern with the production default bump (200 → 512 MB) before we merge, and propose a small additional change.

Concern: cgroup vs. RLIMIT_AS inversion

The default chart sets the pod memory limit to 256Mi (chart/values.yaml:14-16), sized to be slightly above the old 200 MB RLIMIT_AS so subprocess RLIMIT_AS was always the binding constraint. With the new 512 MB default, the order inverts:

  • Old: subprocess RLIMIT_AS (200 MB) fires first → kills only the subprocess → pipeline.py:120 emits an oom_kill SecurityEvent from the MemoryError in stderr → caller gets a clean response with audit trail.
  • New: cgroup limit (~268 MB) fires first → kernel OOM-killer picks a victim in the pod (often the FastAPI parent) → whole pod restarts, dropping in-flight requests. No MemoryError reaches stderr, so no audit event is emitted — memory-exhaustion attacks become silent in the sandbox audit log.

This weakens the layered defense story: layer 3 (subprocess memory bound) becomes a no-op for the minimal profile in the default chart.
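
For readers less familiar with the mechanism: the per-subprocess bound is the classic setrlimit-between-fork-and-exec pattern, roughly like this (a sketch under that assumption, not the sandbox's actual executor code):

import resource
import subprocess

LIMIT = 512 * 1024 * 1024  # the new default RLIMIT_AS, in bytes

def _cap_address_space():
    # Runs in the child after fork, before exec, so only the sandboxed
    # subprocess is bounded; the FastAPI parent is untouched.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

proc = subprocess.run(
    ["python3", "-c", "x = bytearray(600 * 1024 * 1024)"],
    preexec_fn=_cap_address_space,
    capture_output=True,
    text=True,
)
# The oversized allocation raises MemoryError inside the child; the parent
# sees it on stderr and can emit the oom_kill SecurityEvent.
assert "MemoryError" in proc.stderr

This is why RLIMIT_AS must stay below the cgroup ceiling: the per-process bound fails a single allocation cleanly inside the child, while the cgroup limit is enforced by the kernel OOM-killer across the whole pod.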

Proposed change

I measured the FastAPI parent RSS locally (Python 3.11/macOS, profile loaded, idle and during a typical /execute call): 52 MB steady. Allowing ~30 MB headroom for Python 3.12/Linux drift puts the parent at ~80 MB. To keep RLIMIT_AS as the binding constraint:

  1. chart/values.yaml:14-16 — bump pod memory limit:

    limits:
      cpu: 500m
      memory: 704Mi   # 512 MB subprocess RLIMIT_AS + ~80 MB FastAPI parent + ~145 MB safety margin

    (640Mi works in theory but only leaves ~80 MB margin, which is tight under transient spikes from audit-event serialization, response encoding, etc.)

  2. Doc + docstring sync — three references still cite the old 200 MB ceiling:

    • sandbox/executor.py:373 — docstring says "Defaults to 200 MB"
    • docs/informed-attacker-runner.md:259 — "Memory: 200MB RLIMIT_AS (minimal profile)..."
    • docs/informed-attacker-runner.md:388 — "...memory exhaustion is capped at 200MB by RLIMIT_AS"

Optionally, splitting into "test-runner fix" and "raise minimal profile memory ceiling" commits would make the production behavior change easier to spot in the log, but not required.

Adjacent finding (not on you)

Containerfile:31 runs uvicorn with no --limit-concurrency. Multiple concurrent /execute requests can each spawn a subprocess, so N × 512 MB will blow past any reasonable pod limit. This was already true at 200 MB (4 × 200 = 800 > 256Mi), so the existing chart implicitly assumes the caller serializes. I'll open a separate issue.
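
If that issue lands, the fix could be a one-flag change along these lines (--limit-concurrency is a real uvicorn flag; the module path and the budget of 1 are illustrative, and sandbox.app:app assumes the ASGI object is named app):

# At most one /execute subprocess (512 MB RLIMIT_AS) in flight at a time;
# excess requests are rejected with HTTP 503 instead of OOM-ing the pod.
uvicorn sandbox.app:app --host 0.0.0.0 --port 8080 --limit-concurrency 1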

Follow-up (also not on you)

I'll open a second issue to track a Python-version-stable approach (measuring interpreter baseline VM at app startup and applying RLIMIT_AS as baseline + user_budget) so the same drift doesn't repeat on 3.13.
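
A sketch of that idea, assuming a Linux-only read of /proc/self/status (the follow-up issue may settle on a different mechanism):

def interpreter_baseline_bytes() -> int:
    """Current virtual memory of this interpreter, per /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):  # reported in kB
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmSize not found in /proc/self/status")

USER_BUDGET = 200 * 1024 * 1024  # the budget actually promised to user code

# Measured once at app startup and fed into the subprocess RLIMIT_AS, so a
# fatter 3.13 interpreter raises the ceiling instead of eating the budget.
RLIMIT_AS_BYTES = interpreter_baseline_bytes() + USER_BUDGET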

@eformat (Contributor, Author) commented May 7, 2026

All four changes done:

  1. chart/values.yaml:16 — pod memory limit 256Mi → 704Mi
  2. sandbox/executor.py:373 — docstring "200 MB" → "512 MB"
  3. docs/informed-attacker-runner.md:259 — "200MB RLIMIT_AS" → "512MB RLIMIT_AS"
  4. docs/informed-attacker-runner.md:388 — "capped at 200MB" → "capped at 512MB"

@eformat (Contributor, Author) commented May 7, 2026

Just testing in OpenShift as well to make sure it all looks good.

@eformat (Contributor, Author) commented May 7, 2026

● That confirms the fix works:

  • 600MB allocation exceeds the 512MB RLIMIT_AS → subprocess gets a clean MemoryError in stderr
  • The FastAPI parent is not killed by the cgroup OOM-killer (we got a proper JSON response back)
  • The pod stayed up (no restart)
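
The probe was along these lines (the request body shape is a guess; only the /execute route is confirmed in this thread):

curl -s -X POST "https://$ROUTE/execute" \
  -H 'Content-Type: application/json' \
  -d '{"code": "x = bytearray(600 * 1024 * 1024)"}'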

Let me verify the pod didn't restart.

● Bash(oc get pods -n code-sandbox -l app.kubernetes.io/name=code-sandbox 2>&1)
⎿  NAME                            READY   STATUS    RESTARTS   AGE
   code-sandbox-776bf7d8f4-7xslp   1/1     Running   0          57s
   code-sandbox-776bf7d8f4-lrzpr   1/1     Running   0          57s

● All green — 0 restarts. Here's what was validated:

  • Pod memory limit: 704Mi (was 256Mi)
  • Healthcheck: {"status":"ok"}
  • Code execution: clean output, exit_code 0
  • Memory exhaustion: 600MB allocation → subprocess MemoryError (RLIMIT_AS at 512MB fires first), pod stays up with 0 restarts and returns a proper JSON response with audit-trailable stderr

The cgroup vs RLIMIT_AS inversion is fixed.

Added new test for this as well:

test_default_rlimit_fires_before_cgroup
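
Based on the assertions described later in the thread, the test presumably looks roughly like this (the execute helper name and exact shape are guesses):

async def test_default_rlimit_fires_before_cgroup(self):
    """600 MB must trip the 512 MB subprocess RLIMIT_AS, not the pod cgroup."""
    code = textwrap.dedent("""\
        x = bytearray(600 * 1024 * 1024)
        print('ALLOCATED:', len(x))
    """)
    result = await self.execute(code)      # hypothetical helper name
    assert "MemoryError" in result.stderr  # the child died, not the pod
    assert "ALLOCATED:" not in result.stdout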

@rdwj (Contributor) commented May 7, 2026

Almost there. The OpenShift validation and the new test_default_rlimit_fires_before_cgroup are excellent. One arithmetic issue with test_memory_exhaustion after the bump:

The test asserts that the allocation should fail ("ALLOCATED:" not in result.stdout or result.exit_code != 0), and CI is showing it succeeds:

stdout='ALLOCATED: 419430400\n'
exit_code=0

That's correct behavior under the new defaults. Linux Python interpreter VM at startup is ~30–50 MB, plus a 400 MB bytearray = ~430–450 MB total VM, which is comfortably under the new 512 MB RLIMIT_AS. The allocation should succeed; the test asserting otherwise is the bug.
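
(Arithmetic check: 400 × 1024 × 1024 = 419430400 bytes, exactly the value CI printed, so the allocation itself is 400 MiB and the total stays comfortably under the 512 MB cap.)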

Two ways to fix:

Option A — bump the allocation above the limit so the test does its intended job:

async def test_memory_exhaustion(self):
    """Attempt to allocate 600MB in a memory-limited sandbox."""
    code = textwrap.dedent("""\
        x = bytearray(600 * 1024 * 1024)
        print('ALLOCATED:', len(x))
    """)

You already proved 600 MB triggers MemoryError in the new test, so this is known-good.

Option B — delete test_memory_exhaustion as redundant with the new test_default_rlimit_fires_before_cgroup. Your new test asserts a strictly stronger property ("MemoryError" in stderr) and exercises the same code path. Two tests proving the same thing isn't worth the maintenance.

Either works. I'd lean toward B since you've already replaced it with something better, but if you'd rather keep both as belt-and-suspenders that's fine too.

@eformat (Contributor, Author) commented May 14, 2026

● All 60 passed. Deleted test_memory_exhaustion — the new test_default_rlimit_fires_before_cgroup covers it with a stronger assertion.

@rdwj (Contributor) left a comment

All feedback addressed — Option B was the right call. 60/60 tests green, upstream merged, RLIMIT_AS inversion fixed and validated on OpenShift. Merging.
