RFC: Generate minimal profile by observing a workload (sandlock learn) #72

@congwang-mk

Goal

Add a sandlock learn -- <cmd> mode that runs a workload under instrumentation and emits a minimal sandlock profile (TOML) covering exactly the filesystem reads/writes, network egress, and syscalls the workload actually used. Subsequent sandlock run -p <profile> invocations confine the workload to just that observed surface.

Motivation

Writing a tight policy from scratch is the single largest UX gap in the flag-based interface. To run a Python script under sandlock today, the user has to know where CPython, site-packages, SSL certs, locale archives, and tempdirs live. Most users give up and write -r /usr -r /lib -r /lib64 -r /etc -w /tmp, which is wide enough to be close to no policy. The result is a sandbox in name only.

The XOA model assumes per-call confinement is tight enough to matter. If the per-call profile is permissive by default, the threat model degrades to "container without a container," which is worse than what we promise.

The way out is observation. Run the workload once, record what it actually touches, emit a profile. The user starts from "definitely works, definitely minimal" rather than "guess and iterate."

Proposed design

Command surface

sandlock learn -o profile.toml -- python3 build.py
sandlock learn --merge profile.toml -- python3 build.py   # union into existing
sandlock run -p profile.toml -- python3 build.py

What is recorded

| Domain | Recording mechanism |
| --- | --- |
| Filesystem reads | Permissive Landlock + seccomp-notify on openat/open |
| Filesystem writes | Same; classified by open flags (O_WRONLY / O_RDWR / O_CREAT) |
| Network egress (TCP / UDP / ICMP) | seccomp-notify on connect / sendto / sendmsg |
| HTTP method + host + path | Existing transparent proxy with --http-ca, in logging-only mode |
| Syscalls | seccomp filter counts unique syscalls invoked |
| Resource peaks | /proc/<pid> sampling: max RSS, max threads, max FDs |
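
For illustration, the read/write classification by open flags could look roughly like the sketch below. It is written in Python for brevity rather than in the project's own code, and classify_open is a hypothetical name, not an existing sandlock function. It treats only the three flags listed above as write indicators; whether flags such as O_TRUNC or O_APPEND should also count is left open here.

```python
import os

# Access-mode bits; built by OR-ing the three modes so the mask is portable.
ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def classify_open(flags: int) -> str:
    """Classify an observed open()/openat() call as "read" or "write".

    Hypothetical helper: only the flags named in the table
    (O_WRONLY, O_RDWR, O_CREAT) mark an access as a write.
    """
    access = flags & ACCESS_MASK
    if access in (os.O_WRONLY, os.O_RDWR):
        return "write"
    if flags & os.O_CREAT:
        # Creating a file modifies the parent directory even if opened read-only.
        return "write"
    return "read"
```

An access classified as "write" would feed fs.writable; everything else would feed fs.readable.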

Output format

Reuse the existing TOML profile serializer in crates/sandlock-core/src/profile.rs. Fields populated:

  • fs.readable / fs.writable: minimal path prefixes covering observed accesses, collapsed to directory granularity (see below).
  • net.allow: observed host:port pairs, with scheme prefix when non-TCP.
  • http.allow: observed method+host+path rules. Optional, gated by --learn-http.
  • seccomp.allow: minimal syscall set, gated by --learn-syscalls; otherwise omit and rely on the default profile.
  • limits.max_memory / limits.max_processes: observed peak times a safety factor (default 1.5x, configurable).

The output carries a header comment with the input command, host kernel, and timestamp, so reproduction is unambiguous.
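
As a rough sketch of the limits computation described above, assuming the padded value is rounded up to a whole unit (the RFC does not specify rounding). padded_limit is a hypothetical name used only for illustration.

```python
import math


def padded_limit(observed_peak: int, safety_factor: float = 1.5) -> int:
    """Scale an observed peak by a safety factor and round up.

    Assumption: rounding up, so the emitted cap is never below
    peak * factor. The default of 1.5 matches the RFC text.
    """
    if observed_peak < 0:
        raise ValueError("observed_peak must be non-negative")
    if safety_factor < 1.0:
        raise ValueError("safety_factor below 1.0 would cap under the observed peak")
    return math.ceil(observed_peak * safety_factor)
```

For example, an observed peak of 3 processes with the default factor would yield a limits.max_processes of 5.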

Path collapsing

Recording one entry per file is unworkable: a Python import touches thousands. The collapser groups by directory using a tunable heuristic:

  • If the workload touched at least N files (default 4) under a directory, allowlist the directory.
  • If fewer, allowlist individual files.
  • --collapse N overrides the threshold; --collapse-prefix /usr/lib/python3 forces aggregation under the given prefix.
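
A minimal sketch of the heuristic above, in Python for brevity. Several things here are assumptions rather than part of the proposal: the function name collapse_paths, grouping by immediate parent directory only (no roll-up of nested directories), matching forced prefixes on whole path components, and marking directory rules with a trailing slash.

```python
import posixpath
from collections import defaultdict


def collapse_paths(paths, threshold=4, forced_prefixes=()):
    """Collapse observed file paths into a sorted list of allowlist entries.

    Directory rules end with "/", file rules do not (an illustrative
    convention, not the actual profile format).
    """
    rules = set()
    by_dir = defaultdict(set)
    prefixes = [posixpath.normpath(fp) for fp in forced_prefixes]

    for raw in paths:
        path = posixpath.normpath(raw)
        # A forced prefix wins regardless of how many files were seen under it.
        forced = next(
            (fp for fp in prefixes
             if path == fp or path.startswith(fp.rstrip("/") + "/")),
            None,
        )
        if forced is not None:
            rules.add(forced.rstrip("/") + "/")
        else:
            by_dir[posixpath.dirname(path)].add(path)

    for directory, files in by_dir.items():
        if len(files) >= threshold:
            # Enough distinct files touched: allowlist the whole directory.
            rules.add(directory.rstrip("/") + "/")
        else:
            # Too few: keep the individual files.
            rules.update(files)

    return sorted(rules)
```

With the default threshold, four files under /usr/lib/python3 collapse to a single directory rule, while two files under /etc stay as individual entries.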

Merging and iteration

--merge profile.toml performs a union: existing rules retained, new rules added; resource caps take the max of old vs observed. Iterative refinement is the expected workflow: run with one input, run with another, merge.
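
The union semantics could be sketched as follows. This assumes the profile is held in memory as nested dicts mirroring the TOML sections (fs, net, limits, and so on), which is an illustrative shape and not the actual structure in profile.rs. merge_profiles is a hypothetical name.

```python
def merge_profiles(existing: dict, observed: dict) -> dict:
    """Union two profiles: list rules are unioned, limits take the max.

    Assumption: every non-"limits" value is a list of rule strings and
    every "limits" value is a number.
    """
    merged = {}
    for section in sorted(set(existing) | set(observed)):
        old = existing.get(section, {})
        new = observed.get(section, {})
        out = {}
        for key in sorted(set(old) | set(new)):
            if section == "limits":
                # Resource caps: keep whichever of old vs observed is larger.
                values = [v for v in (old.get(key), new.get(key)) if v is not None]
                out[key] = max(values)
            else:
                # Allow rules: existing rules retained, new rules added.
                out[key] = sorted(set(old.get(key, [])) | set(new.get(key, [])))
        merged[section] = out
    return merged
```

Because the merge only ever adds rules or raises caps, repeated learn --merge runs can widen a profile but never narrow it.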

When a later sandlock run -p hits a denial, the seccomp/Landlock log line should suggest sandlock learn --merge profile.toml -- ... to extend the profile.

Open questions

  1. Permissive-Landlock + notify vs notify-only. Landlock has no native audit mode. Either accept denials and observe them, or run with rules fully permissive and observe via seccomp-notify on file syscalls. The latter is heavier per syscall but complete. Decide during prototype.
  2. eBPF as alternative recorder. A bcc/bpftrace tracer would be much faster than seccomp-notify but adds a build dependency. Out of scope for v1; revisit if seccomp-notify overhead is prohibitive on realistic workloads.
  3. Per-invocation log vs aggregate. Probably emit both: the aggregated profile is the artifact, and a side-channel --debug-log records every observation for diagnosing the collapser.
  4. Multi-process workloads. Forks inherit the seccomp filter; the supervisor already aggregates across children for runtime sandboxes. Same machinery applies; verify in prototype.

Out of scope for v1

  • Replay / fuzzing across input variants to broaden the trace.
  • ML-guided rule generalization.
  • Auto-tightening: take an existing profile, identify rules that no recorded run actually exercised, suggest removal. Useful follow-on once learn lands.
