Contributor

@qkaiser qkaiser commented Jul 1, 2025

Draft implementation for #1216

It fails on test_archive_success because we cannot instantiate ExtractionConfig with an arbitrary tmp directory right now.

Some unit testing is also required for proper coverage.

@qkaiser qkaiser force-pushed the allow-tmp-access branch from 9b4616a to 0cad9c5 July 1, 2025 12:21
@qkaiser qkaiser linked an issue Jul 1, 2025 that may be closed by this pull request
@qkaiser qkaiser self-assigned this Jul 1, 2025
@qkaiser qkaiser added the enhancement, help wanted and python labels Jul 1, 2025
@qkaiser qkaiser marked this pull request as draft July 1, 2025 12:22
    """Set environment variables so all subprocesses and handlers use our temp dir"""
    for var in ("TMP", "TMPDIR", "TEMP", "TEMPDIR"):
        os.environ[var] = self.tmp_dir.as_posix()
    atexit.register(self._cleanup_tmp_dir)
Contributor

This is problematic for programs using unblob as a library, as these variables remain set in the hosting process's environment. We should set and reset them in a tighter scope.

For example, Processor.process_task looks like a fine candidate: by default it spawns child processes, so it does not poison the caller's environment at all (provided process_num > 1).

Another possibility is doing this at the process_file level. Unfortunately, that still affects the hosting process, but at least we can restore the variables when the function returns.
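A minimal sketch of the "set and reset in a tighter scope" idea, assuming a hypothetical helper named `tmp_env` (not part of unblob). It points the TMP-style variables from the snippet under review at a directory for the duration of a `with` block, then restores the previous values on exit, including unsetting variables that were not set before. It only narrows the window: while the block runs, the hosting process's environment is still modified, which is the process_file caveat mentioned above.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Variable names taken from the snippet under review.
TMP_VARS = ("TMP", "TMPDIR", "TEMP", "TEMPDIR")


@contextmanager
def tmp_env(tmp_dir: Path) -> Iterator[None]:
    """Hypothetical helper: temporarily point the TMP-style variables at tmp_dir."""
    # Remember the previous values; None means "was not set".
    saved = {var: os.environ.get(var) for var in TMP_VARS}
    for var in TMP_VARS:
        os.environ[var] = tmp_dir.as_posix()
    try:
        yield
    finally:
        # Restore the caller's environment exactly as it was, even on error.
        for var, value in saved.items():
            if value is None:
                os.environ.pop(var, None)
            else:
                os.environ[var] = value
```

A caller would wrap only the work that needs the sandboxed directory, e.g. `with tmp_env(task_dir): ...`.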

Contributor Author

Done. I also implemented tests in test_necessary_resources_can_be_created_in_sandbox.

Contributor

I do not see the original comment being addressed in the code: below, the environment variables are set but never restored, and atexit.register is still used here.

Have you pushed your changes?

I am also not sure how the Landlock sandbox would be limited to /tmp/call-specific-tempdir when unblob is used as a library and called a second time with another ExtractionConfig.

One solution I can think of: when Landlock is enabled, process_file forks, sets the environment variables in the child, and cleans up the temporary directory when the child exits. Since the environment variables are set only in the child, they need no restoring.

(Actually, multiprocessing should be used somehow, as os.fork (https://docs.python.org/3/library/os.html#os.fork) has many problems and is known to be problematic on macOS.)

Contributor Author

@e3krisztian, do I still need to do something here?

Contributor

This implicit atexit cleanup pattern still bothers me, and it is potentially anger-provoking when a directory is removed unintentionally.

I would prefer a breaking change, except that is ruled out.

There is another solution: we could make the default '/tmp' (or the first value set among the TMP... variables), which should be backward compatible. The only remaining problem is cleanup, which would now attempt to remove '/tmp'.

For cleanup, we could introduce a separate temporary directory for subprocesses:

  • derive the default for config.tmp_dir from the TMP... variables, falling back to /tmp
  • use a sub-directory of config.tmp_dir (created with mkdtemp(dir=config.tmp_dir)) as the temporary directory for running tasks, and clean up that sub-directory immediately when the task exits (both the mkdtemp call and the cleanup could be added to the tmp_dir context manager)
  • do not clean up config.tmp_dir here: it is pre-existing and, if set explicitly, should be cleaned up at the call site (e.g. in unblob.cli.cli)
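The three points above could look roughly like the following sketch. `default_tmp_dir` and `task_tmp_dir` are hypothetical standalone functions, not unblob's actual code. For the default it leans on `tempfile.gettempdir()`, which consults TMPDIR, TEMP and TMP (but not TEMPDIR) before falling back to platform locations such as /tmp. The per-task directory is created with `mkdtemp(dir=...)` and removed when the `with` block exits, while the base directory itself is never deleted.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def default_tmp_dir() -> Path:
    """Hypothetical default for config.tmp_dir: honour TMPDIR/TEMP/TMP, else /tmp."""
    # gettempdir() checks TMPDIR, TEMP and TMP, then falls back to /tmp and
    # other platform locations; it does not look at TEMPDIR.
    return Path(tempfile.gettempdir())


@contextmanager
def task_tmp_dir(base: Path) -> Iterator[Path]:
    """Hypothetical helper: yield a fresh sub-directory of base, removed on exit."""
    task_dir = Path(tempfile.mkdtemp(prefix="unblob-task-", dir=base))
    try:
        yield task_dir
    finally:
        # Only the task-specific sub-directory is removed; base is left alone
        # because it is pre-existing and owned by the caller.
        shutil.rmtree(task_dir, ignore_errors=True)
```

Because cleanup targets only the `mkdtemp` result, a default of '/tmp' is never at risk of being removed.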

Contributor

@vlaci raised a concern that we should consider doing this only in a more limited context, inside Command extractors, as only external commands use temporary files.
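If the scope were narrowed to Command extractors, one option (an assumption, not something agreed in this thread) is to leave os.environ untouched and pass a modified copy through the `env` argument of `subprocess.run`, so only the external command sees the task directory. `run_with_tmp` is a hypothetical helper name.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Sequence

# Same variable names as in the snippet under review.
TMP_VARS = ("TMP", "TMPDIR", "TEMP", "TEMPDIR")


def run_with_tmp(cmd: Sequence[str], tmp_dir: Path) -> subprocess.CompletedProcess:
    """Hypothetical helper: run cmd with the TMP-style variables pointing at tmp_dir."""
    # Work on a copy so the hosting process's environment is never modified.
    env = dict(os.environ)
    env.update({var: tmp_dir.as_posix() for var in TMP_VARS})
    return subprocess.run(list(cmd), env=env, capture_output=True, text=True, check=True)
```

The child inherits the variables only for its own lifetime, so nothing needs restoring afterwards.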

@qkaiser qkaiser force-pushed the allow-tmp-access branch from 0cad9c5 to d6d9358 July 2, 2025 05:19
@qkaiser qkaiser marked this pull request as ready for review July 2, 2025 05:20
qkaiser and others added 2 commits November 25, 2025 17:00
Move temporary directory creation and cleanup from ExtractionConfig to
CLI to make the lifecycle explicit and prevent unintended cleanup.

The implicit atexit cleanup in ExtractionConfig could unexpectedly
remove user-provided directories. This change moves that responsibility
to the CLI layer where it's visible and under explicit control.

Changes:
- ExtractionConfig.tmp_dir defaults to system temp (no auto-cleanup)
- Renamed tmp_dir() context manager to task_tmp_dir()
- task_tmp_dir() creates and cleans up task-specific subdirectories
- CLI explicitly creates unblob temp directory and manages cleanup
- Tests updated for proper sandbox isolation

Result: Library code has no implicit cleanup. CLI owns the lifecycle.
Users can safely provide their own tmp_dir without cleanup surprises.
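The division of responsibility described in this commit could be sketched as follows. `Config`, `process` and `run_cli` are simplified, hypothetical stand-ins for ExtractionConfig, the library entry point and unblob.cli.cli, not their real signatures. The library-side config only defaults to the system temporary directory and never deletes anything, while the CLI creates its own directory with `tempfile.TemporaryDirectory` and removes it explicitly when it is done.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Config:
    """Hypothetical stand-in for ExtractionConfig: no cleanup logic at all."""

    tmp_dir: Path = field(default_factory=lambda: Path(tempfile.gettempdir()))


def process(config: Config) -> Path:
    """Hypothetical library entry point; writes scratch data under config.tmp_dir."""
    scratch = config.tmp_dir / "scratch.bin"
    scratch.write_bytes(b"intermediate data")
    return scratch


def run_cli() -> Path:
    """Hypothetical CLI: it, not the library, owns the temporary directory."""
    with tempfile.TemporaryDirectory(prefix="unblob-") as tmp:
        config = Config(tmp_dir=Path(tmp))
        process(config)
    # Leaving the with block removed the directory created above.
    return config.tmp_dir
```

A user-provided `tmp_dir` passed straight to `Config` is left in place after `process` returns, since only the CLI path removes anything.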

Development

Successfully merging this pull request may close these issues.

landlock sandbox: allow access to /tmp

4 participants