
Runfiles files under .venv/site-packages can double in size #3439

@shayanhoshyari

Description


🐞 bug report

Affected Rule

py_binary, with the following flags set (e.g. in .bazelrc):

common --@rules_python//python/config_settings:bootstrap_impl=script
common --@rules_python//python/config_settings:venvs_site_packages=yes
common --@rules_python//python/config_settings:venvs_use_declare_symlink=yes

Is this a regression?

Hard to tell. I think it might be unintended behavior.

Description

When building Docker images, it is very common to package files into .tar archives (one per image layer).

Currently, tar.bzl (used by the aspect_rules_py, rules_oci, ... family of rules) does not support symlinks, but there is an ongoing thread about it: bazel-contrib/tar.bzl#16

@rickeylev suggested using the File.is_symlink field to determine whether a file is a symlink and, if so, writing it to the archive as-is, as a symlink that preserves its relative target path.
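A minimal sketch of that suggestion, using Python's standard tarfile module rather than Starlark. In a Bazel rule the is_symlink flag would come from File.is_symlink; here it is just a parameter. The helper name add_entry and its parameters are hypothetical: a symlink is written as a SYMTYPE entry whose linkname is kept verbatim, and everything else is copied in with its content.

```python
import io
import tarfile


def add_entry(tar: tarfile.TarFile, arcname: str, is_symlink: bool,
              data: bytes = b"", link_target: str = "") -> None:
    """Hypothetical helper: write one runfiles entry into `tar`.

    Symlinks become symlink entries whose (relative) target is preserved
    verbatim; regular files are stored with their full content.
    """
    info = tarfile.TarInfo(name=arcname)
    if is_symlink:
        info.type = tarfile.SYMTYPE
        info.linkname = link_target  # kept as-is, e.g. "../pkg/real.txt"
        tar.addfile(info)
    else:
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


# Build a tiny in-memory archive with one file and one symlink to it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_entry(tar, "pkg/real.txt", is_symlink=False, data=b"hello")
    add_entry(tar, "venv/link.txt", is_symlink=True,
              link_target="../pkg/real.txt")

# Read it back and show how each entry was stored.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    for member in tar.getmembers():
        kind = "symlink" if member.issym() else "file"
        print(member.name, kind, member.linkname)
```

Only the link entry's header is stored for the symlink, so the pointed-to content lands in the archive exactly once.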

I implemented a basic tar-rule equivalent based on this idea in #3388 (comment).

However, with the current behavior, the torch .so files report File.is_symlink = False, so each of them ends up in the tar twice (once as the external runfile and once under .venv), which can double the size of the tar (image layer).
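To make the size impact concrete, here is a small Python sketch (standard tarfile only) that builds the same uncompressed layer two ways: once with the .venv copy written as a regular file, as happens when File.is_symlink is False, and once with it written as a symlink. The 1 MiB zero-filled payload, the layer_size function, and the ext_repo path are hypothetical stand-ins for a torch .so file and its external repository.

```python
import io
import tarfile

PAYLOAD = b"\0" * (1 << 20)  # hypothetical stand-in for a large .so (1 MiB)


def layer_size(venv_copy_is_symlink: bool) -> int:
    """Return the byte size of an uncompressed tar holding one .so twice."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # The external runfile is always written with its content.
        real = tarfile.TarInfo("ext_repo/site-packages/torch/_C.so")
        real.size = len(PAYLOAD)
        tar.addfile(real, io.BytesIO(PAYLOAD))

        # The .venv entry: either a second full copy or a relative symlink.
        venv = tarfile.TarInfo("_test.venv/lib/python3.13/site-packages/torch/_C.so")
        if venv_copy_is_symlink:
            venv.type = tarfile.SYMTYPE
            venv.linkname = "../../../../../ext_repo/site-packages/torch/_C.so"
            tar.addfile(venv)
        else:
            venv.size = len(PAYLOAD)
            tar.addfile(venv, io.BytesIO(PAYLOAD))
    return len(buf.getvalue())


print("copied twice:   ", layer_size(venv_copy_is_symlink=False))
print("file + symlink: ", layer_size(venv_copy_is_symlink=True))
```

The first number is roughly twice the second, since tar headers and padding are small next to the payload.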

I can work around this in my tar rule, but I was wondering whether this is intended behavior in the first place.
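One possible workaround on the tar-rule side, sketched in Python under the assumption that the rule can read every file's content: deduplicate by content hash and turn later copies into relative symlinks. This is a hypothetical approach, not necessarily how tar.bzl or the rule in #3388 handles it; write_deduplicated and the example paths are made up for illustration.

```python
import hashlib
import io
import posixpath
import tarfile


def write_deduplicated(tar: tarfile.TarFile, files: dict[str, bytes]) -> None:
    """Hypothetical workaround: store each distinct file content only once.

    `files` maps archive paths to contents and is iterated in insertion
    order, so the caller decides which copy stays a regular file. Later
    paths with the same SHA-256 digest become relative symlinks to it.
    """
    first_path_by_digest: dict[str, str] = {}
    for arcname, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        info = tarfile.TarInfo(arcname)
        original = first_path_by_digest.get(digest)
        if original is None:
            first_path_by_digest[digest] = arcname
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        else:
            info.type = tarfile.SYMTYPE
            # Relative target, so the link resolves wherever the layer is unpacked.
            info.linkname = posixpath.relpath(original, posixpath.dirname(arcname))
            tar.addfile(info)


so_bytes = b"\x7fELF" + bytes(100)  # hypothetical .so content
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    write_deduplicated(tar, {
        "ext/torch/_C.so": so_bytes,
        "_test.venv/lib/python3.13/site-packages/torch/_C.so": so_bytes,
    })
```

The trade-off is that every file must be read and hashed, and a path that was a regular file in the runfiles tree becomes a symlink in the layer.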

🔬 Minimal Reproduction

https://github.com/shayanhoshyari/issue-reports/tree/main/rules_python/venv_tar

🔥 Exception or Error

No exception, but files like the following report File.is_symlink = False:

- path: bazel-out/darwin_arm64-fastbuild/bin/_test.venv/lib/python3.13/site-packages/torch/_C.cpython-313-darwin.so
- short_path: _test.venv/lib/python3.13/site-packages/torch/_C.cpython-313-darwin.so
- is_symlink = False
- target: /private/var/tmp/_bazel_hoshyari/0915edf9aa0fc6e009ba2664be204381/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/_test.venv/lib/python3.13/site-packages/torch/_C.cpython-313-darwin.so

Other entries (directories that are symlinked as a whole, without resolution down to individual files) correctly report is_symlink = True:

- path: bazel-out/darwin_arm64-fastbuild/bin/_test.venv/lib/python3.13/site-packages/filelock
- short_path: _test.venv/lib/python3.13/site-packages/filelock
- is_symlink = True
- target: ../../../../../rules_python++pip+pypi_313_filelock_py3_none_any_d38e3048/site-packages/filelock
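Preserving the relative target works because it is resolved against the symlink's own directory. The Python sketch below resolves the filelock target above. It assumes (this is not stated in the report) that the link sits in a runfiles tree where the main repository lives under _main/ and each external repository is a top-level directory. The function name resolve_relative_link is hypothetical.

```python
import posixpath

# Assumed runfiles-relative location of the link (main repo under "_main/").
link_path = "_main/_test.venv/lib/python3.13/site-packages/filelock"
# Target copied verbatim from the is_symlink = True entry above.
link_target = ("../../../../../rules_python++pip+pypi_313_filelock_py3_none_any_d38e3048"
               "/site-packages/filelock")


def resolve_relative_link(link_path: str, link_target: str) -> str:
    """Resolve a relative symlink target against the link's own directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(link_path), link_target))


print(resolve_relative_link(link_path, link_target))
# → rules_python++pip+pypi_313_filelock_py3_none_any_d38e3048/site-packages/filelock
```

The five `..` segments climb exactly out of `_main/_test.venv/lib/python3.13/site-packages`, landing at the external repository's top-level runfiles directory.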

🌍 Your Environment

Operating System: macOS Sonoma (same behavior on Ubuntu 22.04)
Output of bazel version: 8.4.2
Rules_python version: 1.7.0

More info

I tried #3440; with it, these files seem to report is_symlink = True, while the folder structure stays the same. How I tried it: link
