@aignas aignas commented Dec 6, 2025

Looking at the investigation in #3380, it seems that we are calling
`startswith` many times, and I wanted to see whether that could be
optimized.

I also realized that no matter what target we have, we call the
function once with an `__init__.py` path. We can inline this case
as a separate `if` statement that checks for equality instead, which the
Starlark optimizer should understand better.

Before this PR, every executable target evaluated
`legacy_external_runfiles and "__init__.py".startswith("external")`;
this PR eliminates that check.

Related to #3380 and #3381

@aignas aignas requested a review from rickeylev as a code owner December 6, 2025 12:09
@gemini-code-assist

Summary of Changes

Hello @aignas, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a performance optimization by refactoring the path resolution logic within `py_executable.bzl`. The primary goal is to reduce redundant string operations, specifically `startswith` calls, when determining zip runfiles paths. This is achieved by separating the core path construction from legacy prefix handling and optimizing the common `__init__.py` case, which makes generating Python zip files more efficient.

Highlights

  • Refactoring `_get_zip_runfiles_path`: The `_get_zip_runfiles_path` function has been refactored into two distinct functions: a new, simpler `_get_zip_runfiles_path` for basic path construction, and a `_get_zip_runfiles_path_legacy` that encapsulates the logic for handling external runfiles and `../` prefixes.
  • Optimized `__init__.py` path handling: The handling of `__init__.py` paths has been optimized by inlining its specific case, allowing it to directly use the simpler path construction function and avoid unnecessary `startswith` checks that were previously performed for every executable target.
  • Reduced `startswith` calls: The change significantly reduces the number of `startswith` calls, particularly those related to `legacy_external_runfiles` and the `external` prefix, by isolating this logic within the new `_get_zip_runfiles_path_legacy` function.
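
The split described in the highlights can be sketched roughly as follows. This is a minimal illustration written in plain Python rather than Starlark, not the code from the PR: the constant values, the un-prefixed function names, and the exact prefix-stripping rules are assumptions made for the example.

```python
# Assumed stand-ins for the module-level constants; the values are illustrative.
ZIP_RUNFILES_DIRECTORY_NAME = "runfiles"
EXTERNAL_PREFIX = "external/"
PARENT_PREFIX = "../"


def get_zip_runfiles_path(path, workspace_name=""):
    """Fast path: plain "+" concatenation, no prefix checks."""
    if workspace_name:
        return ZIP_RUNFILES_DIRECTORY_NAME + "/" + workspace_name + "/" + path
    return ZIP_RUNFILES_DIRECTORY_NAME + "/" + path


def get_zip_runfiles_path_legacy(path, workspace_name, legacy_external_runfiles):
    """Slower path: strip "external/" or "../" so the file lands at the runfiles root."""
    if legacy_external_runfiles and path.startswith(EXTERNAL_PREFIX):
        return get_zip_runfiles_path(path[len(EXTERNAL_PREFIX):])
    if path.startswith(PARENT_PREFIX):
        return get_zip_runfiles_path(path[len(PARENT_PREFIX):])
    return get_zip_runfiles_path(path, workspace_name)


# "__init__.py" is a known constant, so it can use the fast path directly
# and never reaches a startswith call.
init_entry = get_zip_runfiles_path("__init__.py", "my_workspace")
```

Under these assumptions, only paths that may actually carry an `external/` or `../` prefix pay for the `startswith` checks.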

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors `_get_zip_runfiles_path` to optimize for the `__init__.py` case by avoiding unnecessary `startswith` checks. The logic is soundly split into a core path construction function and a legacy function that handles prefix stripping, which improves clarity. The changes are consistent with the description and appear correct. I have one minor suggestion to further simplify the new `_get_zip_runfiles_path` function to make it more DRY.

Comment on lines +901 to +906
    # NOTE: Use "+" for performance
    if workspace_name:
        # NOTE: Use "+" for performance
        return _ZIP_RUNFILES_DIRECTORY_NAME + "/" + workspace_name + "/" + path
    else:
        return _ZIP_RUNFILES_DIRECTORY_NAME + "/" + path

medium

This function can be simplified slightly, reducing code duplication and improving readability, by building the path prefix incrementally. The change keeps `+` for string concatenation, so the performance rationale still holds.

    # NOTE: Use "+" for performance
    prefix = _ZIP_RUNFILES_DIRECTORY_NAME + "/"
    if workspace_name:
        prefix += workspace_name + "/"
    return prefix + path
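
As a quick sanity check that the suggested form behaves the same as the branching one, here is a small Python sketch comparing the two. The function names and the constant value are hypothetical stand-ins chosen for this example, not names from the repository.

```python
# Assumed stand-in for _ZIP_RUNFILES_DIRECTORY_NAME; the value is illustrative.
ZIP_DIR = "runfiles"


def path_with_branches(path, workspace_name=""):
    # Mirrors the version under review: two separate return expressions.
    if workspace_name:
        return ZIP_DIR + "/" + workspace_name + "/" + path
    else:
        return ZIP_DIR + "/" + path


def path_with_prefix(path, workspace_name=""):
    # Mirrors the suggested version: build the prefix, then append the path.
    prefix = ZIP_DIR + "/"
    if workspace_name:
        prefix += workspace_name + "/"
    return prefix + path


# Compare both variants over a few representative inputs.
cases = [(p, ws) for p in ("__init__.py", "pkg/mod.py") for ws in ("", "my_ws")]
results_match = all(path_with_branches(p, ws) == path_with_prefix(p, ws) for p, ws in cases)
```

Both variants perform the same number of concatenations when `workspace_name` is set, so the choice between them is mostly about readability.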

aignas commented Dec 6, 2025

@tobyh-canva, could you please check the profile for this PR? #3442

I expect an improvement, but I am not sure how large. At least for setups that do not use zip builds and do not have the legacy "external" code path active, it should eliminate all appearances of `startswith` in the trace.

    def map_zip_empty_filenames(list_paths_cb):
        return [
            _get_zip_runfiles_path(path, workspace_name, legacy_external_runfiles) + "="
            # FIXME @aignas 2025-12-06: what kind of paths do we expect here? Will they

I would expect the paths to match what you'd see for a file in runfiles.files.

e.g., if a file in the runfiles comes from an external repo (and would have a short path of external/repo/foo.py or ../repo/foo.py), then there would be an empty file with a similar short path.

@rickeylev rickeylev added this pull request to the merge queue Dec 7, 2025
Merged via the queue into bazel-contrib:main with commit 9559b20 Dec 7, 2025
4 checks passed
@aignas aignas deleted the exp.aignas.get_zip_runfiles_path branch December 7, 2025 14:17