feat: add safe gate privacy checks #206

Merged: ProfRandom92 merged 1 commit into main from feat/safe-gate-privacy-checks on May 22, 2026
@ProfRandom92
Owner

Adds minimal deterministic privacy/secret boundary checks to safe_pr_gate.

Includes:

  • risky path detection for .env, *.pem, *.key, id_rsa, id_ed25519
  • text-only marker checks for BEGIN PRIVATE KEY, GITHUB_TOKEN=, OPENAI_API_KEY=, GEMINI_API_KEY=
  • binary and missing/deleted file skipping
  • deterministic JSON output with stable problem ordering
  • focused safe_pr_gate tests
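
The checks listed above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the constant values are taken from the description, while `RISKY_NAMES`, `RISKY_SUFFIXES`, `find_markers`, `render_report`, and the JSON keys `ok` and `problems` are hypothetical names chosen for the example.

```python
import json
from pathlib import PurePosixPath

# Assumed values, copied from the PR description; the real constants are not shown.
RISKY_NAMES = frozenset({".env", "id_rsa", "id_ed25519"})
RISKY_SUFFIXES = (".pem", ".key")
PRIVATE_MARKERS = (
    "BEGIN PRIVATE KEY",
    "GITHUB_TOKEN=",
    "OPENAI_API_KEY=",
    "GEMINI_API_KEY=",
)


def is_risky_path(path: str) -> bool:
    """Return True when the file name matches a risky name or suffix."""
    name = PurePosixPath(path).name
    return name in RISKY_NAMES or name.endswith(RISKY_SUFFIXES)


def find_markers(text: str) -> list[str]:
    """Return the markers found in text, in PRIVATE_MARKERS order."""
    return [marker for marker in PRIVATE_MARKERS if marker in text]


def render_report(problems: list[str]) -> str:
    """Serialize problems as JSON with sorted entries and sorted keys."""
    payload = {"ok": not problems, "problems": sorted(problems)}
    return json.dumps(payload, sort_keys=True)
```

Sorting both the problem list and the JSON keys is one simple way to make the output byte-for-byte stable regardless of the order in which paths are discovered.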

Scope

Local-only boundary checks. Not a broad secret scanner. No network, no external APIs, no binary scanning, and no broad refactors.

Validation

  • python -m compileall -q scripts/safe_pr_gate.py
  • pytest tests/test_safe_pr_gate.py -q

@ProfRandom92 ProfRandom92 merged commit 4487814 into main May 22, 2026
7 checks passed
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces privacy-focused security checks to the PR gate script, identifying risky file paths (e.g., .env, SSH keys) and sensitive content markers (e.g., API keys). Feedback from the review identifies a logic error where deleted files would still trigger warnings, potentially blocking the removal of secrets. Additionally, refactoring is suggested to optimize path resolution and improve code readability by passing Path objects directly to internal helper functions.

Comment thread scripts/safe_pr_gate.py
Comment on lines +143 to +155
def _privacy_problems(changed_paths: tuple[str, ...]) -> tuple[str, ...]:
    problems: list[str] = []
    for path in sorted(changed_paths):
        if _is_risky_path(path):
            problems.append(f"privacy_risky_path:{path}")

        text = _read_changed_text(path)
        if text is None:
            continue
        for marker in PRIVATE_MARKERS:
            if marker in text:
                problems.append(f"privacy_marker:{marker}:{path}")
    return tuple(problems)

high

The current implementation flags risky paths even if the file has been deleted in the PR. This is a significant issue as it blocks users from committing the removal of secrets or sensitive files. By moving the existence check to the start of the loop, we avoid false positives for deleted files and can pass the resolved Path to _read_changed_text to eliminate redundant system calls.

def _privacy_problems(changed_paths: tuple[str, ...]) -> tuple[str, ...]:
    problems: list[str] = []
    for path in sorted(changed_paths):
        repo_path = _repo_relative_path(path)
        if repo_path is None or not repo_path.is_file():
            continue

        if _is_risky_path(path):
            problems.append(f"privacy_risky_path:{path}")

        text = _read_changed_text(repo_path)
        if text is None:
            continue
        for marker in PRIVATE_MARKERS:
            if marker in text:
                problems.append(f"privacy_marker:{marker}:{path}")
    return tuple(problems)
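
The effect of moving the existence check first can be demonstrated in isolation. The sketch below is a simplified stand-in, not the gate script itself: `privacy_path_problems`, its `repo_root` parameter, `RISKY_NAMES`, and `demo` are hypothetical, and only the risky-path branch is modeled.

```python
import tempfile
from pathlib import Path

# Assumed subset of risky file names, for illustration only.
RISKY_NAMES = frozenset({".env", "id_rsa", "id_ed25519"})


def privacy_path_problems(
    repo_root: Path, changed_paths: tuple[str, ...]
) -> tuple[str, ...]:
    """Flag risky paths, skipping anything that is not a file on disk."""
    problems: list[str] = []
    for path in sorted(changed_paths):
        candidate = repo_root / path
        if not candidate.is_file():
            # Deleted or missing in the working tree: nothing to flag.
            continue
        if candidate.name in RISKY_NAMES:
            problems.append(f"privacy_risky_path:{path}")
    return tuple(problems)


def demo() -> tuple[str, ...]:
    """Run the check on one existing risky file and one deleted one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "id_rsa").write_text("placeholder\n", encoding="utf-8")
        # ".env" is listed as changed but absent on disk, as after a deletion.
        return privacy_path_problems(root, (".env", "id_rsa"))
```

Here `demo()` reports only `id_rsa`; the deleted `.env` no longer produces a problem, so a PR that removes a secret file would not be blocked by this branch.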

Comment thread scripts/safe_pr_gate.py
Comment on lines +127 to +140
def _read_changed_text(path: str) -> str | None:
    candidate = _repo_relative_path(path)
    if candidate is None or not candidate.is_file():
        return None
    try:
        data = candidate.read_bytes()
    except OSError:
        return None
    if b"\0" in data:
        return None
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

medium

Refactor _read_changed_text to accept a Path object directly. This avoids redundant path resolution and existence checks when called from _privacy_problems. The simplified error handling also makes the function more readable while maintaining the same logic for binary and encoding detection.

Suggested change
def _read_changed_text(candidate: Path) -> str | None:
    try:
        data = candidate.read_bytes()
        if b"\0" in data:
            return None
        return data.decode("utf-8")
    except (OSError, UnicodeDecodeError):
        return None
