Description
The current GitHub URL validation only checks the URL prefix (https://github.com/ or git@github.com:). It does not validate the repository format or reject potentially malicious URLs.
Current Behavior
def is_github_repo(self):
    return (self.input_path.startswith("https://github.com/") or
            self.input_path.startswith("git@github.com:"))
Security Concerns
- No validation of repository name format
- Accepts any URL starting with https://github.com/
- No protection against URL injection attacks
- Missing support for authentication options
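To make these concerns concrete, the sketch below copies the prefix check into a standalone function and runs it against a few inputs that are not usable repository URLs. The function name `is_github_repo_prefix_only` and the sample inputs are assumptions made for illustration; they are not part of the codebase.

```python
def is_github_repo_prefix_only(input_path: str) -> bool:
    """Standalone copy of the current prefix-only check (hypothetical name)."""
    return input_path.startswith(("https://github.com/", "git@github.com:"))


# Inputs that are not usable repository URLs but still pass the prefix check.
suspicious_inputs = [
    "https://github.com/",                     # no owner or repository
    "https://github.com/user/",                # owner only
    "https://github.com/../malicious",         # path-traversal segment
    "https://github.com/user/repo; rm -rf ~",  # trailing shell metacharacters
]

accepted = [s for s in suspicious_inputs if is_github_repo_prefix_only(s)]
print(f"{len(accepted)} of {len(suspicious_inputs)} suspicious inputs accepted")
```

Every input in the list is accepted, because the check never looks past the prefix.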
Expected Behavior
- Strict validation of GitHub repository URL format
- Sanitization of repository names
- Clear error messages for invalid URLs
- Optional support for authentication tokens
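One possible shape for the first three points is sketched below: a parser that either returns the owner and repository name or raises `ValueError` with a specific message. The names `parse_github_url` and `GitHubRepo`, the error wording, and the allowed character set are assumptions for illustration. GitHub's real naming rules are stricter than this character class, and token handling is not covered here.

```python
import re
from typing import NamedTuple


class GitHubRepo(NamedTuple):
    owner: str
    name: str


# Assumed character set, taken from the patterns proposed in this issue.
_SEGMENT = r"[A-Za-z0-9._-]+"
_HTTPS = re.compile(rf"https://github\.com/({_SEGMENT})/({_SEGMENT})/?")
_SSH = re.compile(rf"git@github\.com:({_SEGMENT})/({_SEGMENT})")


def parse_github_url(url: str) -> GitHubRepo:
    """Return (owner, name) or raise ValueError with a specific message."""
    if not url.startswith(("https://github.com/", "git@github.com:")):
        raise ValueError(f"Not a GitHub URL: {url!r}")
    match = _HTTPS.fullmatch(url) or _SSH.fullmatch(url)
    if match is None:
        raise ValueError(
            f"Expected https://github.com/<owner>/<repo>, got {url!r}"
        )
    owner, name = match.groups()
    # Normalize the optional ".git" suffix away from the repository name.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    # The character class alone would allow "." and ".." as path segments.
    if owner in (".", "..") or name in ("", ".", ".."):
        raise ValueError(f"Invalid owner or repository name in {url!r}")
    return GitHubRepo(owner, name)


print(parse_github_url("https://github.com/user/repo.git"))
```

Returning the parsed parts rather than a bare boolean lets the caller build the clone URL from validated components instead of reusing the raw input.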
Implementation Requirements
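def _validate_github_url(self, url):
    """Validate GitHub repository URL format"""
    import re
    # Support both HTTPS and SSH formats
    https_pattern = r'https://github\.com/([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+)(?:\.git)?/?'
    ssh_pattern = r'git@github\.com:([a-zA-Z0-9._-]+)/([a-zA-Z0-9._-]+)(?:\.git)?'
    # fullmatch anchors both ends; unlike match with '$', it rejects a trailing newline
    match = re.fullmatch(https_pattern, url) or re.fullmatch(ssh_pattern, url)
    if match is None:
        return False
    # The character class also matches "." and "..", so reject those segments explicitly
    return all(part not in (".", "..") for part in match.groups())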
Files Affected
codebase_to_text/codebase_to_text.py (lines 586-588, 567-570)
Acceptance Criteria
- .git suffix in URLs
Test Cases to Add
- Valid: https://github.com/user/repo, https://github.com/user/repo.git
- Invalid: https://github.com/, https://github.com/user/, https://github.com/../malicious
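The cases above could be written as table-driven tests along the following lines. This is a sketch: `validate_github_url` is a hypothetical standalone stand-in for the proposed `_validate_github_url` method so that the file runs on its own, and the test class name is likewise an assumption.

```python
import re
import unittest

_PATTERNS = (
    re.compile(r"https://github\.com/([A-Za-z0-9._-]+)/([A-Za-z0-9._-]+)/?"),
    re.compile(r"git@github\.com:([A-Za-z0-9._-]+)/([A-Za-z0-9._-]+)"),
)


def validate_github_url(url: str) -> bool:
    """Hypothetical standalone validator used only to drive the test cases."""
    for pattern in _PATTERNS:
        match = pattern.fullmatch(url)
        if match:
            # Reject "." and ".." segments, which the character class allows.
            return all(part not in (".", "..") for part in match.groups())
    return False


class GitHubUrlValidationTest(unittest.TestCase):
    VALID = [
        "https://github.com/user/repo",
        "https://github.com/user/repo.git",
    ]
    INVALID = [
        "https://github.com/",
        "https://github.com/user/",
        "https://github.com/../malicious",
    ]

    def test_valid_urls(self):
        for url in self.VALID:
            with self.subTest(url=url):
                self.assertTrue(validate_github_url(url))

    def test_invalid_urls(self):
        for url in self.INVALID:
            with self.subTest(url=url):
                self.assertFalse(validate_github_url(url))
```

Saved as a test module, this can be run with `python -m unittest`. Using `subTest` reports each failing URL separately instead of stopping at the first one.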