CRITICAL: This is a self-contained server application, NOT a public Python module.

NO backward compatibility required for internal APIs:
- Internal methods in `webhook_server/libs/` can change freely
- Return types can change (e.g., `Any` → `bool`)
- Method signatures can be modified without deprecation

Backward compatibility ONLY for:
- User-facing configuration files (`config.yaml`, `.github-webhook-server.yaml`)
- Configuration schema changes (must support old formats or provide migration)
- Webhook payload handling (must follow GitHub webhook spec)

Rationale: This server is deployed as a single application. All code is updated together. Internal refactoring is safe and encouraged.
CRITICAL: Eliminate unnecessary defensive programming overhead.

Philosophy:
- Server fails fast on startup if critical dependencies are missing
- Required parameters in `__init__()` are ALWAYS provided
- Checking for None on required parameters is pure overhead
- Defensive checks are ONLY acceptable for truly optional parameters
- Fail-fast is better than hiding bugs with fake data
Defensive checks ARE acceptable for:

- Destructors (`__del__`) - can be called during failed initialization:

  ```python
  def __del__(self):
      if hasattr(self, "logger"):
          self.logger.debug("Cleanup")
  ```

- Optional parameters - the parameter explicitly allows None:

  ```python
  def get_data(self, owner: str | None = None): ...
  ```

- Lazy initialization - the attribute explicitly starts as None:

  ```python
  self.client: SomeClient | None = None
  ```

- Platform constants - the constant may not exist on all platforms:

  ```python
  if hasattr(os, "O_NOFOLLOW"):
      flags |= os.O_NOFOLLOW
  ```

- External libraries whose version we DON'T control - library version truly unknown
  - NOTE: Does NOT apply to dependencies in `pyproject.toml` - we control those versions
```python
# ❌ WRONG - config is required, ALWAYS provided
def some_method(self):
    if self.config:
        value = self.config.get_value("key")

# ✅ CORRECT
def some_method(self):
    value = self.config.get_value("key")
```

We control these versions in `pyproject.toml`: PyGithub >=2.4.0, gql >=3.5.0.
```python
# ❌ WRONG - PyGithub >=2.4.0 guaranteed
if hasattr(self.rest_client, "close"):
    self.rest_client.close()

# ✅ CORRECT
self.rest_client.close()
```

`repository_data` is ALWAYS set before handlers instantiate (fail-fast in `GithubWebhook.process()`).
```python
# ❌ WRONG - repository_data guaranteed by architecture
if hasattr(self.github_webhook, "repository_data"): ...

# ✅ CORRECT
collaborators = self.github_webhook.repository_data["collaborators"]
```

GitHub webhook format is stable: `user.node_id`, `user.type`, and `sender` always exist.
```python
# ❌ WRONG
if "node_id" in user_data:
    return user_data["node_id"]

# ✅ CORRECT - Let it fail if data is malformed
return user_data["node_id"]  # KeyError = legitimate bug

# ❌ WRONG - Use isinstance() for type checking instead
if hasattr(pr, "some_attr"): ...

# ✅ CORRECT
pr_id = pr.node_id  # Direct attribute access
```

NEVER return fake defaults to hide missing data:
```python
# ❌ WRONG - Fake data hiding bugs
return "", 0, False, None, UserWrapper(None), [], {}

# ✅ CORRECT - Fail-fast
raise ValueError("Data not available")
raise KeyError("Required field missing")
```

Example:
```python
# ❌ WRONG
@property
def user(self):
    if self._data and "user" in self._data:
        return UserWrapper(self._data["user"])
    return UserWrapper(None)  # Fake empty user - HIDES BUGS

# ✅ CORRECT
@property
def user(self):
    if self._data and "user" in self._data:
        return UserWrapper(self._data["user"])
    raise ValueError("No user data available - webhook incomplete")
```

NO defensive checks needed for:
- Repository Data - `repository_data` is ALWAYS set before handlers instantiate
- Webhook User Objects - `user.node_id`, `user.type`, and `sender` always exist
- PyGithub REST API - 🔴 CRITICAL: PyGithub is blocking - MUST wrap with `github_api_call()` (provides retry with exponential backoff)

Summary - defensive checks acceptable for:
- Destructors (`__del__`)
- Optional parameters (`param: Type | None = None`)
- Lazy initialization (starts as None)
- Platform constants (`os.O_NOFOLLOW`)
- External libraries we don't control

Summary - defensive checks NOT acceptable for:
- Required parameters in `__init__()`
- Known library versions (PyGithub >=2.4.0)
- Architecture guarantees (`repository_data`)
- Webhook payload fields (`user.node_id`)
- Type discrimination (use `isinstance()` instead)
Enforcement:
- Code reviews catch violations
- Type hints match reality
- Prek hooks automate checks
- Zero tolerance for unnecessary defensive programming
FastAPI-based GitHub webhook server that automates repository management and pull request workflows.

Event-Driven Handler Architecture:
- `webhook_server/libs/handlers/` contains specialized handlers
- Handlers are instantiated by the main FastAPI app (`app.py`)
- Pattern: `__init__(github_webhook, ...)` → `process_event(event_data)`
Configuration System:
- `webhook_server/libs/config.py` manages YAML-based configuration with schema validation
- Global config at `/home/podman/data/config.yaml` with per-repository overrides via `.github-webhook-server.yaml`
- Schema validation in `webhook_server/config/schema.yaml`
- Configuration reloaded per webhook event (no server restart needed)
GitHub API Integration:
- `webhook_server/libs/github_api.py` provides the core `GithubWebhook` class
- Uses PyGithub (REST API v3) for all GitHub operations
- 🔴 CRITICAL: PyGithub is synchronous/blocking - MUST wrap with `github_api_call()` from `webhook_server.utils.github_retry` (retries transient 500/502/503/504 errors with exponential backoff)
- Supports multiple GitHub tokens with automatic failover
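The failover idea amounts to selecting the first token that still has rate-limit budget. A minimal sketch of that selection logic, under stated assumptions - `pick_token` and its threshold are hypothetical, not the real logic in `github_api.py`:

```python
def pick_token(tokens: list[str], remaining: dict[str, int], minimum: int = 100) -> str:
    """Return the first token whose remaining rate limit is above a threshold.

    `remaining` maps token -> remaining API calls (as reported by the rate-limit
    endpoint). Raises if every token is exhausted, matching the fail-fast policy.
    """
    for token in tokens:
        if remaining.get(token, 0) >= minimum:
            return token
    raise RuntimeError("All GitHub tokens are rate-limited")
```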
Log Viewer System:
- `webhook_server/web/log_viewer.py` contains `LogViewerController`
- Memory-optimized: streaming/chunked processing (90% memory reduction)
- Real-time log streaming via WebSocket
```shell
# Setup
uv sync
source .venv/bin/activate

# Development
uv run entrypoint.py

# Production
WEBHOOK_SERVER_DATA_DIR=/path/to/data uv run entrypoint.py

# Run all tests
uv run --group tests pytest -n auto

# With coverage (90% required)
uv run --group tests pytest -n auto --cov=webhook_server

# Lint, format, type-check
uv run ruff format
uv run ruff check
uv run ruff check --fix
uv run mypy webhook_server/
uv run ruff check && uv run ruff format && uv run mypy webhook_server/

# Validate configuration
uv run webhook_server/tests/test_schema_validator.py config.yaml
```
```shell
uv run pytest webhook_server/tests/test_config_schema.py -v
```

Handler pattern:

```python
class SomeHandler:
    def __init__(self, github_webhook: GithubWebhook, ...):
        self.github_webhook = github_webhook

    def process_event(self, event_data: dict) -> None:
        # Validate event data
        # Perform GitHub API operations via unified_api
        # Log results
        ...
```

CRITICAL: PyGithub is synchronous - ALL operations MUST use `github_api_call()` from `webhook_server.utils.github_retry`. This wraps `asyncio.to_thread()` with retry logic for transient GitHub API failures (HTTP 500/502/503/504).
Blocking PyGithub operations:

- Method calls - ALL trigger API calls: `.get_*()`, `.create_*()`, `.edit()`, `.update()`, `.add_to_*()`, `.remove_from_*()`
- Property accesses - MANY trigger API calls: `.draft`, `.mergeable`, `.state`, `.committer`, `.author`, `.permissions`, `.labels`, `.assignees` - ANY property not in the webhook payload
- PaginatedList iteration - BLOCKS during iteration

Safe operations (don't block):
- Properties from the webhook payload (`.number`, `.title`, `.body`)
- Already-fetched cached data (rare)
```python
import asyncio
from webhook_server.utils.github_retry import github_api_call

# ✅ CORRECT - Wrap ALL method calls with github_api_call (includes retry)
await github_api_call(pull_request.create_issue_comment, "Comment", logger=self.logger, log_prefix=self.log_prefix)
await github_api_call(pull_request.add_to_labels, "label", logger=self.logger, log_prefix=self.log_prefix)
await github_api_call(repository.get_pull, number, logger=self.logger, log_prefix=self.log_prefix)

# ✅ CORRECT - Wrap ALL property accesses that may trigger API calls
is_draft = await github_api_call(lambda: pull_request.draft, logger=self.logger, log_prefix=self.log_prefix)
mergeable = await github_api_call(lambda: pull_request.mergeable, logger=self.logger, log_prefix=self.log_prefix)
labels = await github_api_call(lambda: list(pull_request.labels), logger=self.logger, log_prefix=self.log_prefix)

# ✅ CORRECT - Wrap PaginatedList iteration
commits = await github_api_call(lambda: list(pull_request.get_commits()), logger=self.logger, log_prefix=self.log_prefix)
for commit in commits:
    await process_commit(commit)

# ✅ CORRECT - Concurrent operations
is_draft, mergeable, state = await asyncio.gather(
    github_api_call(lambda: pull_request.draft, logger=self.logger, log_prefix=self.log_prefix),
    github_api_call(lambda: pull_request.mergeable, logger=self.logger, log_prefix=self.log_prefix),
    github_api_call(lambda: pull_request.state, logger=self.logger, log_prefix=self.log_prefix),
)

# ❌ WRONG - NEVER call PyGithub directly
pull_request.create_issue_comment("Comment")  # BLOCKS!
is_draft = pull_request.draft  # BLOCKS!
for commit in pull_request.get_commits(): ...  # BLOCKS!

# ❌ WRONG - NEVER use raw asyncio.to_thread (no retry protection)
await asyncio.to_thread(pull_request.create_issue_comment, "Comment")  # NO RETRY!
```

Before accessing ANY PyGithub object:
- Is this a PyGithub object? → YES, it may block
- Calling a method? → DEFINITELY BLOCKS - wrap in `github_api_call()`
- Accessing a property? → MAY BLOCK - wrap in `github_api_call(lambda: obj.property, logger=self.logger, log_prefix=self.log_prefix)`
- Iterating a PaginatedList? → BLOCKS - wrap in `github_api_call(lambda: list(...), logger=self.logger, log_prefix=self.log_prefix)`
- Webhook payload attribute? → Usually safe (`.number`, `.title`)
- Unsure? → ALWAYS wrap in `github_api_call()`
Why this is critical:
- PyGithub is synchronous - each operation blocks for 100ms-2 seconds
- Blocking = frozen server (no other webhooks processed)
- `github_api_call()` runs code in a thread pool via `asyncio.to_thread()`, keeps the event loop responsive, and retries on transient GitHub API failures
- NOT OPTIONAL - required for correct async operation
Impact of blocking:
- Single blocking call freezes entire server
- Incoming webhooks must wait
- Server appears unresponsive
- Rate limits hit faster
- Degraded user experience
```python
async def add_pr_comment(self, pull_request: PullRequest, body: str) -> None:
    await github_api_call(pull_request.create_issue_comment, body, logger=self.logger, log_prefix=self.log_prefix)

async def check_pr_status(self, pull_request: PullRequest) -> tuple[bool, bool, str]:
    # asyncio.gather returns a list; unpack so the annotated tuple is returned
    is_draft, mergeable, state = await asyncio.gather(
        github_api_call(lambda: pull_request.draft, logger=self.logger, log_prefix=self.log_prefix),
        github_api_call(lambda: pull_request.mergeable, logger=self.logger, log_prefix=self.log_prefix),
        github_api_call(lambda: pull_request.state, logger=self.logger, log_prefix=self.log_prefix),
    )
    return is_draft, mergeable, state
```

Repository data pre-fetch:

```python
# In GithubWebhook.process() - after PR data, before handlers
self.repository_data: dict[str, Any] = await self.unified_api.get_comprehensive_repository_data(owner, repo)

# In handlers - use pre-fetched data
collaborators = self.github_webhook.repository_data["collaborators"]["edges"]
```

Key principles:
- Fetch once per webhook, never per handler
- Fail-fast: an exception propagates and the webhook aborts
- Type is `dict[str, Any]`, never `| None` (fail-fast guarantees it is set)
Location: `webhook_server/libs/github_api.py` lines 534-570

Early exit conditions (no clone needed):
- Action != "completed" - skips the `created` action
- Can-be-merged check run with a non-success conclusion - primary optimization
```python
elif self.github_event == "check_run":
    action = self.hook_data.get("action", "")
    if action != "completed":
        return None
    check_run_name = self.hook_data.get("check_run", {}).get("name", "")
    check_run_conclusion = self.hook_data.get("check_run", {}).get("conclusion", "")
    if check_run_name == CAN_BE_MERGED_STR and check_run_conclusion != SUCCESS_STR:
        return None
    await self._clone_repository(pull_request=pull_request)
```

Benefits:
- 90-95% reduction in unnecessary cloning
- Saves 5-30 seconds per skipped clone
- Reduced resource usage
- Lower server load
Tests: webhook_server/tests/test_check_run_handler.py
Configuration usage:

```python
from webhook_server.libs.config import Config

config = Config(repository="org/repo-name")
value = config.get_value("setting-name", default_value)
```

Logging:

```python
from webhook_server.utils.helpers import get_logger_with_params

logger = get_logger_with_params(
    name="component_name",
    repository="org/repo",
    hook_id="github-delivery-id",
)
logger.debug("Detailed technical information")
logger.info("General information")
logger.warning("Warning that needs attention")
logger.error("Error requiring investigation")
logger.exception("Error with full traceback")  # Preferred over logger.error(..., exc_info=True)
```

JSON-based logging for webhook execution tracking with thread-safe context using ContextVar.
Context Creation (app.py):

```python
from webhook_server.utils.context import create_context

ctx = create_context(
    hook_id=hook_id,
    event_type="pull_request",
    repository="org/repo",
    action="opened",
    sender="username",
    api_user="github-api-user",
)
```

Step Tracking:

```python
import traceback

from webhook_server.utils.context import get_context

ctx = get_context()
ctx.start_step("clone_repository", branch="main")
try:
    await clone_repo()
    ctx.complete_step("clone_repository", commit_sha="abc123")
except Exception as ex:
    ctx.fail_step("clone_repository", exception=ex, traceback_str=traceback.format_exc())
```

Handler Usage:
```python
class PullRequestHandler:
    async def process_event(self, event_data: dict) -> None:
        ctx = self.github_webhook.ctx
        ctx.start_step("assign_reviewers", pr_number=123)
        try:
            await self.assign_reviewers(pr)
            ctx.complete_step("assign_reviewers", reviewers_assigned=3)
        except Exception as ex:
            ctx.fail_step("assign_reviewers", exception=ex, traceback_str=traceback.format_exc())
```

Log File Format:
- Location: `{config.data_dir}/logs/webhooks_YYYY-MM-DD.json`
- Format: Pretty-printed JSON (2-space indentation)
- Rotation: Daily, based on UTC date

Log entry structure:
```json
{
  "hook_id": "github-delivery-id",
  "event_type": "pull_request",
  "pr": {"number": 968, "title": "Add new feature"},
  "timing": {"started_at": "2026-01-05T10:30:00.123Z", "duration_ms": 7712},
  "workflow_steps": {
    "clone_repository": {"status": "completed", "duration_ms": 4823}
  },
  "token_spend": 4,
  "success": true
}
```

Exception logging:

```python
# ✅ CORRECT: Use logger.exception for automatic traceback
try:
    await some_operation()
except Exception:
    logger.exception("Failed to perform operation")

# ❌ WRONG: Don't use logger.error with exc_info=True
try:
    await some_operation()
except Exception as ex:
    logger.error(f"Failed: {ex}", exc_info=True)

# ✅ BETTER: Catch specific exceptions
try:
    await some_operation()
except GithubException:
    logger.exception("GitHub API operation failed")
except asyncio.CancelledError:
    logger.debug("Operation cancelled")
    raise  # Always re-raise CancelledError
```

MANDATORY: All imports at top of files
- No imports in functions or try/except blocks
- Exception: TYPE_CHECKING imports can be conditional
- Prek hooks enforce this
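The TYPE_CHECKING exception typically looks like the sketch below; `ExampleHandler` is a hypothetical class used only for illustration, and the guarded import assumes the handler would otherwise create an import cycle with `github_api.py`:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers - no runtime import, no import cycle
    from webhook_server.libs.github_api import GithubWebhook


class ExampleHandler:
    def __init__(self, github_webhook: "GithubWebhook") -> None:
        self.github_webhook = github_webhook
```

Because the annotation is a string, the import never runs at server startup, yet mypy still checks the type.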
MANDATORY: Complete type hints (mypy strict mode)

```python
# ✅ CORRECT
async def process_pr(self, pull_request: PullRequest, reviewers: list[str]) -> None: ...

# ❌ WRONG
async def process_pr(self, pull_request, reviewers): ...
```

MANDATORY: 90% code coverage required
- Check: `uv run --group tests pytest --cov=webhook_server`
- New code without tests fails CI
- Tests live in `webhook_server/tests/`
```text
webhook_server/tests/
├── test_*.py            # Unit and integration tests
├── manifests/           # Test configuration files
│   └── config.yaml
└── test_*_handler.py    # Handler-specific tests
```

Mocking async GitHub calls:

```python
from unittest.mock import AsyncMock, Mock, patch

async def mock_to_thread(func, *args, **kwargs):
    # Execute synchronously in tests instead of dispatching to a worker thread
    return func(*args, **kwargs)

mock_api = AsyncMock()
mock_api.get_pull_request.return_value = mock_pr_data

# Tests patch asyncio.to_thread since github_api_call delegates to it internally
with patch("asyncio.to_thread", side_effect=mock_to_thread):
    result = await unified_api.get_pr_for_check_runs(owner, repo, number)
```

```python
TEST_GITHUB_TOKEN = "ghp_test1234..."  # pragma: allowlist secret
```
```python
@pytest.fixture
def mock_github_api():
    mock = Mock()
    mock.get_rate_limit.return_value = Mock(rate=Mock(remaining=5000))
    return mock
```

- Log viewer endpoints (`/logs/*`) are unauthenticated
- Deploy only on trusted networks (VPN, internal network)
- Never expose to public internet
- Use reverse proxy with authentication for external access
- Logs contain sensitive data: tokens, webhook payloads, user information
- Store tokens in environment variables or secret management systems
- Use multiple tokens for rate limit distribution
- Never commit tokens to repository
- Mask sensitive data in logs (see `mask-sensitive-data` in the schema)
Adding a new handler:
- Create the handler file in `webhook_server/libs/handlers/`
- Implement `__init__(self, github_webhook, ...)` and `process_event(event_data)`
- Use `self.github_webhook.unified_api` for GitHub operations
- Add tests in `webhook_server/tests/test_*_handler.py`
- Update `app.py` to instantiate the handler
Changing the configuration schema:
- Edit `webhook_server/config/schema.yaml`
- Run `uv run pytest webhook_server/tests/test_config_schema.py -v`
- Update examples in `examples/config.yaml`
- Test with `uv run webhook_server/tests/test_schema_validator.py examples/config.yaml`
External AI service integration for test recommendations via pr-test-oracle.
- Schema: `webhook_server/config/schema.yaml` (`test-oracle`), configurable globally or per-repo
- Comment command: `/test-oracle` (always works when configured, no trigger needed)
- Module: `webhook_server/libs/test_oracle.py` - `call_test_oracle()` shared helper
Error handling:
- Health check failure: PR comment posted, continue flow
- Analyze errors: log only, no PR comment
- Never breaks webhook processing
AI-powered enhancements controlled by the `ai-features` config (global or per-repo).
- Schema: `webhook_server/config/schema.yaml` (`$defs.ai-features`)
- Sub-features: `conventional-title`, `resolve-cherry-pick-conflicts-with-ai`
- On AI CLI failure: the error is logged and the flow continues without a suggestion
- Module: `webhook_server/libs/ai_cli.py` - shared AI CLI wrapper