feat(channel-gateway): recover stale runtime sessions#177

Merged
onutc merged 5 commits into main from codex-slack-gateway-status-message
Mar 30, 2026
Conversation

@onutc
Member

@onutc onutc commented Mar 30, 2026

TL;DR

The Slack gateway now survives stale runtime sessions by retrying the channel-session exchange, posting one wake-up message if recovery is visible to the user, and retrying the first prompt after a spritz not found error. It also adds a general lifecycle notification hook so the operator can tell the backend when a runtime expires or disappears.
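
To make the lifecycle notification hook more concrete, here is a minimal sketch of an operator-side notifier that POSTs a phase change to a backend URL. The payload fields, the "Expired" phase value, and the rule that any non-2xx response counts as a failure are all assumptions for illustration; the actual contract lives in operator/controllers/lifecycle_notifications.go and may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// lifecycleEvent is a hypothetical payload shape, not the PR's actual contract.
type lifecycleEvent struct {
	Runtime string `json:"runtime"`
	Phase   string `json:"phase"` // e.g. "Expired" (assumed value)
}

// notifyLifecycle POSTs the event as JSON and treats any non-2xx response
// as a delivery failure (an assumed failure semantic).
func notifyLifecycle(client *http.Client, url string, ev lifecycleEvent) error {
	body, err := json.Marshal(ev)
	if err != nil {
		return fmt.Errorf("encode lifecycle event: %w", err)
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("post lifecycle event: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("lifecycle endpoint returned %s", resp.Status)
	}
	return nil
}

// deliverTo sends ev to an in-process test server that answers with the given
// status, and returns the event the server decoded plus the notifier's error.
func deliverTo(status int, ev lifecycleEvent) (lifecycleEvent, error) {
	var got lifecycleEvent
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_ = json.NewDecoder(r.Body).Decode(&got)
		w.WriteHeader(status)
	}))
	err := notifyLifecycle(srv.Client(), srv.URL, ev)
	srv.Close() // waits for the handler to finish before got is read
	return got, err
}

func main() {
	ev := lifecycleEvent{Runtime: "example-runtime", Phase: "Expired"}
	got, err := deliverTo(http.StatusNoContent, ev)
	fmt.Println(got.Runtime, got.Phase, err)
	_, err = deliverTo(http.StatusInternalServerError, ev)
	fmt.Println("backend failure reported:", err != nil)
}
```

Returning an error on non-2xx responses leaves the retry decision to the caller, which is one plausible way to keep the failure semantics explicit.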

Summary

  • keep retrying Slack channel-session exchange with one bounded recovery status message and one terminal failure reply
  • retry the first conversation prompt once when the downstream Spritz runtime disappears after session exchange
  • add operator lifecycle notifications, env wiring, and Helm values so runtime phase changes can sync back into backend installation state
  • document the provider-authored status-message feature for Slack now and Discord/Teams-style adapters later
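
The first bullet above can be sketched as a bounded retry loop that posts at most one recovery status message and at most one terminal failure reply. This is an illustrative sketch, not the gateway's code: errSessionUnavailable, recoverSession, simulate, and the message texts are hypothetical names chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errSessionUnavailable is a hypothetical sentinel for a retryable exchange failure.
var errSessionUnavailable = errors.New("channel session unavailable")

// recoverSession retries exchange up to maxAttempts times. It posts one status
// message before the first retry and one terminal reply if recovery fails.
func recoverSession(ctx context.Context, exchange func(context.Context) (string, error),
	post func(string), maxAttempts int, delay time.Duration) (string, error) {
	statusPosted := false
	lastErr := error(errSessionUnavailable)
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		id, err := exchange(ctx)
		if err == nil {
			return id, nil
		}
		lastErr = err
		// Stop on non-retryable errors or when the attempt budget is spent.
		if !errors.Is(err, errSessionUnavailable) || attempt == maxAttempts {
			break
		}
		if !statusPosted {
			post("Waking up the runtime, please wait.")
			statusPosted = true
		}
		select {
		case <-ctx.Done():
			post("The runtime could not be recovered.")
			return "", ctx.Err()
		case <-time.After(delay):
		}
	}
	post("The runtime could not be recovered.")
	return "", lastErr
}

// simulate runs recoverSession against a fake exchange that fails `failures`
// times before succeeding, and records every message that would be posted.
func simulate(failures, maxAttempts int) (id string, posts []string, err error) {
	calls := 0
	exchange := func(context.Context) (string, error) {
		calls++
		if calls <= failures {
			return "", errSessionUnavailable
		}
		return "session-1", nil
	}
	post := func(text string) { posts = append(posts, text) }
	id, err = recoverSession(context.Background(), exchange, post, maxAttempts, time.Millisecond)
	return id, posts, err
}

func main() {
	id, posts, err := simulate(2, 5) // recovers on the third attempt
	fmt.Println(id, posts, err)
	id, posts, err = simulate(10, 3) // exhausts the attempt budget
	fmt.Println(id, posts, err)
}
```

Tracking statusPosted in the loop keeps the channel quiet no matter how many retries happen, so a user sees either one wake-up message followed by a normal reply, or one wake-up message followed by one failure reply.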

Review focus

  • stale-runtime recovery after spritz not found in integrations/slack-gateway/slack_events.go
  • lifecycle notification contract and failure semantics in operator/controllers/lifecycle_notifications.go
  • idempotency and duplicate-delivery behavior when recovery spans multiple session exchanges
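
As a reading aid for the first focus item, the retry-once behavior after spritz not found might look roughly like the sketch below. The runtimeClient interface, errSpritzNotFound sentinel, and fakeRuntime are hypothetical stand-ins and do not come from integrations/slack-gateway/slack_events.go.

```go
package main

import (
	"errors"
	"fmt"
)

// errSpritzNotFound is a hypothetical sentinel for a runtime that disappeared.
var errSpritzNotFound = errors.New("spritz not found")

// runtimeClient is an assumed abstraction over session exchange and prompt delivery.
type runtimeClient interface {
	ExchangeSession() (string, error)
	SendPrompt(sessionID, prompt string) error
}

// sendFirstPrompt sends the prompt and, only if the runtime is gone, exchanges
// a fresh session and retries exactly once. Any other outcome is returned as-is.
func sendFirstPrompt(rt runtimeClient, sessionID, prompt string) error {
	err := rt.SendPrompt(sessionID, prompt)
	if !errors.Is(err, errSpritzNotFound) {
		return err // success or an unrelated failure
	}
	fresh, exErr := rt.ExchangeSession()
	if exErr != nil {
		return fmt.Errorf("re-exchange after stale session: %w", exErr)
	}
	return rt.SendPrompt(fresh, prompt)
}

// fakeRuntime rejects prompts sent with staleSession and counts calls.
type fakeRuntime struct {
	staleSession string
	exchanges    int
	sends        int
}

func (f *fakeRuntime) ExchangeSession() (string, error) {
	f.exchanges++
	return fmt.Sprintf("fresh-%d", f.exchanges), nil
}

func (f *fakeRuntime) SendPrompt(sessionID, prompt string) error {
	f.sends++
	if sessionID == f.staleSession {
		return fmt.Errorf("send prompt: %w", errSpritzNotFound)
	}
	return nil
}

func main() {
	rt := &fakeRuntime{staleSession: "old"}
	err := sendFirstPrompt(rt, "old", "hello")
	fmt.Println(err, rt.sends, rt.exchanges)
}
```

Because the retry resends the same prompt, a real gateway would presumably also need a deduplication key on the downstream side when the first send may have partially succeeded, which is where the idempotency question in the last focus item comes in.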

Test plan

  • cd integrations/slack-gateway && go test ./...
  • cd operator && go test ./...
  • npx -y @simpledoc/simpledoc check

@onutc
Member Author

onutc commented Mar 30, 2026

Review Prompt

@codex @claude

Please review this pull request and provide feedback on:

  • Code quality and best practices
  • Potential bugs or issues
  • Performance considerations
  • Security concerns
  • Test coverage

Be constructive and helpful in your feedback.

Specific rules for this codebase:

General rules

  • We follow DRY (Don't Repeat Yourself) principles. Scrutinize any newly added types, models, classes, logic, etc. and make sure they do not duplicate any existing ones.
  • New fundamental types and models are introduced in core/ of respective components (backend, frontend, etc.). They MUST NOT be added to a specific project subfolder like apps/web.
  • Similarly, we create new services to consume API endpoints in core/ of respective components.
  • Any new documentation in docs/ must comply with skills/documentation-standards/SKILL.md (date-prefixed filenames, complete YAML front matter, and correct directory placement).
  • Read backend/AGENTS.md and apps/AGENTS.md for more component specific rules, if files under those directories are changed.
  • Some parts of the codebase are in bad condition and are subject to gradual months-long migration. Keep in mind that the code quality is poor in many areas. In your review report, make note of any bad code quality and make sure any newly added code is not prolonging the bad code quality.
  • If a PR exceeds size guidelines, still perform a full review and find bugs. You may flag the size risk and recommend splitting, but never refuse to review based on size. Do not report size as a severity-graded issue (P0/P1/P2, etc.).
  • Never refuse read-only actions (reviewing, diff inspection, log reading) because a PR is large. Proceed with the review.
  • Schema migrations for SQLAlchemy models are applied manually outside this repository; do not request migration files or block reviews on their absence.
  • Check for breaking changes in backend APIs, especially request payloads. Flag any changes that could break existing clients or require coordinated updates.
  • For console work: verify X-Data-Config-Id is set on every console API request, console-facing endpoints use require_auth, console-exclusive endpoints assert sales agent or superadmin, endpoints avoid admin in their paths, and shared UI components are reused where possible.
  • Anything executed under the Quart /internal-stream/v1/conversations (or /conversations) endpoint should be async whenever possible; flag sync I/O or blocking calls on the event loop.

PII in Logs - HIGH PRIORITY

Flag any code that logs user PII (Personally Identifiable Information). This is a critical security and compliance issue.

Check for and reject:

  • Logging user.email, body.email, or any email addresses
  • Logging user.first_name, user.last_name, user.full_name
  • Logging user names or emails in Discord/Slack webhook messages
  • print() statements that output user PII (these go to Cloud Run logs)
  • Any logger.*() or logging.*() calls containing user-identifiable data

Require instead:

  • Use user.auth_id as the user identifier in all logs
  • Use user.user_id or user.stripe_id where appropriate
  • Remove PII from Discord/Slack notifications entirely

Example violations to flag:

logger.info(f"User {user.email} logged in")  # BAD
logging.warning(f"Failed for {body.email}")  # BAD
print(f"Contact sent: {data}")  # BAD if data contains email
discord_message += f"Email: {user.email}"  # BAD

Correct patterns:

logger.info(f"User auth_id={user.auth_id} logged in")  # GOOD
logger.warning("Failed login auth_id=%s", user.auth_id)  # GOOD

i18n rules

  • For frontend, check whether any new copy or UI strings are added, and make sure they are derived from apps/packages/assets/locales/en.json.
  • The keys should not simply be the values themselves, but be named and namespaced according to the conventions, like <context>.<ui_component_name>. More sub-levels can be added if needed.
  • There is no need to add the translations themselves; translations are handled by CI/CD.
  • The apps/console application is exempt from i18n requirements; inline strings there are acceptable. Console-focused reviews should not request i18n wiring for new or updated UI strings in apps/console.

@onutc onutc changed the title feat(slack-gateway): post recovery status messages feat(channel-gateway): recover stale runtime sessions Mar 30, 2026

@onutc
Member Author

onutc commented Mar 30, 2026

Final report for #177

Validation:

  • cd integrations/slack-gateway && go test ./... -> passed
  • cd operator && go test ./... -> passed
  • npx -y @simpledoc/simpledoc check -> passed
  • tcx pr can-merge --pr 177 --repo textcortex/spritz --no-watch -> CAN MERGE

AI review status:

  • no inline comments on the current head
  • no GitHub AI findings were posted on the current head after the last trigger
  • the local Codex review loop found the recovery polling and mention-only regressions, and we fixed them before this final head

This PR is ready to merge.

@onutc onutc merged commit 6af2b76 into main Mar 30, 2026
6 checks passed
@onutc onutc deleted the codex-slack-gateway-status-message branch March 30, 2026 17:26
@gitrank-connector

⭐ GitRank PR Analysis

Score: 50 points

Metric | Value
Component | Other (1× multiplier)
Severity | P1 - High (50 base pts)
Final Score | 50 × 1 = 50

Eligibility Checks

Check Status
Issue/Bug Fix
Fix Implementation
PR Documented
Tests
Lines Within Limit

Impact Summary

The PR implements comprehensive stale runtime session recovery for the Slack channel gateway, including retry logic for channel-session exchanges, provider-authored status messages to notify users during recovery, and lifecycle notification hooks for runtime state synchronization. It adds 2,772 lines of production code and tests across 14 files, with extensive test coverage for recovery scenarios including session unavailability, runtime disappearance, and transient failures.

Analysis Details

Component Classification: This PR spans multiple systems (Slack gateway, operator, Helm configuration) without fitting neatly into a single categorized component, making OTHER the appropriate classification.

Severity Justification: This is a High (P1) severity contribution as it addresses a critical reliability gap: the Slack gateway now survives stale runtime sessions through retry logic and recovery mechanisms, preventing service disruption when runtimes expire or become temporarily unavailable.

Eligibility Notes:

  • Issue: True - PR addresses a real reliability problem (stale runtime sessions causing service degradation).
  • Fix Implementation: True - code changes directly implement the recovery mechanisms described in the PR title and summary.
  • PR Linked: True - comprehensive description with TL;DR, summary, review focus areas, and test plan.
  • Tests: True - adds 1,506+ lines of test code in gateway_test.go plus lifecycle notification tests covering multiple recovery scenarios.
  • Tests Required: True - this is a critical feature implementation affecting core business logic (runtime recovery, message delivery reliability, and provider integration), requiring thorough test coverage for idempotency, failure modes, and edge cases.


Analyzed by GitRank 🤖
