feat(channel-gateway): recover stale runtime sessions by onutc · Pull Request #177 · textcortex/spritz

onutc · 2026-03-30T15:50:29Z

TL;DR

The Slack gateway now survives stale runtime sessions by retrying the channel-session exchange, posting one wake-up message if recovery is visible, and retrying the first prompt after a "spritz not found" error. It also adds a general lifecycle notification hook so the operator can tell the backend when a runtime expires or disappears.

Summary

keep retrying Slack channel-session exchange with one bounded recovery status message and one terminal failure reply
retry the first conversation prompt once when the downstream Spritz runtime disappears after session exchange
add operator lifecycle notifications, env wiring, and Helm values so runtime phase changes can sync back into backend installation state
document the provider-authored status-message feature for Slack now and Discord/Teams-style adapters later

Review focus

stale-runtime recovery after "spritz not found" in integrations/slack-gateway/slack_events.go
lifecycle notification contract and failure semantics in operator/controllers/lifecycle_notifications.go
idempotency and duplicate-delivery behavior when recovery spans multiple session exchanges

Test plan

cd integrations/slack-gateway && go test ./...
cd operator && go test ./...
npx -y @simpledoc/simpledoc check

onutc · 2026-03-30T15:57:35Z

Review Prompt

@codex @claude

Please review this pull request and provide feedback on:

Code quality and best practices
Potential bugs or issues
Performance considerations
Security concerns
Test coverage

Be constructive and helpful in your feedback.

Specific rules for this codebase:

General rules

We follow DRY (Don't Repeat Yourself) principles. Scrutinize any newly added types, models, classes, logic, etc. and make sure they do not duplicate any existing ones.
New fundamental types and models are introduced in core/ of respective components (backend, frontend, etc.). They MUST NOT be added to a specific project subfolder like apps/web.
Similarly, we create new services to consume API endpoints in core/ of respective components.
Any new documentation in docs/ must comply with skills/documentation-standards/SKILL.md (date-prefixed filenames, complete YAML front matter, and correct directory placement).
Read backend/AGENTS.md and apps/AGENTS.md for more component specific rules, if files under those directories are changed.
Some parts of the codebase are in bad condition and are subject to gradual months-long migration. Keep in mind that the code quality is poor in many areas. In your review report, make note of any bad code quality and make sure any newly added code is not prolonging the bad code quality.
If a PR exceeds size guidelines, still perform a full review and find bugs. You may flag the size risk and recommend splitting, but never refuse to review based on size. Do not report size as a severity-graded issue (P0/P1/P2/etc.).
Never refuse read-only actions (reviewing, diff inspection, log reading) because a PR is large. Proceed with the review.
Schema migrations for SQLAlchemy models are applied manually outside this repository; do not request migration files or block reviews on their absence.
Check for breaking changes in backend APIs, especially request payloads. Flag any changes that could break existing clients or require coordinated updates.
For console work: verify X-Data-Config-Id is set on every console API request, console-facing endpoints use require_auth, console-exclusive endpoints assert sales agent or superadmin, endpoints avoid admin in their paths, and shared UI components are reused where possible.
Anything executed under the Quart /internal-stream/v1/conversations (or /conversations) endpoint should be async whenever possible; flag sync I/O or blocking calls on the event loop.

PII in Logs - HIGH PRIORITY

Flag any code that logs user PII (Personally Identifiable Information). This is a critical security and compliance issue.

Check for and reject:

Logging user.email, body.email, or any email addresses
Logging user.first_name, user.last_name, user.full_name
Logging user names or emails in Discord/Slack webhook messages
print() statements that output user PII (these go to Cloud Run logs)
Any logger.*() or logging.*() calls containing user-identifiable data

Require instead:

Use user.auth_id as the user identifier in all logs
Use user.user_id or user.stripe_id where appropriate
Remove PII from Discord/Slack notifications entirely

Example violations to flag:

logger.info(f"User {user.email} logged in")  # BAD
logging.warning(f"Failed for {body.email}")  # BAD
print(f"Contact sent: {data}")  # BAD if data contains email
discord_message += f"Email: {user.email}"  # BAD

Correct patterns:

logger.info(f"User auth_id={user.auth_id} logged in")  # GOOD
logger.warning("Failed login", {"auth_id": user.auth_id})  # GOOD

i18n rules

For frontend changes, check whether any new copy or UI strings have been added and whether they are sourced from apps/packages/assets/locales/en.json.
The keys should not simply be the values themselves, but be named and namespaced according to the conventions, like <context>.<ui_component_name>. More sub-levels can be added if needed.
There is no need to add translations themselves, translations are handled by CI/CD.
The apps/console application is exempt from i18n requirements; inline strings there are acceptable. Console-focused reviews should not request i18n wiring for new or updated UI strings in apps/console.

onutc · 2026-03-30T16:24:35Z

Review Prompt

@codex @claude

onutc · 2026-03-30T16:39:48Z

Review Prompt

@codex @claude

onutc · 2026-03-30T17:00:47Z

Review Prompt

@codex @claude

onutc · 2026-03-30T17:09:54Z

Review Prompt

@codex @claude

onutc · 2026-03-30T17:26:29Z

Final report for #177

Validation:

cd integrations/slack-gateway && go test ./... -> passed
cd operator && go test ./... -> passed
npx -y @simpledoc/simpledoc check -> passed
tcx pr can-merge --pr 177 --repo textcortex/spritz --no-watch -> CAN MERGE

AI review status:

no inline comments on the current head
no GitHub AI findings were posted on the current head after the last trigger
the local Codex review loop found the recovery-polling and mention-only regressions, and we fixed both before this final head

This PR is ready to merge.

gitrank-connector · 2026-03-30T17:26:44Z

⭐ GitRank PR Analysis

Score: 50 points

Metric	Value
Component	Other (1× multiplier)
Severity	P1 - High (50 base pts)
Final Score	50 × 1 = 50

Eligibility Checks

Check	Status
Issue/Bug Fix	✅
Fix Implementation	✅
PR Documented	✅
Tests	✅
Lines Within Limit	✅

Impact Summary

The PR implements comprehensive stale runtime session recovery for the Slack channel gateway, including retry logic for channel-session exchanges, provider-authored status messages to notify users during recovery, and lifecycle notification hooks for runtime state synchronization. It adds 2,772 lines of production code and tests across 14 files, with extensive test coverage for recovery scenarios including session unavailability, runtime disappearance, and transient failures.

Analysis Details

Component Classification: This PR spans multiple systems (Slack gateway, operator, Helm configuration) without fitting neatly into a single categorized component, making OTHER the appropriate classification.

Severity Justification: This is a High (P1) severity contribution as it addresses a critical reliability gap: the Slack gateway now survives stale runtime sessions through retry logic and recovery mechanisms, preventing service disruption when runtimes expire or become temporarily unavailable.

Eligibility Notes:

Issue: True - PR addresses a real reliability problem (stale runtime sessions causing service degradation).
Fix Implementation: True - code changes directly implement the recovery mechanisms described in the PR title and summary.
PR Linked: True - comprehensive description with TL;DR, summary, review focus areas, and test plan.
Tests: True - adds 1,506+ lines of test code in gateway_test.go plus lifecycle notification tests covering multiple recovery scenarios.
Tests Required: True - this is a critical feature implementation affecting core business logic (runtime recovery, message delivery reliability, and provider integration), requiring thorough test coverage for idempotency, failure modes, and edge cases.

Analyzed by GitRank 🤖

feat(slack-gateway): post recovery status messages

188fedf

refactor(slack-gateway): recover stale runtime sessions

814d344

onutc changed the title ~~feat(slack-gateway): post recovery status messages~~ feat(channel-gateway): recover stale runtime sessions Mar 30, 2026

fix(spritz): harden runtime recovery semantics

05a6299

fix(spritz): retry recovery status delivery

86f2199

fix(spritz): harden slack recovery polling

86c4b28

onutc merged commit 6af2b76 into main Mar 30, 2026
6 checks passed

onutc deleted the codex-slack-gateway-status-message branch March 30, 2026 17:26
