fix(observability): demote transient OpenAI embeddings 429s to expected and reduce Sentry noise #2294

Merged
M3gA-Mind merged 4 commits into tinyhumansai:main from YellowSnnowmann:fix/embeddings-429-sentry-noise on May 20, 2026
Conversation

Contributor

@YellowSnnowmann YellowSnnowmann commented May 20, 2026

Summary

  • Updated OpenAI embeddings error reporting to route non-2xx failures through report_error_or_expected instead of always emitting hard error events.

  • Standardized embedding HTTP error text to the canonical format: Embedding API error ({status}): {text}.

  • Ensured transient upstream statuses (especially 429 Too Many Requests) match observability classifiers and are demoted to warning breadcrumbs.

  • Added a focused regression test for 429 formatting/classification to prevent future Sentry-noise regressions.

Problem

  • OpenAI embeddings calls can return transient 429 rate-limit responses during normal load.

  • These errors are retried by memory-tree/background flows, but each attempt was still producing Sentry error events.

  • The previous message shape (Embedding API error {status}: ...) put the status outside parentheses, so it did not match the transient HTTP classifier. This increased alert noise and masked actionable failures.

Solution

  • Switched embeddings non-2xx reporting to report_error_or_expected(...), allowing transient failures to be classified as expected noise.

  • Changed the emitted error string to the canonical, classifier-compatible shape: Embedding API error ({status}): {text}.

  • Kept failure tags (model, status, failure=non_2xx) for diagnostics while reducing noisy escalation.

  • Added embed_429_uses_canonical_transient_format test to assert:

    • message contains (429 Too Many Requests), and
    • is_transient_message_failure(...) returns true.
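
The routing described in this section can be sketched roughly as follows. This is a minimal illustration, not the project's code: the Disposition enum, the function signatures, and the list of transient status codes are assumptions made for the example. Only the message shape and the names report_error_or_expected and is_transient_message_failure come from this PR.

```rust
/// Where a failure report ends up. Hypothetical stand-in for the real
/// Sentry/logging sinks, which are not shown in this PR.
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    /// Demoted: recorded as a warning breadcrumb only.
    ExpectedBreadcrumb,
    /// Escalated: emitted as a hard error event.
    ErrorEvent,
}

/// Builds the canonical message shape: status wrapped in parentheses.
pub fn embedding_error_message(status: &str, text: &str) -> String {
    format!("Embedding API error ({status}): {text}")
}

/// Assumed classifier: a case-insensitive substring check per status code.
/// The code list is illustrative; only 429 is confirmed by this PR.
pub fn is_transient_message_failure(message: &str) -> bool {
    const ASSUMED_TRANSIENT_CODES: [u16; 4] = [429, 502, 503, 504];
    let lower = message.to_lowercase();
    ASSUMED_TRANSIENT_CODES
        .iter()
        .any(|code| lower.contains(&format!("api error ({code}")))
}

/// Assumed signature: demote transient failures, escalate everything else.
pub fn report_error_or_expected(message: &str) -> Disposition {
    if is_transient_message_failure(message) {
        Disposition::ExpectedBreadcrumb
    } else {
        Disposition::ErrorEvent
    }
}
```

In this sketch, a message built from the status "429 Too Many Requests" is demoted to a breadcrumb, while any status outside the assumed list still escalates as an error event.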

Submission Checklist

  • If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy

  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.

  • Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change)

  • All affected feature IDs from the matrix are listed in the PR description under ## Related (N/A if no feature IDs are affected)

  • No new external network dependencies introduced (mock backend used per Testing Strategy)

  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) (N/A if not release-cut)

  • Linked issue closed via Closes #NNN in the ## Related section (N/A until issue is linked)

Impact

  • Runtime/platform impact: Rust core observability path only (desktop app behavior unchanged for users except reduced false-positive error noise).

  • Performance: negligible runtime overhead change; likely reduced observability/event volume under transient rate limiting.

  • Security/migration/compatibility: no migration or API compatibility changes; no new dependencies introduced.

Related

Summary by CodeRabbit

  • Bug Fixes

    • Standardized error message formatting for embedding API failures with clearer HTTP status code display
    • Enhanced error classification for transient upstream API failures
  • Tests

    • Added test coverage for rate-limiting error handling to validate transient error classification

Contributor

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1aae62a9-ebb8-44a8-a223-20ec141d6051

📥 Commits

Reviewing files that changed from the base of the PR and between 401bfe6 and 6acfcd0.

📒 Files selected for processing (1)
  • src/openhuman/embeddings/openai_tests.rs

📝 Walkthrough

Updates OpenAI embedding non-2xx error formatting to include parentheses around HTTP status and clarifies transient reporting via report_error_or_expected; adds a Tokio test asserting HTTP 429 embedding failures produce the canonical transient-format message and are classified as transient.

Changes

Embedding API transient error handling

| Layer / File(s) | Summary |
| --- | --- |
| Error message format and transient reporting docs (src/openhuman/embeddings/openai.rs) | Error message format changed to Embedding API error ({status}): {text} and comments updated to indicate report_error_or_expected demotes transient upstream HTTP failures into warning/breadcrumb logging. |
| HTTP 429 transient error classification test (src/openhuman/embeddings/openai_tests.rs) | New async test embed_429_uses_canonical_transient_format mocks a 429 OpenAI embeddings response and verifies the error string matches the canonical transient HTTP shape and is_transient_message_failure returns true. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2190: Also updates OpenAI embedding non-2xx handling to use the transient reporting path and adds tests asserting error message shapes (400 case).
  • tinyhumansai/openhuman#2216: Related edits routing OpenAI embedding errors through report_error_or_expected with tests for expected/transient classifications.

Suggested reviewers

  • senamakel
  • graycyrus

Poem

🐰 A parenthesis snug around a status line,
Four-twenty-nine tiptoes, labeled benign.
The test gives a nod with a breadcrumb cheer,
No alarms—just a warning, soft and clear.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: demotion of transient OpenAI 429 errors to reduce Sentry noise through use of report_error_or_expected, which matches the core changes in both the embeddings handler and the regression test. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@graycyrus graycyrus self-assigned this May 20, 2026
Contributor

@graycyrus graycyrus left a comment

Nice, focused fix — the parenthesized format is clearly needed for the is_transient_upstream_http_message classifier (line 240 in observability.rs: lower.contains(&format!("api error ({code}"))) and the test is a welcome addition.

One thing worth tightening up on the test — see inline comment.
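
To illustrate why the parentheses matter, here is a small sketch built around the substring check quoted above. The helper names matches_transient_arm, old_shape, and new_shape are hypothetical and exist only for this example; the real classifier in observability.rs may do more than this single check.

```rust
/// Hypothetical helper mirroring the quoted check:
/// lower.contains(&format!("api error ({code}"))
pub fn matches_transient_arm(message: &str, code: u16) -> bool {
    let lower = message.to_lowercase();
    lower.contains(&format!("api error ({code}"))
}

/// Message shape before this PR: status not parenthesized.
pub fn old_shape(status: &str, text: &str) -> String {
    format!("Embedding API error {status}: {text}")
}

/// Message shape after this PR: status wrapped in parentheses.
pub fn new_shape(status: &str, text: &str) -> String {
    format!("Embedding API error ({status}): {text}")
}
```

With the status "429 Too Many Requests", the old shape lowercases to "embedding api error 429 too many requests: ...", which lacks the "api error (429" substring, so the check misses it. The new shape contains that substring and matches.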

Comment thread src/openhuman/embeddings/openai_tests.rs
@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 20, 2026 08:04
@YellowSnnowmann YellowSnnowmann requested a review from a team May 20, 2026 08:04
@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 20, 2026
coderabbitai Bot previously approved these changes May 20, 2026
Contributor

@graycyrus graycyrus left a comment

Re-review — all prior changes addressed.

The new commit (6acfcd02) adds the "api error (429" substring assertion I flagged in the prior review. The test now has three layers of defense:

  1. Canonical (429 Too Many Requests) shape check
  2. Classifier-arm substring guard ("api error (429")
  3. Broad is_transient_message_failure classifier pass

This is a clean, focused fix — format string tweak + solid regression test. No new issues found.
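
The three layers can be sketched as independent checks on the error string. This is an illustrative, synchronous sketch and not the actual Tokio test: failed_layers is a hypothetical helper, and the body of is_transient_message_failure (including its status-code list) is an assumption standing in for the real classifier.

```rust
/// Assumed stand-in for the broad classifier; the code list is illustrative.
pub fn is_transient_message_failure(message: &str) -> bool {
    let lower = message.to_lowercase();
    [429u16, 502, 503, 504]
        .iter()
        .any(|code| lower.contains(&format!("api error ({code}")))
}

/// Returns the names of the layers that the message fails, in order.
/// An empty vector means all three checks pass.
pub fn failed_layers(message: &str) -> Vec<&'static str> {
    let mut failed = Vec::new();
    // Layer 1: canonical shape with the full status text in parentheses.
    if !message.contains("(429 Too Many Requests)") {
        failed.push("canonical shape");
    }
    // Layer 2: the exact substring the classifier arm looks for.
    if !message.to_lowercase().contains("api error (429") {
        failed.push("classifier-arm substring");
    }
    // Layer 3: the broad classifier as a whole.
    if !is_transient_message_failure(message) {
        failed.push("broad classifier");
    }
    failed
}
```

The layers are deliberately redundant: in this sketch a message such as "Embedding API error (429): slow down" passes the two classifier checks but fails the canonical shape check, so a drift in the status text would still be caught.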

| File | Change |
| --- | --- |
| openai.rs | Format {status} → ({status}) in error string + updated comment |
| openai_tests.rs | New embed_429_uses_canonical_transient_format test with three assertions |

Contributor

@graycyrus graycyrus left a comment

Looks good, nice work!

Contributor

@M3gA-Mind M3gA-Mind left a comment

Looks good, nice work!

@M3gA-Mind M3gA-Mind merged commit f24dbc6 into tinyhumansai:main May 20, 2026
28 of 29 checks passed
mtkik pushed a commit to mtkik/openhuman-meet that referenced this pull request May 21, 2026
CodeGhost21 pushed a commit to CodeGhost21/openhuman that referenced this pull request May 22, 2026
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026