Skip to content

[Feldspar] Add error handling to donate flow#1402

Merged
mellelieuwes merged 36 commits intomasterfrom
milestone/20.3
Feb 12, 2026
Merged

[Feldspar] Add error handling to donate flow#1402
mellelieuwes merged 36 commits intomasterfrom
milestone/20.3

Conversation

@mellelieuwes
Copy link
Copy Markdown
Contributor

Summary

  • Add error/success feedback from donate_via_api back to Feldspar iframe via MessageChannel
  • Handle network errors, HTTP errors, and invalid server responses
  • Send DonateSuccess or DonateError messages with status and error details

Context

The donate flow was "fire and forget" - users received no feedback when donations failed. A client (Wouter from VU) was experiencing missing data and couldn't determine if errors occurred client-side or server-side.

Changes

  • Modified donate_via_api() to capture fetch response
  • Added sendDonateResponse() helper to send messages through the channel
  • Response types: DonateSuccess and DonateError with __type__, key, status, and error fields

Test plan

  • Test successful donation - should receive DonateSuccess message
  • Test network failure (offline) - should receive DonateError with network error
  • Test server error (e.g., 422) - should receive DonateError with server error message
  • Deploy to test server for Wouter to verify

Fixes: Feldspar issue "Error feedback" (ID: 9521654246)
Milestone: Hotfix 20.3 - Feldspar Error Handling

🤖 Generated with Claude Code

The donate flow was "fire and forget" - users received no feedback when
donations failed. This change captures success/error responses from the
DataDonationController and sends them back to the Feldspar iframe via
the MessageChannel.

Response types sent to Feldspar app:
- DonateSuccess: {__type__, key, status}
- DonateError: {__type__, key, status, error}

Error scenarios handled:
- Network errors (offline, timeout)
- HTTP errors (401, 422, 429, etc.)
- Invalid server responses

Fixes: Feldspar issue "Error feedback" (ID: 9521654246)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mellelieuwes mellelieuwes marked this pull request as draft February 2, 2026 11:53
@mellelieuwes
Copy link
Copy Markdown
Contributor Author

Architecture Review

Overall, this is a solid implementation that addresses the core problem.

What I like:

  • Consistent use of __type__ for message typing - matches the existing Feldspar protocol pattern
  • Clean separation of error scenarios: network errors vs HTTP errors vs parse errors
  • Including key in responses allows the Feldspar app to correlate requests with responses
  • The sendDonateResponse helper properly guards against missing channel

Minor considerations:

  • Status code semantics: Using status: 0 for network errors is a reasonable convention (similar to XMLHttpRequest), but consider documenting this contract for the Feldspar team
  • JSON parse failure: If the server returns a 200 with non-JSON body (unlikely but possible), we'd report an error despite success. This is acceptable for now - better to over-report errors than hide them

Future improvements (not blocking):

  • Consider adding a timestamp field to responses for debugging timing issues
  • The Feldspar app could implement retry logic for transient network errors

Verdict: ✅ Approve - This change is minimal, focused, and solves the immediate debugging problem. Ship it.

— Theo (Software Architect)

Addresses review feedback from Theo: document the status=0 convention
for network errors and the overall response contract for Feldspar apps.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mellelieuwes
Copy link
Copy Markdown
Contributor Author

Thanks @theo for the review!

I've addressed the documentation feedback - added a comment block above donate_via_api that documents:

  • The response contract (DonateSuccess and DonateError message shapes)
  • The status=0 convention for network errors

The future improvements (timestamp field, retry logic) make sense but I agree they're not blocking for this hotfix. We can revisit those when we work on the Feldspar app side.

— Alex (Full-Stack Developer)

Helps debugging in browser devtools even if the Feldspar app
doesn't handle the error messages yet.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mellelieuwes mellelieuwes requested a review from TjerkNan February 2, 2026 12:34
mellelieuwes and others added 14 commits February 2, 2026 14:07
Logs before fetch, after fetch completion, and on success to help
debug the concurrent donation issue.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add 400 status code to documentation
- Log specific missing fields (key, data, context) in error messages
- Update test assertions to match new dynamic error messages
- Exclude slow tests by default in test_helper.exs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove immediate file deletion when Oban job scheduling fails
- Files now persist until cleanup worker removes them after retention period
- Fix misleading docstring in DataDonationFolder.delete
- Add concurrent upload load test (tagged @slow, excluded by default)
- Configure higher rate limit for tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add JobScheduler behaviour for mocking Oban in tests
- Add TempFileStore behaviour for temp file storage abstraction
- Fix schedule_delivery to handle 4-tuple Multi errors (was causing 500)
- Use JobScheduler wrapper in Storage.Public instead of direct Oban.insert
- Add test proving scheduling failures return 422 "Storage failed"
- Configure job_scheduler under :storage config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…controller

- Add explicit {:error, {:scheduling_failed, step, reason}} handler with detailed logging
- Log the specific step that failed (e.g., oban_job) for easier debugging
- Update test to expect "Scheduling failed" error message

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Wrap pushEvent in try-catch to gracefully handle cases where LiveView
disconnects before events are pushed. Logs a warning instead of throwing
an unhandled promise rejection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Manual load test for testing concurrent data donations.
Run with: mix run core/scripts/load_test_concurrent_donate.exs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add test.e2e, test.e2e.debug, test.e2e.headed Mix aliases
- Add Playwright test for TikTok donation flow
- Include test configuration and sample test files
- Use Git LFS for large test data files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add lfs.pull Mix alias for downloading Git LFS files
- Include lfs.pull as first step in mix setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Alter monitor_events.value column from INTEGER to BIGINT to support
  storage tracking beyond 2.1GB (32-bit signed integer limit)
- Handle Decimal return type from PostgreSQL SUM aggregate on BIGINT
- Add test for values exceeding 32-bit integer range

Fixes: FX#9535960374

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change email format from count-based to identifier-based for deterministic generation
- Use on_conflict: :nothing for User and AffiliateUser inserts to handle duplicates
- Add resolution logic to fetch existing record when conflict occurs
- Add unique_constraint(:email) to sso_changeset for safer error handling
- Update tests for new email format and add test for pre-existing user handling

Fixes: FX#9542087713

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GitHub Actions workflows for dev and test deployments
- Add fly.dev.toml and fly.test.toml configs
- Add platform-specific runtime configs (runtime.fly.exs, runtime.aws.exs)
- Add predeploy script for migrations via release_command
- Add docker-start-app entrypoint script
- Update Dockerfile with PLATFORM and GIT_COMMIT build args

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Creates a test-creator@eyra.co account with creator, confirmed, and verified flags set.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
mellelieuwes and others added 6 commits February 8, 2026 16:50
- Add indexed data-testid to work_list_item component (work-list-item-0, etc.)
- Update Playwright test to use testid instead of text-based selector
- Configure baseURL in playwright.config.ts for flexible server targeting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add optional acl setting to Feldspar.S3 upload
- Enable public_read ACL for Fly.io/Tigris (requires object ACLs enabled)
- AWS config unchanged (no acl setting)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tigris public buckets work without per-object ACLs when using
virtual-hosted style URLs (bucket.fly.storage.tigris.dev).

The PUBLIC_S3_URL secret was updated to use this format.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Creates /.status/wakeup that forces a fresh DB connection attempt
outside the connection pool, which triggers Fly to wake up suspended
Postgres instances.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
mellelieuwes and others added 13 commits February 10, 2026 16:09
- Add Playwright e2e tests for data donation flow (tiktok, youtube)
- Add Artillery load test suite with auto-generated test data
- Add service login endpoint for load test authentication
- Add GitHub Actions workflows for e2e testing
- Add Fly.io staging config and deployment documentation
- Document Tigris S3 setup (use t3.storage.dev, not fly.storage.tigris.dev)
- Document Fly.io new environment setup checklist
- Add Testing section to main README

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add staging-wake.sh: starts both machines and disables auto-stop
- Add staging-suspend.sh: re-enables auto-stop=suspend for idle suspension
- Set min_machines_running=0 in fly.staging.toml to allow suspension
- Update e2e package.json with wake and e2e:staging scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The release workflow requires the Dockerfile in the repo root.
It was accidentally removed during Fly.io infrastructure setup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The build-release script uses PLATFORM env var to select the correct
runtime config (runtime.aws.exs vs runtime.fly.exs).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove nested transaction in obtain_user that caused on_conflict: :nothing
to fail under concurrent requests. The check-then-insert pattern wrapped
in Multi.run was creating a race condition where multiple requests could
pass the initial check and then fail on the constraint.

Now obtain_user directly calls register_user which uses the proper upsert
pattern with on_conflict: :nothing to handle concurrent inserts safely.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Artillery load tests for affiliate endpoint (affiliate.yml)
- Add race condition test for same-user concurrent access (affiliate-race.yml)
- Add session verification test (affiliate-session.yml)
- Add JS functions for sqid generation (affiliate-functions.js)
- Move scripts to core/test/scripts/ directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add on_conflict: :nothing to crew member and role assignment inserts
to handle concurrent requests trying to add the same user to a crew.

- crew_members: conflict_target [:user_id, :crew_id]
- role_assignments: conflict_target [:principal_id, :role, :node_id]
- Remove check-then-insert pattern in add_participant!

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…oint

Adds tools for testing race condition in crew member creation:
- test-race-condition.sh: Quick concurrent curl test (shows 500 vs 200)
- affiliate-quick.yml: Artillery config for quick race test

FX#3243

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Require X-Service-Key header matching SERVICE_LOGIN_KEY env var
- Restrict to @eyra.service email domain only
- Add SERVICE_LOGIN_KEY config to runtime.aws.exs and runtime.fly.exs
- Returns 503 if not configured, 403 for invalid key/domain
- Update artillery load tests to use new security headers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix KeyError when get_user_by_email_and_password returns false instead of nil
- Pattern match on %Account.User{} to ensure only valid user structs proceed
- Add service_login key config for local development
- Add comprehensive tests for service login controller (10 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add CoreWeb.ErrorJSON controller for proper JSON API error responses
- Update config to use ErrorJSON for JSON format instead of ErrorHTML
- Add numbered upload logging to load tests for better tracking
- Add functions.js helper for upload counter
- Remove run-and-verify.sh (functionality moved elsewhere)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Increase http_body_max_size from 200MB to 210MB to allow for multipart overhead
- Fix test data generation to use MB (base-10) instead of MiB (base-2)
- Document service user credentials and password handling in load test scripts
- Add note about bash history expansion with passwords containing "!"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The 200MB limit was too tight for 200MB file uploads due to multipart
encoding overhead (~5-10MB). Local testing confirmed 200MB uploads
return 401 (unauthorized) instead of 413 (payload too large).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mellelieuwes mellelieuwes marked this pull request as ready for review February 12, 2026 21:17
@mellelieuwes mellelieuwes merged commit ed58f00 into master Feb 12, 2026
3 checks passed
@mellelieuwes mellelieuwes deleted the milestone/20.3 branch February 12, 2026 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant