[Feldspar] Add error handling to donate flow by mellelieuwes · Pull Request #1402 · eyra/mono

mellelieuwes · 2026-02-02T11:48:18Z

Summary

Add error/success feedback from donate_via_api back to Feldspar iframe via MessageChannel
Handle network errors, HTTP errors, and invalid server responses
Send DonateSuccess or DonateError messages with status and error details

Context

The donate flow was "fire and forget" - users received no feedback when donations failed. A client (Wouter from VU) was experiencing missing data and couldn't determine if errors occurred client-side or server-side.

Changes

Modified donate_via_api() to capture fetch response
Added sendDonateResponse() helper to send messages through the channel
Response types: DonateSuccess and DonateError with __type__, key, status, and error fields

Test plan

Test successful donation - should receive DonateSuccess message
Test network failure (offline) - should receive DonateError with network error
Test server error (e.g., 422) - should receive DonateError with server error message
Deploy to test server for Wouter to verify

Fixes: Feldspar issue "Error feedback" (ID: 9521654246)
Milestone: Hotfix 20.3 - Feldspar Error Handling

🤖 Generated with Claude Code

The donate flow was "fire and forget" - users received no feedback when donations failed. This change captures success/error responses from the DataDonationController and sends them back to the Feldspar iframe via the MessageChannel. Response types sent to Feldspar app: - DonateSuccess: {__type__, key, status} - DonateError: {__type__, key, status, error} Error scenarios handled: - Network errors (offline, timeout) - HTTP errors (401, 422, 429, etc.) - Invalid server responses Fixes: Feldspar issue "Error feedback" (ID: 9521654246) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes · 2026-02-02T11:57:54Z

Architecture Review

Overall, this is a solid implementation that addresses the core problem.

What I like:

Consistent use of __type__ for message typing - matches the existing Feldspar protocol pattern
Clean separation of error scenarios: network errors vs HTTP errors vs parse errors
Including key in responses allows the Feldspar app to correlate requests with responses
The sendDonateResponse helper properly guards against missing channel

Minor considerations:

Status code semantics: Using status: 0 for network errors is a reasonable convention (similar to XMLHttpRequest), but consider documenting this contract for the Feldspar team
JSON parse failure: If the server returns a 200 with non-JSON body (unlikely but possible), we'd report an error despite success. This is acceptable for now - better to over-report errors than hide them

Future improvements (not blocking):

Consider adding a timestamp field to responses for debugging timing issues
The Feldspar app could implement retry logic for transient network errors

Verdict: ✅ Approve - This change is minimal, focused, and solves the immediate debugging problem. Ship it.

— Theo (Software Architect)

Addresses review feedback from Theo: document the status=0 convention for network errors and the overall response contract for Feldspar apps. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes · 2026-02-02T11:59:04Z

Thanks @theo for the review!

I've addressed the documentation feedback - added a comment block above donate_via_api that documents:

The response contract (DonateSuccess and DonateError message shapes)
The status=0 convention for network errors

The future improvements (timestamp field, retry logic) make sense but I agree they're not blocking for this hotfix. We can revisit those when we work on the Feldspar app side.

— Alex (Full-Stack Developer)

Helps debugging in browser devtools even if the Feldspar app doesn't handle the error messages yet. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Logs before fetch, after fetch completion, and on success to help debug the concurrent donation issue. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add 400 status code to documentation - Log specific missing fields (key, data, context) in error messages - Update test assertions to match new dynamic error messages - Exclude slow tests by default in test_helper.exs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove immediate file deletion when Oban job scheduling fails - Files now persist until cleanup worker removes them after retention period - Fix misleading docstring in DataDonationFolder.delete - Add concurrent upload load test (tagged @slow, excluded by default) - Configure higher rate limit for tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add JobScheduler behaviour for mocking Oban in tests - Add TempFileStore behaviour for temp file storage abstraction - Fix schedule_delivery to handle 4-tuple Multi errors (was causing 500) - Use JobScheduler wrapper in Storage.Public instead of direct Oban.insert - Add test proving scheduling failures return 422 "Storage failed" - Configure job_scheduler under :storage config Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…controller - Add explicit {:error, {:scheduling_failed, step, reason}} handler with detailed logging - Log the specific step that failed (e.g., oban_job) for easier debugging - Update test to expect "Scheduling failed" error message Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Wrap pushEvent in try-catch to gracefully handle cases where LiveView disconnects before events are pushed. Logs a warning instead of throwing an unhandled promise rejection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Manual load test for testing concurrent data donations. Run with: mix run core/scripts/load_test_concurrent_donate.exs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add test.e2e, test.e2e.debug, test.e2e.headed Mix aliases - Add Playwright test for TikTok donation flow - Include test configuration and sample test files - Use Git LFS for large test data files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add lfs.pull Mix alias for downloading Git LFS files - Include lfs.pull as first step in mix setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Alter monitor_events.value column from INTEGER to BIGINT to support storage tracking beyond 2.1GB (32-bit signed integer limit) - Handle Decimal return type from PostgreSQL SUM aggregate on BIGINT - Add test for values exceeding 32-bit integer range Fixes: FX#9535960374 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Change email format from count-based to identifier-based for deterministic generation - Use on_conflict: :nothing for User and AffiliateUser inserts to handle duplicates - Add resolution logic to fetch existing record when conflict occurs - Add unique_constraint(:email) to sso_changeset for safer error handling - Update tests for new email format and add test for pre-existing user handling Fixes: FX#9542087713 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add GitHub Actions workflows for dev and test deployments - Add fly.dev.toml and fly.test.toml configs - Add platform-specific runtime configs (runtime.fly.exs, runtime.aws.exs) - Add predeploy script for migrations via release_command - Add docker-start-app entrypoint script - Update Dockerfile with PLATFORM and GIT_COMMIT build args Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Creates a test-creator@eyra.co account with creator, confirmed, and verified flags set. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add indexed data-testid to work_list_item component (work-list-item-0, etc.) - Update Playwright test to use testid instead of text-based selector - Configure baseURL in playwright.config.ts for flexible server targeting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add optional acl setting to Feldspar.S3 upload - Enable public_read ACL for Fly.io/Tigris (requires object ACLs enabled) - AWS config unchanged (no acl setting) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Tigris public buckets work without per-object ACLs when using virtual-hosted style URLs (bucket.fly.storage.tigris.dev). The PUBLIC_S3_URL secret was updated to use this format. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Creates /.status/wakeup that forces a fresh DB connection attempt outside the connection pool, which triggers Fly to wake up suspended Postgres instances. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add Playwright e2e tests for data donation flow (tiktok, youtube) - Add Artillery load test suite with auto-generated test data - Add service login endpoint for load test authentication - Add GitHub Actions workflows for e2e testing - Add Fly.io staging config and deployment documentation - Document Tigris S3 setup (use t3.storage.dev, not fly.storage.tigris.dev) - Document Fly.io new environment setup checklist - Add Testing section to main README Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add staging-wake.sh: starts both machines and disables auto-stop - Add staging-suspend.sh: re-enables auto-stop=suspend for idle suspension - Set min_machines_running=0 in fly.staging.toml to allow suspension - Update e2e package.json with wake and e2e:staging scripts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The release workflow requires the Dockerfile in the repo root. It was accidentally removed during Fly.io infrastructure setup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The build-release script uses PLATFORM env var to select the correct runtime config (runtime.aws.exs vs runtime.fly.exs). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove nested transaction in obtain_user that caused on_conflict: :nothing to fail under concurrent requests. The check-then-insert pattern wrapped in Multi.run was creating a race condition where multiple requests could pass the initial check and then fail on the constraint. Now obtain_user directly calls register_user which uses the proper upsert pattern with on_conflict: :nothing to handle concurrent inserts safely. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add Artillery load tests for affiliate endpoint (affiliate.yml) - Add race condition test for same-user concurrent access (affiliate-race.yml) - Add session verification test (affiliate-session.yml) - Add JS functions for sqid generation (affiliate-functions.js) - Move scripts to core/test/scripts/ directory Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add on_conflict: :nothing to crew member and role assignment inserts to handle concurrent requests trying to add the same user to a crew. - crew_members: conflict_target [:user_id, :crew_id] - role_assignments: conflict_target [:principal_id, :role, :node_id] - Remove check-then-insert pattern in add_participant! Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…oint Adds tools for testing race condition in crew member creation: - test-race-condition.sh: Quick concurrent curl test (shows 500 vs 200) - affiliate-quick.yml: Artillery config for quick race test FX#3243 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Require X-Service-Key header matching SERVICE_LOGIN_KEY env var - Restrict to @eyra.service email domain only - Add SERVICE_LOGIN_KEY config to runtime.aws.exs and runtime.fly.exs - Returns 503 if not configured, 403 for invalid key/domain - Update artillery load tests to use new security headers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix KeyError when get_user_by_email_and_password returns false instead of nil - Pattern match on %Account.User{} to ensure only valid user structs proceed - Add service_login key config for local development - Add comprehensive tests for service login controller (10 tests) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add CoreWeb.ErrorJSON controller for proper JSON API error responses - Update config to use ErrorJSON for JSON format instead of ErrorHTML - Add numbered upload logging to load tests for better tracking - Add functions.js helper for upload counter - Remove run-and-verify.sh (functionality moved elsewhere) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Increase http_body_max_size from 200MB to 210MB to allow for multipart overhead - Fix test data generation to use MB (base-10) instead of MiB (base-2) - Document service user credentials and password handling in load test scripts - Add note about bash history expansion with passwords containing "!" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The 200MB limit was too tight for 200MB file uploads due to multipart encoding overhead (~5-10MB). Local testing confirmed 200MB uploads return 401 (unauthorized) instead of 413 (payload too large). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes requested review from AdrienneMendrik and tschmits February 2, 2026 11:53

mellelieuwes marked this pull request as draft February 2, 2026 11:53

Add documentation for donate response contract

9026aa8

Addresses review feedback from Theo: document the status=0 convention for network errors and the overall response contract for Feldspar apps. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add console.error logging for donate errors

869d221

Helps debugging in browser devtools even if the Feldspar app doesn't handle the error messages yet. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes requested a review from TjerkNan February 2, 2026 12:34

mellelieuwes and others added 14 commits February 2, 2026 14:07

Add debug logging for successful donations

0f73217

Logs before fetch, after fetch completion, and on success to help debug the concurrent donation issue. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Handle LiveView disconnection in Feldspar event push

484f1cd

Wrap pushEvent in try-catch to gracefully handle cases where LiveView disconnects before events are pushed. Logs a warning instead of throwing an unhandled promise rejection. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add concurrent donation load test script

acf79ab

Manual load test for testing concurrent data donations. Run with: mix run core/scripts/load_test_concurrent_donate.exs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Formatted code

c92ca76

Add lfs.pull alias and include in setup task

eca0227

- Add lfs.pull Mix alias for downloading Git LFS files - Include lfs.pull as first step in mix setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add seed_test_user release command for Playwright tests

7ec3e37

Creates a test-creator@eyra.co account with creator, confirmed, and verified flags set. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes force-pushed the milestone/20.3 branch from 3af7be8 to 7ec3e37 Compare February 8, 2026 16:49

mellelieuwes and others added 6 commits February 8, 2026 16:50

Formatted code

dacffec

Add configurable ACL for Feldspar S3 uploads

d779e80

- Add optional acl setting to Feldspar.S3 upload - Enable public_read ACL for Fly.io/Tigris (requires object ACLs enabled) - AWS config unchanged (no acl setting) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add wakeup endpoint for Fly Postgres auto-suspend

ec2f599

Creates /.status/wakeup that forces a fresh DB connection attempt outside the connection pool, which triggers Fly to wake up suspended Postgres instances. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add VERSION build arg to fly.test.toml

dfba56d

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes and others added 13 commits February 10, 2026 16:09

Restore root Dockerfile for release workflow

27d6923

The release workflow requires the Dockerfile in the repo root. It was accidentally removed during Fly.io infrastructure setup. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add PLATFORM=aws to release workflow

dce79b3

The build-release script uses PLATFORM env var to select the correct runtime config (runtime.aws.exs vs runtime.fly.exs). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mellelieuwes marked this pull request as ready for review February 12, 2026 21:17

mellelieuwes merged commit ed58f00 into master Feb 12, 2026
3 checks passed

mellelieuwes deleted the milestone/20.3 branch February 12, 2026 21:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feldspar] Add error handling to donate flow#1402

[Feldspar] Add error handling to donate flow#1402
mellelieuwes merged 36 commits intomasterfrom
milestone/20.3

mellelieuwes commented Feb 2, 2026

Uh oh!

mellelieuwes commented Feb 2, 2026

Uh oh!

mellelieuwes commented Feb 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mellelieuwes commented Feb 2, 2026

Summary

Context

Changes

Test plan

Uh oh!

mellelieuwes commented Feb 2, 2026

Architecture Review

What I like:

Minor considerations:

Future improvements (not blocking):

Uh oh!

mellelieuwes commented Feb 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant