docs: new architecture RFC #115

Open
alukach wants to merge 16 commits into main from docs/new-architecture-rfc

@alukach alukach commented Mar 16, 2026

What I'm changing

This PR adds a new RFC and accompanying ADRs to the codebase.

How I did it

This RFC (adrs/rfc-001.md) describes a new architecture for our data proxy, some of which has been partially implemented in #109. Accompanying the RFC are various ADRs that go into greater detail about components of the architecture. The new architecture is forward-looking (i.e., it includes designs for new features) but attempts to stay constrained: it thoroughly explores only a first wave of near-term features and merely acknowledges later-stage features like metering and rate-limiting.

How to test it

Review, comment inline on PR.

@alukach alukach changed the title [IN PROGRESS] docs: new architecture RFC docs: new architecture RFC Mar 23, 2026
@alukach alukach marked this pull request as ready for review March 23, 2026 22:39
alukach added a commit that referenced this pull request Mar 24, 2026
…116)

## What I'm changing

Prompted by a recent spike in ALB egress bills, this PR replaces the
existing data proxy with one written with
[multistore](https://github.com/developmentseed/multistore). This allows
us to deploy the data proxy onto Cloudflare Workers, thereby pushing all
egress charges directly to S3, in line with the AWS Open Data
Program.

This is a read-only proxy; write operations will be added at a later
date.

## How I did it

Deployed to Cloudflare Workers. We're currently serving ~4M requests per
day and are seeing an error rate of ~0.001%.

### Custom URLs

Obtaining custom URLs without migrating all of the source.coop DNS
settings over to Cloudflare was a bit of a challenge. I opted to host
the proxy workers on `coolnewgeo.com` (an unused domain owned by Radiant
Earth):

* `data.coolnewgeo.com` - prod
* `staging.data.coolnewgeo.com` - staging

Custom Hostnames have been set up under `coolnewgeo.com` for
`data.source.coop` and `data.staging.source.coop`, both pointing to a
null fallback origin of `fallback.coolnewgeo.com` (which has a DNS A
record pointing to `192.0.2.1`).

Configured routes on the workers connect these custom hostnames to each
worker environment:


https://github.com/source-cooperative/data.source.coop/blob/0a44d6bd70d6f132f0c519f0d9367d82ed79a2dd/wrangler.toml#L5-L11


https://github.com/source-cooperative/data.source.coop/blob/0a44d6bd70d6f132f0c519f0d9367d82ed79a2dd/wrangler.toml#L36-L40

## How to test it

This has been running in production for the past week and is used by
https://source.coop.

## PR Checklist

- [ ] ~This PR has **no** breaking changes.~
- [ ] I have updated or added new tests to cover the changes in this PR.
- [ ] This PR affects the [Source Cooperative Frontend &
API](https://github.com/source-cooperative/source.coop),
      and I have opened issue/PR #XXX to track the change.

## Related Issues

* #115
* #1 

## TODO

- [x] Set up autodeploy for staging on merges to `main`
- [x] Set up autodeploy for production on releases
- [x] Set up autodeploy for PRs

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
alukach and others added 5 commits March 23, 2026 19:53
Covers federated identity via OIDC, user-defined Roles and IdPs,
claim constraint language, permission model, credential issuance,
request-time authorization, and client tooling integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… token exchange

ADR-001: Replace embedded SecretAccessKey with HMAC derivation, add ES256
signing, revocation via jti deny-list, and updated SessionToken JWT structure.
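
As a rough illustration of the HMAC derivation and `jti` deny-list named above: a minimal sketch, assuming the secret is derived with HMAC-SHA256 from a server-held key and the access key ID. The function names, the choice of SHA-256, the hex encoding, and the message layout are all hypothetical and are not taken from ADR-001.

```python
from __future__ import annotations

import hashlib
import hmac


def derive_secret_access_key(server_secret: bytes, access_key_id: str) -> str:
    """Recompute the SecretAccessKey on demand instead of embedding it in the token.

    Assumption: HMAC-SHA256 over the access key ID, hex-encoded.
    """
    return hmac.new(
        server_secret, access_key_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_revoked(jti: str, deny_list: set[str]) -> bool:
    """A token counts as revoked when its jti appears in the deny-list."""
    return jti in deny_list
```

Because the derivation is deterministic, the server can recompute the secret at request time from the presented access key ID, so the secret never has to travel inside the SessionToken.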

ADR-004: Rewrite for two-tier IdP model (platform + account-registered),
user-defined Roles with claim constraints and permission statements,
AWS STS-compatible request/response format, and removal of SC Credential
Tokens.

ADR-005: Replace fixed 3-role model with user-defined Roles as permission
ceiling. Resolve grant schema with concrete permission statement format
(read/write actions, URN resource patterns with prefix scoping). Update
authorization flow to use Role ceiling from SessionToken intersected with
dynamic account permissions.
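
The "Role ceiling intersected with dynamic account permissions" idea can be sketched as follows. This is an illustration under stated assumptions, not the ADR-005 schema: the `Statement` type, the field names, and the URN strings in the usage note are hypothetical, and prefix scoping is modelled as a plain string-prefix match.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Statement:
    actions: frozenset      # subset of {"read", "write"}
    resource_prefix: str    # hypothetical URN prefix


def allows(statements: Iterable[Statement], action: str, resource: str) -> bool:
    """True if any statement grants the action on a resource under its prefix."""
    return any(
        action in s.actions and resource.startswith(s.resource_prefix)
        for s in statements
    )


def is_authorized(role_ceiling, account_permissions, action: str, resource: str) -> bool:
    """Intersection: the Role ceiling AND the account's current permissions must both allow."""
    return allows(role_ceiling, action, resource) and allows(
        account_permissions, action, resource
    )
```

For example, a Role whose ceiling is `read` + `write` on `urn:example:acme/` combined with account permissions of `read` only on `urn:example:acme/landsat/` would permit reads under `urn:example:acme/landsat/` and nothing else.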

RFC-001: Update sections 4, 7, 8, 13, and 14 to reflect new design. Mark
open question 7 (grant schema) as resolved. Add new open questions for
org permission model, HMAC secret rotation, and multipart upload credential
expiry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alukach alukach force-pushed the docs/new-architecture-rfc branch from 600e47a to 2fd17f2 Compare March 24, 2026 02:53

github-actions bot commented Mar 24, 2026

🚀 Latest commit deployed to https://source-data-proxy-pr-115.source-coop.workers.dev

  • Date: 2026-04-07T20:19:45Z
  • Commit: dbbe470


The Workers deployment hosts an STS endpoint at `/.sts` for credential exchange.
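
A client-side request to this endpoint might be assembled as in the sketch below. It assumes the endpoint accepts the same form parameters as AWS STS `AssumeRoleWithWebIdentity` (the PR only states that the format is "AWS STS-compatible"); the helper name, the default session name, and the example values are hypothetical. Nothing is sent over the network here.

```python
from urllib.parse import urlencode


def build_sts_request(base_url: str, role_arn: str, web_identity_token: str,
                      duration_seconds: int = 3600,
                      session_name: str = "example-session"):
    """Return (url, form_body) for an AssumeRoleWithWebIdentity-style POST.

    Assumption: parameter names mirror the AWS STS API.
    """
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": str(duration_seconds),
    }
    url = base_url.rstrip("/") + "/.sts"
    return url, urlencode(params)
```

The returned body can be posted with any HTTP client using the `application/x-www-form-urlencoded` content type.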

```mermaid
%% (diagram body truncated in the diff excerpt)
```

It would be useful to visualize both the current (non-Cloudflare) data flow and the new Cloudflare data flow.

**Costs / Risks**

- WASM compilation constrains library choices (no `std` features that don't work in WASM)
- In-region, high-throughput workflows (e.g. bulk ETL in `us-west-2`) route through the edge rather than staying within the region — this adds latency and may incur upstream egress fees that an in-region proxy would avoid

How significant is the added latency?

Comment on lines +215 to +223
1. Parse `RoleArn` → extract `account_id` and `role_name`
2. Load Role definition from policy store (cached, 30–60s TTL)
3. Extract `iss` from JWT (without verification)
4. Match `iss` against the Role's allowed IdPs — reject immediately if no match
5. Fetch JWKS from the matched IdP (cached, 1hr TTL, 3s timeout, stale-while-revalidate on fetch failure)
6. Verify JWT signature, `exp`, `nbf` (60s clock skew tolerance), and `aud`
7. Evaluate claim constraints for the matched IdP binding
8. Validate `DurationSeconds` ≤ Role's `max_session_duration`
9. Generate credentials (see ADR-001 for token structure) and return response

Suggested change
1. Parse `RoleArn` → extract `account_id` and `role_name`
2. Load Role definition from policy store (cached, 30–60s TTL)
3. Extract `iss` from JWT (without verification)
4. Match `iss` against the Role's allowed IdPs — reject immediately if no match
5. Fetch JWKS from the matched IdP (cached, 1hr TTL, 3s timeout, stale-while-revalidate on fetch failure)
6. Verify JWT signature, `exp`, `nbf` (60s clock skew tolerance), and `aud`
7. Evaluate claim constraints for the matched IdP binding
8. Validate `DurationSeconds` ≤ Role's `max_session_duration`
9. Generate credentials (see ADR-001 for token structure) and return response
````mermaid
flowchart TD
Start["Receive AssumeRoleWithWebIdentity request"] --> Parse["1. Parse RoleArn<br/>→ account_id + role_name"]
Parse --> LoadRole["2. Load Role from policy store<br/>(cached, 30–60s TTL)"]
LoadRole --> ExtractIss["3. Extract iss from JWT<br/>(without verification)"]
ExtractIss --> MatchIdP{"4. Does iss match<br/>Role's allowed IdPs?"}
MatchIdP -- No --> RejectIdP["Reject:<br/>IDPRejectedClaim"]
MatchIdP -- Yes --> FetchJWKS["5. Fetch JWKS from IdP<br/>(cached 1hr TTL, 3s timeout,<br/>stale-while-revalidate)"]
FetchJWKS --> VerifyJWT{"6. Verify JWT<br/>signature, exp, nbf, aud"}
VerifyJWT -- Invalid --> RejectJWT["Reject:<br/>InvalidIdentityToken"]
VerifyJWT -- Valid --> EvalClaims{"7. Evaluate claim<br/>constraints"}
EvalClaims -- Fail --> RejectClaims["Reject:<br/>IDPRejectedClaim"]
EvalClaims -- Pass --> ValidateDuration{"8. DurationSeconds ≤<br/>max_session_duration?"}
ValidateDuration -- No --> RejectDuration["Reject:<br/>ValidationError"]
ValidateDuration -- Yes --> GenCreds["9. Generate short-lived<br/>SigV4 credentials"]
GenCreds --> Return["Return credentials response"]
````

I'm not sure the mermaid diagram is worth the extra space, but I do find it easier to understand the decision flow.
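
The decision points in the flow discussed in this thread (steps 4, 6, 7, and 8) can also be sketched in code. This is a simplified illustration, not the RFC's implementation: it assumes the Role is already loaded and the JWT signature check has already produced a boolean, and it models claim constraints as exact-match key/value pairs. The `Role` type and `StsError` class are hypothetical; the error codes are the ones shown in the diagram.

```python
from dataclasses import dataclass


class StsError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


@dataclass
class Role:
    allowed_issuers: dict        # iss -> required claims (exact match, an assumption)
    max_session_duration: int    # seconds


def check_request(role: Role, claims: dict, jwt_valid: bool, duration_seconds: int) -> None:
    """Raise StsError with the matching code, or return None if all checks pass."""
    iss = claims.get("iss")
    if iss not in role.allowed_issuers:                        # step 4
        raise StsError("IDPRejectedClaim")
    if not jwt_valid:                                          # step 6 (verified elsewhere)
        raise StsError("InvalidIdentityToken")
    required = role.allowed_issuers[iss]
    if any(claims.get(k) != v for k, v in required.items()):   # step 7
        raise StsError("IDPRejectedClaim")
    if duration_seconds > role.max_session_duration:           # step 8
        raise StsError("ValidationError")
```

Checking the issuer before fetching any keys mirrors the ordering in the numbered list: an unknown `iss` is rejected without a JWKS lookup.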

```
"failure_reason": null
}
```

Note that the STS log entries may include PII (e.g. assumed_by and client_ip), and that a future logging ADR will need to address retention/redaction policies.


Data providers register their upstream storage (their own S3 bucket, GCS bucket, etc.) with Source Cooperative. The proxy serves as an access control, metering, and distribution layer in front of their data.

Data providers get:

Does the current version of object_store compile to WASM? Or is the risk that future versions of object_store may not be compatible?

Co-authored-by: Tyler Erickson <tylerickson@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>