Closed
83 commits
66eb7e5
feat!: rearchitect proxy with Cloudflare Workers-compatible API
alukach Feb 19, 2026
2c0ba67
Rework to use objectstore
alukach Feb 21, 2026
9c27f5c
Support delete object
alukach Feb 21, 2026
e5b5698
Attempt to simplify by adding properties to S3Operation enum
alukach Feb 21, 2026
e314d1e
More simplification
alukach Feb 21, 2026
a8cb28e
Prefer structs over strings for xml generation
alukach Feb 21, 2026
9bee2aa
Describe reuse intentions
alukach Feb 21, 2026
db8475e
Mv source backend to be a core module. Rework backend tooling to bett…
alukach Feb 21, 2026
7f0d095
docs: buildout auth details into architecture
alukach Feb 21, 2026
a8b7ea1
feat: add OIDC provider for integration with cloud backends
alukach Feb 22, 2026
a177971
refactor: make key prefix configurable
alukach Feb 22, 2026
66a007a
refactor: auth -> sts
alukach Feb 22, 2026
17ac6e3
docs: continue to buildout architecture
alukach Feb 22, 2026
4e3cd04
fix: add chromo feature
alukach Feb 22, 2026
0ba05a4
refactor: remove streaming body conversions
alukach Feb 22, 2026
51c4d6a
refactor: use signed URLs to simplify how we stream data from various…
alukach Feb 23, 2026
5bbe049
chore: add makefile for ease of use
alukach Feb 23, 2026
54f7ea1
fix: update API response structure to list products instead of reposi…
alukach Feb 24, 2026
99167ba
chore: add source api logging
alukach Feb 24, 2026
b9280c0
fix: rename connection auth field
alukach Feb 24, 2026
aa502fe
chore: cargo fmt
alukach Feb 24, 2026
76f0ed9
chore: ignore .env
alukach Feb 24, 2026
467534b
chore: update crate and worker name
alukach Feb 24, 2026
c66ae11
chore: cargo clippy --fix
alukach Feb 24, 2026
c3c6317
chore: cargo fmt
alukach Feb 24, 2026
71594ad
chore: add helper utils
alukach Feb 24, 2026
9c37189
fix: validate sigv4 signatures
alukach Feb 24, 2026
c40509a
Fix: upload_id query string injection
alukach Feb 24, 2026
52ccabc
fix: HTTP config provider path traversal
alukach Feb 24, 2026
ee982d4
fix: Presigned URLs in debug logs
alukach Feb 24, 2026
8e98165
fix: Internal error details leaked to clients
alukach Feb 24, 2026
e93b5b6
fix: Prefix auth lacks path boundary check
alukach Feb 24, 2026
f47da3d
fix: prevent logging sensitive backend options
alukach Feb 24, 2026
5f302f4
feat: support pagination
alukach Feb 24, 2026
63f1172
feat: JWKS improvements
alukach Feb 24, 2026
5a65b71
feat: refactor to use Axom, wire in STS
alukach Feb 25, 2026
3d8b1ec
feat: add cli
alukach Feb 26, 2026
9b80406
feat: support substitutions in role policies
alukach Feb 26, 2026
f3bfd87
chore: add build helper
alukach Feb 26, 2026
91220af
Merge branch 'refactor/cf-workers-proxy'
alukach Feb 26, 2026
ed8ad7d
chore: fix cargo.lock
alukach Feb 26, 2026
9183685
chore: rm unused file
alukach Feb 26, 2026
38af07a
chore: rename modules
alukach Feb 26, 2026
2f05f24
ci: add rust tooling
alukach Feb 26, 2026
7ac0999
chore: trigger ci
alukach Feb 26, 2026
ddc4c57
chore: cargo fmt
alukach Feb 26, 2026
56a4a02
ci: add rust caching
alukach Feb 26, 2026
553e8e3
feat(cli): persist credentials
alukach Feb 26, 2026
f9bf910
fix(core): support public data connections
alukach Feb 26, 2026
1b62073
fix(api): correctly parse truthy values from config
alukach Feb 26, 2026
f6ce6db
chore(cf-workers): ignore deadcode warning
alukach Feb 26, 2026
a5c0f96
refator(core): better handle various upstream errors
alukach Feb 26, 2026
3b78af5
feat: use "sealed tokens" for sessions
alukach Feb 27, 2026
c941f01
feat: wire up oidc-provider tooling
alukach Feb 27, 2026
17d96b8
ci: add audit check
alukach Feb 27, 2026
abe26b1
chore: cargo fmt
alukach Feb 28, 2026
dde94e7
chore: clippy
alukach Feb 28, 2026
16cd5e7
ci: cargo audit
alukach Feb 28, 2026
627afdb
chore: fix subcrate dependencies
alukach Feb 28, 2026
ab2a597
chore: integrate ci checks in git hooks
alukach Feb 28, 2026
de9f607
chore: fixup makefile
alukach Feb 28, 2026
2fc7339
ci: utilize caching
alukach Feb 28, 2026
461e264
chore: speed up ci-fast
alukach Feb 28, 2026
59a217f
chore: rm CLI
alukach Mar 1, 2026
beaa3a9
feat: add VitePress documentation site (#110)
alukach Mar 1, 2026
306f284
chore: rm CLI
alukach Mar 1, 2026
37b10ab
ci: tmp disable versioning and deployment tooling
alukach Mar 1, 2026
57a72f9
ci(docs): appropriately set base path for ghpages
alukach Mar 1, 2026
bbd7377
docs: fixup
alukach Mar 2, 2026
5ec01ff
chore(getting-started): use admonition for tip
alukach Mar 2, 2026
8ea93f8
docs: utilize admonitions
alukach Mar 2, 2026
82db817
docs: customize tip and note colors
alukach Mar 2, 2026
50642b8
docs: update to adapt to cli changes
alukach Mar 2, 2026
712f6a0
refactor: improve RNG for credentials
alukach Mar 3, 2026
987fbd2
test(sts): improve testing for iam role subject glob
alukach Mar 3, 2026
40b5886
fix(sts): harden JWKS fetching with HTTPS enforcement and failure bac…
alukach Mar 3, 2026
f8e7534
fix(auth): reduce sensitive data in SigV4 mismatch logs
alukach Mar 3, 2026
1498774
docs(sealed_token): document security properties
alukach Mar 3, 2026
421ea25
perf(proxy): parse LIST query string once instead of three times
alukach Mar 3, 2026
13c82d3
perf(proxy): reduce string allocations in LIST key rewriting
alukach Mar 3, 2026
898b4de
perf(proxy): pre-allocate in path building helpers
alukach Mar 3, 2026
4e141d0
perf(proxy): use PaginatedListStore for backend-side LIST pagination
alukach Mar 3, 2026
3cecbb8
docs: object_store -> obstore
alukach Mar 3, 2026
3 changes: 3 additions & 0 deletions .cargo/audit.toml
@@ -0,0 +1,3 @@
[advisories]
# Marvin Attack: timing side-channel in rsa crate. No fix available upstream.
ignore = ["RUSTSEC-2023-0071"]
12 changes: 12 additions & 0 deletions .claude/launch.json
@@ -0,0 +1,12 @@
{
  "version": "0.0.1",
  "configurations": [
    {
      "name": "docs",
      "runtimeExecutable": "pnpm",
      "runtimeArgs": ["docs:dev"],
      "port": 5173,
      "cwd": "docs"
    }
  ]
}
3 changes: 3 additions & 0 deletions .githooks/pre-commit
@@ -0,0 +1,3 @@
#!/usr/bin/env bash
set -e
make ci-fast
75 changes: 66 additions & 9 deletions .github/workflows/ci.yaml
@@ -4,18 +4,75 @@ on:
   push:

 jobs:
+  fmt:
+    name: Format
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
+        with:
+          toolchain: stable
+          components: rustfmt
+      - run: cargo fmt --check
+
+  clippy:
+    name: Clippy
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
+        with:
+          toolchain: stable
+          components: clippy
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo clippy -- -D warnings
+
+  check:
+    name: Check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
+        with:
+          toolchain: stable
+          targets: wasm32-unknown-unknown
+      - uses: Swatinem/rust-cache@v2
+      - name: Check workspace
+        run: cargo check
+      - name: Check cf-workers (wasm32)
+        run: cargo check -p source-coop-cf-workers --target wasm32-unknown-unknown
+
+  test:
+    name: Test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
+        with:
+          toolchain: stable
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo test
+
+  audit:
+    name: Audit
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
+        with:
+          toolchain: stable
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo install cargo-audit
+      - run: cargo audit
+
   build:
     name: Build
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@d632683dd7b4114ad314bca15554477dd762a938 # v4.2.0
-      - name: Set up Rust
-        uses: actions-rs/toolchain@8e603f32c5c6eeca5b1b2d9d1e7464d926082f1d # v1.0.0
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@v1
         with:
           toolchain: stable
-      - name: Format
-        run: cargo fmt --check
-      - name: Clippy
-        run: cargo clippy -- -D warnings
-      - name: Run tests
-        run: cargo test
+      - uses: Swatinem/rust-cache@v2
+      - name: Build server
+        run: cargo build -p source-coop-server
64 changes: 64 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,64 @@
name: Deploy Docs

on:
  push:
    branches: [main]
    paths:
      - "docs/**"
      - ".github/workflows/docs.yaml"

  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: false

jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: pnpm/action-setup@v4
        with:
          version: 10

      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
          cache-dependency-path: docs/pnpm-lock.yaml

      - uses: actions/configure-pages@v5
        id: pages

      - name: Install dependencies
        run: pnpm install --frozen-lockfile
        working-directory: docs

      - name: Build docs
        run: pnpm docs:build
        working-directory: docs
        env:
          VITEPRESS_BASE: ${{ steps.pages.outputs.base_path && format('{0}/', steps.pages.outputs.base_path) || '/' }}

      - uses: actions/upload-pages-artifact@v3
        with:
          path: docs/.vitepress/dist

  deploy:
    name: Deploy
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/deploy-pages@v4
        id: deployment
7 changes: 4 additions & 3 deletions .github/workflows/please-release.yaml
@@ -1,8 +1,9 @@
 name: Run release-please
 on:
-  push:
-    branches:
-      - main
+  workflow_dispatch:
+  # push:
+  # branches:
+  # - main

 permissions:
   contents: write
6 changes: 3 additions & 3 deletions .github/workflows/staging-deploy.yaml
@@ -1,9 +1,9 @@
 name: Deploy to Staging

 on:
-  push:
-    branches:
-      - main
+  # push:
+  # branches:
+  # - main
   workflow_dispatch:

 permissions:
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,3 +1,8 @@
/target
.DS_Store
scripts/task_definition.json
target
.wrangler
.env*
node_modules
docs/.vitepress/cache
docs/.vitepress/dist
121 changes: 121 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,121 @@
# Source Data Proxy Architecture

## Data Proxy

The core function of this system is to operate an S3-compatible API that proxies requests to the appropriate object storage backends (e.g. MinIO, AWS S3, Cloudflare R2, Azure Blob Storage).

## Runtime

The system is designed to operate in various runtime environments. Chiefly, these include running as a traditional server on a Linux host or in a containerized environment (e.g. ECS, K8s), or running as WASM on Cloudflare Workers.

## Authentication

### How clients authenticate with Source Data Proxy

The Source Data Proxy supports two forms of authentication:

1. Custom STS + registered Identity Providers
2. Long-term Access Keys

#### Custom STS + registered Identity Providers

The Source Data Proxy hosts a replica of the AWS Security Token Service (STS). This service is used to exchange auth tokens (JWTs) from trusted OIDC-compatible identity providers (e.g. Source Cooperative's auth, GitHub workflows) for temporary scoped credentials. Those credentials can then be used to make authenticated requests to the Source Data Proxy.
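
As a rough illustration of the token exchange described above, the sketch below form-encodes an `AssumeRoleWithWebIdentity` request using the parameter names from the AWS STS API. The role ARN, session name, and token are hypothetical placeholders, and whether the proxy expects these parameters in a POST body or a query string is an assumption rather than something this document specifies.

```python
from urllib.parse import urlencode


def build_assume_role_body(role_arn: str, session_name: str, web_identity_token: str) -> str:
    """Form-encode an AssumeRoleWithWebIdentity request (AWS STS parameter names)."""
    return urlencode(
        {
            "Action": "AssumeRoleWithWebIdentity",
            "Version": "2011-06-15",  # AWS STS API version string
            "RoleArn": role_arn,
            "RoleSessionName": session_name,
            "WebIdentityToken": web_identity_token,  # the OIDC JWT from the identity provider
        }
    )


# Hypothetical values, for illustration only.
body = build_assume_role_body(
    role_arn="arn:aws:iam::123456789012:role/example-role",
    session_name="example-session",
    web_identity_token="header.payload.signature",
)
print(body)
```

A client would send this body to the proxy's STS endpoint and read the temporary credentials from the response.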

For local development and CLI usage, users can obtain temporary credentials via a `credential_process` workflow:

1. User runs an AWS CLI command (e.g. `aws s3 ls s3://bucket/ --profile source-coop`)
2. The AWS SDK invokes a configured `credential_process` CLI tool
3. The CLI tool authenticates the user with the Source Cooperative's auth provider (e.g. browser-based login)
4. Upon successful login, the CLI tool receives an OIDC JWT from the auth provider
5. The CLI tool calls the Data Proxy's STS endpoint (`AssumeRoleWithWebIdentity`) with the JWT
6. The Data Proxy validates the JWT and returns temporary scoped credentials
7. The CLI tool outputs the credentials to stdout; the AWS SDK uses them transparently

The user's `~/.aws/config` would look like:

```ini
[profile source-coop]
credential_process = source credentials # <- source cooperative cli
endpoint_url = https://data.source.coop
```

This approach reuses the existing `AssumeRoleWithWebIdentity` STS implementation and avoids the need to implement the full AWS SSO OIDC + Portal API surface (which `aws sso login` requires).
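
The last step of the workflow above (printing credentials to stdout) can be sketched as follows. The JSON keys are the ones the AWS CLI documents for `credential_process`; the function name and the sample values are illustrative assumptions, not part of the Source Cooperative CLI.

```python
import json
from datetime import datetime, timezone


def credential_process_output(
    access_key_id: str,
    secret_access_key: str,
    session_token: str,
    expiration: datetime,
) -> str:
    """Serialize temporary credentials in the JSON shape read by the AWS SDKs.

    `expiration` is expected to be timezone-aware; it is rendered as ISO 8601 in UTC.
    """
    return json.dumps(
        {
            "Version": 1,
            "AccessKeyId": access_key_id,
            "SecretAccessKey": secret_access_key,
            "SessionToken": session_token,
            "Expiration": expiration.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    )


# Hypothetical credentials, for illustration only.
example = credential_process_output(
    "ASIAEXAMPLEKEYID",
    "example-secret",
    "example-session-token",
    datetime(2026, 3, 3, 12, 0, 0, tzinfo=timezone.utc),
)
print(example)
```

Including `Expiration` lets the SDK know when to invoke the tool again for fresh credentials.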

#### Long-term Access Credentials

For users who don't have access to OIDC identity providers, the Source Data Proxy can make use of long-term access keys (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`). Users can generate and retrieve these keys from the Source application (`https://source.coop`).

### How Source Data Proxy authenticates with object storage backends

To connect to backing object storage services (e.g. MinIO, AWS S3, Cloudflare R2, Azure Blob Storage), the Source Data Proxy supports two forms of authentication:

1. Custom OIDC Provider
2. Long-term Access Keys

#### Custom OIDC Provider

The Source Data Proxy operates as a custom OIDC Provider. Users can register this provider with their cloud environments. When the Source Data Proxy needs to connect to an object storage backend, it generates a JWT signed by the Data Proxy's OIDC provider and uses it to retrieve a set of temporary scoped credentials. To reduce latency, these credentials are cached by the Source Data Proxy for reuse on subsequent requests. This process is akin to how GitHub or Vercel authenticate with AWS[^vercel-oidc][^github-oidc].
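
The caching behaviour mentioned above could look roughly like the sketch below: a per-connection cache that refetches credentials shortly before they expire. The class names, the 60-second refresh margin, and the fake fetcher are all assumptions made for illustration; the actual proxy may cache differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedCredentials:
    value: dict        # opaque credential payload (keys, token, ...)
    expires_at: float  # Unix timestamp at which the credentials expire


class CredentialCache:
    """Per-connection cache that refetches shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[str], CachedCredentials],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._entries: Dict[str, CachedCredentials] = {}

    def get(self, connection_id: str) -> dict:
        entry = self._entries.get(connection_id)
        # Refetch when nothing is cached or the entry is within the refresh margin.
        if entry is None or entry.expires_at - self._margin <= self._clock():
            entry = self._fetch(connection_id)
            self._entries[connection_id] = entry
        return entry.value


# Usage with a fake fetcher and a controllable clock (both illustrative).
now = [1_000.0]
fetch_calls = []


def fake_fetch(connection_id: str) -> CachedCredentials:
    fetch_calls.append(connection_id)
    return CachedCredentials({"token": f"token-{len(fetch_calls)}"}, now[0] + 3600)


cache = CredentialCache(fake_fetch, refresh_margin=60.0, clock=lambda: now[0])
first = cache.get("example-connection")
second = cache.get("example-connection")  # served from the cache
now[0] += 3550                            # now inside the refresh margin
third = cache.get("example-connection")   # triggers a new fetch
```

Refreshing slightly early avoids handing a backend request credentials that expire mid-flight.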

The proxy's OIDC discovery endpoints (`/.well-known/openid-configuration` and JWKS) must be publicly accessible, as cloud providers fetch them at token validation time to verify JWT signatures.
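
For orientation, a minimal discovery document might be assembled as in the sketch below. The field names come from the OpenID Connect Discovery specification; the JWKS path (`/.well-known/jwks.json`) and the `RS256` signing algorithm are assumptions, since this document does not state them.

```python
import json


def discovery_document(issuer: str, jwks_path: str = "/.well-known/jwks.json") -> dict:
    """Build a minimal OIDC discovery document for a token-issuing provider."""
    issuer = issuer.rstrip("/")
    return {
        "issuer": issuer,                 # must match the `iss` claim of minted JWTs exactly
        "jwks_uri": issuer + jwks_path,   # hypothetical path to the public signing keys
        "response_types_supported": ["id_token"],
        "subject_types_supported": ["public"],
        "id_token_signing_alg_values_supported": ["RS256"],  # assumed algorithm
    }


document = discovery_document("https://data.source.coop/")
print(json.dumps(document, indent=2))
```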

<details>

<summary>Cloud Provider Integration Workflows</summary>

##### AWS (S3)

**Administrator setup:**

1. Register the proxy's issuer URL (e.g. `https://data.source.coop`) as an IAM OIDC Identity Provider in the AWS account.
2. Create an IAM Role with a trust policy allowing `sts:AssumeRoleWithWebIdentity` from the provider, scoped by `aud` and `sub` claim conditions.
3. Attach a permission policy granting the necessary S3 access.
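
A trust policy for step 2 could be generated along the lines of the sketch below. The policy structure follows the standard IAM format for OIDC federation; the account ID and the `sub` value are hypothetical placeholders.

```python
import json


def trust_policy(account_id: str, issuer_host: str, subject: str) -> dict:
    """Build an IAM trust policy that admits JWTs from the given OIDC provider."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Condition keys are prefixed with the issuer host (no scheme).
                        f"{issuer_host}:aud": "sts.amazonaws.com",
                        f"{issuer_host}:sub": subject,
                    }
                },
            }
        ],
    }


# Hypothetical account ID and connection identifier.
policy = trust_policy("123456789012", "data.source.coop", "example-connection")
print(json.dumps(policy, indent=2))
```

Pinning both `aud` and `sub` keeps the role from being assumable by other connections served by the same proxy.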

**At request time:**

1. The proxy mints a JWT with `iss: https://data.source.coop`, `sub: <connection-identifier>`, and `aud: sts.amazonaws.com`.
2. The proxy calls `AssumeRoleWithWebIdentity` on AWS STS with the JWT and the target Role ARN. This call does not require AWS credentials — the JWT is the sole authentication.
3. AWS validates the JWT (fetches JWKS, checks signature, evaluates trust policy conditions) and returns temporary `AccessKeyId` / `SecretAccessKey` / `SessionToken` credentials.
4. The proxy caches and passes these credentials to `AmazonS3Builder`.
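
Step 1 above (minting the JWT) can be illustrated with the standard library alone. The sketch builds the header and claims and returns the JWT signing input; producing the signature itself requires a cryptography library and is omitted. The `RS256` algorithm, the key ID, and the five-minute lifetime are assumptions.

```python
import base64
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment back into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def jwt_signing_input(
    issuer: str,
    subject: str,
    audience: str,
    key_id: str,
    now: Optional[int] = None,
    ttl_seconds: int = 300,
) -> str:
    """Return `<header>.<claims>`, the string that would then be signed."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT", "kid": key_id}
    claims = {
        "iss": issuer,
        "sub": subject,
        "aud": audience,
        "iat": issued_at,
        "exp": issued_at + ttl_seconds,
    }
    encoded = [b64url(json.dumps(part, separators=(",", ":")).encode("utf-8")) for part in (header, claims)]
    return ".".join(encoded)


# Hypothetical connection identifier and key ID.
signing_input = jwt_signing_input(
    issuer="https://data.source.coop",
    subject="example-connection",
    audience="sts.amazonaws.com",
    key_id="example-key-1",
    now=1_700_000_000,
)
header_b64, claims_b64 = signing_input.split(".")
print(decode_segment(claims_b64))
```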

##### Azure (Blob Storage)

**Administrator setup:**

1. Create an App Registration (or User-Assigned Managed Identity) in Microsoft Entra ID.
2. Add a Federated Identity Credential specifying the proxy's issuer URL and the expected `sub` claim value.
3. Grant the app registration a role assignment on the target storage account (e.g. `Storage Blob Data Contributor`).

**At request time:**

1. The proxy mints a JWT with `iss: https://data.source.coop`, `sub: <connection-identifier>`, and `aud: api://AzureADTokenExchange`.
2. The proxy exchanges the JWT for an Azure AD access token via the Microsoft identity platform token endpoint using `grant_type=client_credentials` with `client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer`. The JWT replaces a client secret.
3. Azure validates the JWT against the federated identity credential configuration and returns an OAuth 2.0 bearer token scoped to Azure Storage.
4. The proxy caches and passes the bearer token to `MicrosoftAzureBuilder`.
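
The exchange in step 2 might be assembled as in the sketch below, which builds the token endpoint URL and form body for the Microsoft identity platform. The tenant ID, client ID, and JWT are hypothetical placeholders, and the `https://storage.azure.com/.default` scope is an assumption about how the token would be scoped to Azure Storage.

```python
from typing import Tuple
from urllib.parse import urlencode


def azure_token_request(tenant_id: str, client_id: str, jwt: str) -> Tuple[str, str]:
    """Return (url, form_body) for a federated client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
            "client_assertion": jwt,  # the proxy-minted JWT takes the place of a client secret
            "scope": "https://storage.azure.com/.default",
        }
    )
    return url, body


# Hypothetical identifiers, for illustration only.
azure_url, azure_body = azure_token_request(
    tenant_id="example-tenant",
    client_id="00000000-0000-0000-0000-000000000000",
    jwt="header.payload.signature",
)
print(azure_url)
```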

##### GCP (Cloud Storage)

**Administrator setup:**

1. Create a Workload Identity Pool and an OIDC Provider within it, specifying the proxy's issuer URL and an attribute mapping (e.g. `google.subject = assertion.sub`).
2. Grant the mapped external identity `roles/iam.workloadIdentityUser` on a GCP Service Account.
3. Grant the service account the necessary GCS permissions.

**At request time (two-step exchange):**

1. The proxy mints a JWT with `iss: https://data.source.coop`, `sub: <connection-identifier>`, and `aud` set to the Workload Identity Provider's full resource name.
2. The proxy calls the GCP STS endpoint (`sts.googleapis.com/v1/token`) with an RFC 8693 token exchange request, submitting the JWT as the subject token. GCP returns a federated access token.
3. The proxy uses the federated token to call the IAM Credentials API (`generateAccessToken`) to impersonate the service account, obtaining a short-lived OAuth 2.0 access token.
4. The proxy caches and passes the access token to `GoogleCloudStorageBuilder` via a custom `CredentialProvider`.
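
The two-step exchange above can be outlined as follows. The parameter names are the RFC 8693 ones and the URLs are Google's public STS and IAM Credentials endpoints; the project number, pool, provider, and service account are hypothetical, and the `cloud-platform` scope is an assumption.

```python
from urllib.parse import urlencode

GCP_STS_URL = "https://sts.googleapis.com/v1/token"


def gcp_sts_exchange_body(provider_resource_name: str, jwt: str) -> str:
    """Form-encode the RFC 8693 token exchange request (step 2)."""
    return urlencode(
        {
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "audience": provider_resource_name,
            "scope": "https://www.googleapis.com/auth/cloud-platform",
            "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
            "subject_token": jwt,
        }
    )


def generate_access_token_url(service_account_email: str) -> str:
    """URL of the IAM Credentials call that impersonates the service account (step 3)."""
    return (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{service_account_email}:generateAccessToken"
    )


# Hypothetical resource names, for illustration only.
provider = (
    "//iam.googleapis.com/projects/123456789/locations/global/"
    "workloadIdentityPools/example-pool/providers/example-provider"
)
exchange_body = gcp_sts_exchange_body(provider, "header.payload.signature")
impersonation_url = generate_access_token_url("proxy@example-project.iam.gserviceaccount.com")
print(impersonation_url)
```

The federated token returned by the first call is sent as a bearer token on the second call.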

</details>

#### Long-term Access Credentials

For object storage backends that are unable to utilize the Source Data Proxy as an Identity Provider, the Data Proxy also stores long-term access credentials provided by the administrators of the object storage backend. These credentials will be used to authenticate when the Data Proxy needs to interact with the object storage backend.

[^vercel-oidc]: https://vercel.com/docs/oidc/aws
[^github-oidc]: https://docs.github.com/en/actions/concepts/security/openid-connect

## Modularity

The primary focus of this codebase is to serve as a data proxy for the [Source Cooperative](https://source.coop). However, it is built in a modular fashion to support reuse by others who have similar needs.