This design introduces SeaweedFS as a self-hosted S3-compatible storage backend for the PDR AI platform, alongside the existing Vercel Blob and UploadThing cloud providers. The core change is a unified Storage Adapter (src/lib/storage.ts) that abstracts all storage operations behind a single interface, routing to the correct backend based on the NEXT_PUBLIC_STORAGE_PROVIDER environment variable.
The architecture is designed around three parallel workstreams:
Key Design Decisions
Strategy Pattern for Storage: Rather than scattering if (provider === 'local') checks throughout the codebase, we use a strategy pattern where each provider implements a common interface. The adapter selects the strategy at initialization time.
Presigned URLs for Local Uploads: Browser uploads go directly to SeaweedFS via presigned URLs, avoiding server-side proxying and keeping the upload flow consistent with cloud providers.
fetchBlob Abstraction Extension: The existing fetchBlob function in vercel-blob.ts is the de facto fetch wrapper for all document retrieval. We extend the Storage Adapter to provide a unified fetchFile that handles SeaweedFS URLs alongside Vercel Blob URLs.
No Schema Migration Required: The fileUploads.storageProvider column is varchar(64), so adding seaweedfs as a value requires no DDL changes.
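The strategy selection described above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/lib/storage.ts: the names StorageProvider, UploadInput, makeFakeProvider, and createStorageAdapter, the example URLs, and the bucket name are all assumptions, and the providers are in-memory stand-ins so the sketch runs without a network. Only the UploadResult fields (url, pathname, provider) come from this document.

```typescript
// Hypothetical names throughout; only the url/pathname/provider fields
// of UploadResult are taken from this design document.
type ProviderMode = "cloud" | "local";

interface UploadInput {
  filename: string;
  data: Uint8Array;
  contentType: string;
}

interface UploadResult {
  url: string;
  pathname: string;
  provider: string;
}

interface StorageProvider {
  readonly name: string;
  uploadFile(input: UploadInput): Promise<UploadResult>;
}

// In-memory stand-in: builds a URL but performs no real upload.
function makeFakeProvider(name: string, baseUrl: string): StorageProvider {
  return {
    name,
    async uploadFile(input: UploadInput): Promise<UploadResult> {
      const pathname = `documents/${input.filename}`;
      return { url: `${baseUrl}/${pathname}`, pathname, provider: name };
    },
  };
}

// The strategy is chosen once, when the adapter is created,
// so callers never branch on the provider mode themselves.
function createStorageAdapter(
  mode: ProviderMode,
  strategies: Record<ProviderMode, StorageProvider>,
): StorageProvider {
  return strategies[mode];
}

const strategies: Record<ProviderMode, StorageProvider> = {
  cloud: makeFakeProvider("vercel_blob", "https://blob.example.invalid"),
  local: makeFakeProvider("seaweedfs", "http://localhost:8333/example-bucket"),
};

const adapter = createStorageAdapter("local", strategies);
```

Call sites then depend only on the StorageProvider interface, which is what keeps provider checks out of the rest of the codebase.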
Architecture
graph TB
subgraph Frontend
UC[Upload Component]
DC[Document Viewer]
end
subgraph API Layer
BA[Bootstrap API<br>/api/employer/upload/bootstrap]
PS[Presign API<br>/api/storage/presign]
UL[Upload-Local API<br>/api/upload-local]
UD[Upload Document API<br>/api/uploadDocument]
end
subgraph Storage Adapter Layer
SA[Storage Adapter<br>src/lib/storage.ts]
CP[Cloud Provider<br>Vercel Blob / UploadThing]
LP[Local Provider<br>S3 Client → SeaweedFS]
end
subgraph Infrastructure
SW[SeaweedFS Container<br>S3 Gateway :8333]
PG[(PostgreSQL<br>pdr_ai_v2 + seaweedfs DBs)]
VOL[Docker Volume<br>seaweedfs_data]
end
UC -->|cloud mode| CP
UC -->|local mode: get presigned URL| PS
UC -->|local mode: PUT to S3| SW
UC -->|both modes: trigger OCR| UD
DC -->|fetch| SA
BA -->|storageProvider config| UC
PS -->|generate presigned URL| LP
UL -->|upload via adapter| SA
SA -->|cloud| CP
SA -->|local| LP
LP --> SW
SW --> PG
SW --> VOL
5. Upload Component Changes (src/app/employer/upload/UploadForm.tsx)
The existing UploadForm already has a storageMethod concept ("cloud" vs "database"). We extend this to support a third mode: "local" (S3/SeaweedFS). When storageProvider === "local" from bootstrap:
The component fetches a presigned URL from /api/storage/presign
Uploads the file directly to SeaweedFS via fetch(presignedUrl, { method: 'PUT', body: file })
Calls /api/uploadDocument with the S3 URL to trigger the OCR pipeline
Progress tracking uses XMLHttpRequest for upload progress events
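The fetch API does not report upload progress, which is presumably why progress tracking relies on XMLHttpRequest. Below is a hedged sketch of a PUT with progress reporting. The helper putWithProgress and the XhrLike and ProgressLike shapes are hypothetical and not taken from UploadForm.tsx. XhrLike lists only the XMLHttpRequest members the sketch touches, so a fake can stand in during tests; a browser caller would pass a real instance, cast if the types require it.

```typescript
// Hypothetical helper and types, for illustration only.
interface ProgressLike {
  lengthComputable: boolean;
  loaded: number;
  total: number;
}

// The subset of XMLHttpRequest used below.
interface XhrLike {
  status: number;
  upload: { onprogress: ((event: ProgressLike) => void) | null };
  onload: (() => void) | null;
  onerror: (() => void) | null;
  open(method: string, url: string): void;
  setRequestHeader(name: string, value: string): void;
  send(body: unknown): void;
}

function putWithProgress(
  xhr: XhrLike,
  presignedUrl: string,
  body: unknown,
  contentType: string,
  onProgress: (percent: number) => void,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    xhr.open("PUT", presignedUrl);
    // If the content type was part of the signature, this header has to match it.
    xhr.setRequestHeader("Content-Type", contentType);
    xhr.upload.onprogress = (event: ProgressLike) => {
      if (event.lengthComputable && event.total > 0) {
        onProgress(Math.round((event.loaded / event.total) * 100));
      }
    };
    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) resolve();
      else reject(new Error(`Upload failed with status ${xhr.status}`));
    };
    xhr.onerror = () => reject(new Error("Network error during upload"));
    xhr.send(body);
  });
}
```

Once the promise resolves, the component would go on to call /api/uploadDocument as described in the steps above.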
6. Document Retrieval Extension
The fetchFile function in the Storage Adapter replaces direct fetchBlob calls. For SeaweedFS documents, it constructs the URL from NEXT_PUBLIC_S3_ENDPOINT + storagePathname. Existing fetchBlob calls in OCR adapters and ingestion router are updated to use fetchFile.
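As a rough sketch of the URL resolution step only, not the real fetchFile: the StoredDocument shape, joinUrl, and resolveFileUrl are hypothetical names. The sketch follows the stated rule literally (endpoint plus storagePathname); whether the bucket segment belongs in the endpoint value is not specified here. Returning the stored storageUrl for cloud documents is likewise an assumption about the existing retrieval logic.

```typescript
// Hypothetical shape: just the three fileUploads columns named in this document.
interface StoredDocument {
  storageProvider: string;
  storageUrl: string;
  storagePathname: string;
}

// Joins endpoint and object key with exactly one slash between them.
function joinUrl(endpoint: string, pathname: string): string {
  return `${endpoint.replace(/\/+$/, "")}/${pathname.replace(/^\/+/, "")}`;
}

function resolveFileUrl(doc: StoredDocument, s3Endpoint: string | undefined): string {
  if (doc.storageProvider === "seaweedfs") {
    if (!s3Endpoint) {
      throw new Error("NEXT_PUBLIC_S3_ENDPOINT is required to read seaweedfs documents");
    }
    return joinUrl(s3Endpoint, doc.storagePathname);
  }
  // Assumption: cloud documents are fetched from the URL recorded at upload time.
  return doc.storageUrl;
}
```

Because the decision depends on each record's storageProvider rather than on the active mode, documents uploaded under different providers can be resolved side by side.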
Zod Schema Extension (src/env.ts)
A superRefine is added to validate that when NEXT_PUBLIC_STORAGE_PROVIDER === "local", the S3 variables are all present.
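The conditional check itself can be expressed without any schema library. The sketch below is a dependency-free helper that a superRefine callback could delegate to, reporting one issue per returned name. The name missingS3Vars and the EnvLike type are hypothetical, and treating whitespace-only values as empty is an assumption; the list of variables is the one given in Property 2.

```typescript
type EnvLike = Record<string, string | undefined>;

// Variables required in local mode, as listed in Property 2.
const REQUIRED_S3_VARS = [
  "NEXT_PUBLIC_S3_ENDPOINT",
  "S3_REGION",
  "S3_ACCESS_KEY",
  "S3_SECRET_KEY",
  "S3_BUCKET_NAME",
] as const;

// Returns the names of required S3 variables that are missing or empty.
// In any mode other than "local" nothing is required, so the list is empty.
function missingS3Vars(env: EnvLike): string[] {
  if (env["NEXT_PUBLIC_STORAGE_PROVIDER"] !== "local") return [];
  return REQUIRED_S3_VARS.filter((name) => {
    const value = env[name];
    // Assumption: whitespace-only values count as empty.
    return value === undefined || value.trim() === "";
  });
}
```

Returning the full list rather than failing on the first gap lets the startup error name every missing variable at once.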
fileUploads Table (Existing — No Migration)
| Column | Type | Notes |
| --- | --- | --- |
| storageProvider | varchar(64) | Existing values: database, vercel_blob. New value: seaweedfs |
| storageUrl | varchar(1024) | Full S3 URL: http://<endpoint>:8333/<bucket>/<key> |
| storagePathname | varchar(1024) | S3 object key: documents/<uuid>-<filename> |
S3 Object Key Format
documents/{uuid}-{sanitized_filename}
Same pattern as the existing Vercel Blob key format in putFile(), ensuring consistency.
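A minimal sketch of building a key in this format follows. The function names and the sanitization rule (keep letters, digits, dot, underscore, and hyphen; replace everything else with an underscore) are assumptions, since the actual rule used by putFile() is not shown here. The UUID is passed in so the sketch stays deterministic.

```typescript
// Assumed sanitization rule; the real one in putFile() may differ.
function sanitizeFilename(filename: string): string {
  // Drop any directory part, then replace runs of other characters with "_".
  const base = filename.split(/[\\/]/).pop() ?? "";
  const cleaned = base.replace(/[^A-Za-z0-9._-]+/g, "_");
  return cleaned.length > 0 ? cleaned : "file";
}

// Produces documents/{uuid}-{sanitized_filename}.
function buildObjectKey(uuid: string, filename: string): string {
  return `documents/${uuid}-${sanitizeFilename(filename)}`;
}

const exampleKey = buildObjectKey(
  "123e4567-e89b-12d3-a456-426614174000",
  "../My Report (final).pdf",
);
// exampleKey is "documents/123e4567-e89b-12d3-a456-426614174000-My_Report_final_.pdf"
```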
Docker: SeaweedFS Filer Database (init-db.sql)
The init-db.sql script creates the seaweedfs database and the filemeta table required by the [postgres] filer store driver.
Note: The [postgres2] driver (originally planned) has a known SQL formatting bug in current SeaweedFS versions (tested: 3.68 and latest/4.15) where it generates invalid SQL (%!(EXTRA string=filemeta)) during table initialization. We use the [postgres] driver instead, which requires the filemeta table to be pre-created.
CREATE DATABASE seaweedfs;
\c seaweedfs
CREATE TABLE IF NOT EXISTS filemeta (
  dirhash BIGINT,
  name VARCHAR(65535),
  directory VARCHAR(65535),
  meta bytea,
  PRIMARY KEY (dirhash, name)
);
This only affects SeaweedFS's own metadata storage — the app's pdr_ai_v2 database and its Drizzle ORM connection are completely unaffected.
The filer config uses the [postgres] section. (The [postgres2] driver is the newer recommended driver but has a SQL formatting bug in current SeaweedFS releases — see init-db.sql note above.)
Important: The password field must match the POSTGRES_PASSWORD env var in docker-compose.yml (default: password). If you change the PostgreSQL password, update filer.toml to match — it does not support env var interpolation.
The S3_ACCESS_KEY and S3_SECRET_KEY env vars used by the app's S3 client must match the credentials defined here.
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Storage provider enum validation
For any string value assigned to NEXT_PUBLIC_STORAGE_PROVIDER, the Zod schema should accept only "cloud" and "local", and reject all other strings. When the variable is absent, it should default to "cloud".
Validates: Requirements 2.1
Property 2: Conditional S3 variable requirement
For any environment configuration where NEXT_PUBLIC_STORAGE_PROVIDER is "local", if any of NEXT_PUBLIC_S3_ENDPOINT, S3_REGION, S3_ACCESS_KEY, S3_SECRET_KEY, or S3_BUCKET_NAME is missing or empty, the Zod validation should fail. When provider is "cloud", these variables should not be required.
Validates: Requirements 2.2, 2.4
Property 3: Presigned URL structure
For any valid S3 object key and content type, the generated presigned upload URL should contain the configured endpoint, bucket name, object key, and a signature query parameter (X-Amz-Signature).
Validates: Requirements 3.3
Property 4: S3 client connection error descriptiveness
For any endpoint string, when the S3 client fails to connect, the thrown error message should contain the endpoint URL so operators can diagnose the issue.
Validates: Requirements 3.4
Property 5: Upload result shape and persistence consistency
For any valid upload input (filename, data, contentType) and for any active storage provider, the uploadFile function should return an UploadResult containing non-empty url, pathname, and provider fields, and the corresponding fileUploads DB record should have storageProvider matching the active provider, storageUrl matching the returned url, and storagePathname matching the returned pathname.
Validates: Requirements 4.4, 4.5, 10.2
Property 6: Upload error propagation
For any error thrown by any storage provider during upload, the error propagated by the Storage Adapter should contain both the provider name (e.g., "seaweedfs", "vercel_blob") and the original error message.
Validates: Requirements 4.6
Property 7: Presign endpoint authentication enforcement
For any request to POST /api/storage/presign without a valid Clerk authentication token, the endpoint should return a 401 status code regardless of the request body content.
Validates: Requirements 5.2
Property 8: Presign response completeness
For any valid presign request (with valid filename and contentType) when storage provider is "local", the response body should contain presignedUrl (non-empty string), objectKey (non-empty string), and bucket (non-empty string).
Validates: Requirements 5.5
Property 9: Mixed-provider document retrieval
For any list of documents with mixed storageProvider values (seaweedfs, vercel_blob, uploadthing), the fetchFile function should correctly resolve each document's URL based on its provider: SeaweedFS documents use NEXT_PUBLIC_S3_ENDPOINT + storagePathname, while cloud documents use their existing retrieval logic.
Validates: Requirements 7.1, 7.3
Property 10: SeaweedFS retrieval error descriptiveness
For any document with storageProvider === "seaweedfs", when the SeaweedFS service is unreachable, the error should indicate that the local storage service is unavailable and include the configured endpoint.
Validates: Requirements 7.4
Property 11: Bootstrap API storage provider reporting
For any value of NEXT_PUBLIC_STORAGE_PROVIDER (including unset), the bootstrap API response should include a storageProvider field that equals the configured value or defaults to "cloud", and should always include the isUploadThingConfigured field for backward compatibility.
Validates: Requirements 9.1, 9.3
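Property 11 can be illustrated with a small pure function. The field names isUploadThingConfigured, storageProvider, and s3Endpoint appear in this document; the function buildBootstrapResponse and its signature are hypothetical. Mapping any value other than "local" to "cloud" is an assumption that relies on invalid values having already been rejected by env validation.

```typescript
interface UploadBootstrapResponse {
  isUploadThingConfigured: boolean;
  storageProvider: "cloud" | "local";
  // Only included in local mode.
  s3Endpoint?: string;
}

function buildBootstrapResponse(
  env: Record<string, string | undefined>,
  isUploadThingConfigured: boolean,
): UploadBootstrapResponse {
  // Unset (or anything other than "local") falls back to "cloud".
  const storageProvider = env["NEXT_PUBLIC_STORAGE_PROVIDER"] === "local" ? "local" : "cloud";
  const response: UploadBootstrapResponse = { isUploadThingConfigured, storageProvider };
  const endpoint = env["NEXT_PUBLIC_S3_ENDPOINT"];
  if (storageProvider === "local" && endpoint) {
    response.s3Endpoint = endpoint;
  }
  return response;
}
```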
Error Handling
Storage Adapter Errors
| Scenario | Behavior |
| --- | --- |
| SeaweedFS unreachable on upload | Throw StorageError with provider name, endpoint, and original connection error |
| SeaweedFS unreachable on retrieval | Return descriptive error: "Local storage service unavailable at {endpoint}" |
| Presigned URL generation fails | Throw with S3 client error details and configured endpoint |
| Invalid storage provider in env | Zod validation fails at startup with clear message about valid values |
| Missing S3 variables for local mode | Zod superRefine fails with list of missing required variables |
| Cloud provider upload fails | Propagate original UploadThing/Vercel Blob error with provider name prefix |
| Presign endpoint called in cloud mode | Return 400 with message: "Presigned URLs are not applicable for cloud storage" |
| Unauthenticated presign request | Return 401 with message: "Authentication required" |
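One possible shape for an error carrying the provider name, endpoint, and original message is sketched below. This is an illustration only: the constructor signature, the message format, and the wrapper withProviderContext are assumptions, not the project's actual StorageError.

```typescript
// Hypothetical error class for illustration.
class StorageError extends Error {
  readonly provider: string;
  readonly endpoint: string | undefined;
  readonly original: unknown;

  constructor(provider: string, original: unknown, endpoint?: string) {
    const detail = original instanceof Error ? original.message : String(original);
    const where = endpoint ? ` at ${endpoint}` : "";
    super(`[${provider}] storage operation failed${where}: ${detail}`);
    this.name = "StorageError";
    this.provider = provider;
    this.endpoint = endpoint;
    this.original = original;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, StorageError.prototype);
  }
}

// Wraps a provider call so any failure is rethrown with provider context.
async function withProviderContext<T>(
  provider: string,
  endpoint: string | undefined,
  operation: () => Promise<T>,
): Promise<T> {
  try {
    return await operation();
  } catch (err) {
    throw new StorageError(provider, err, endpoint);
  }
}
```

Wrapping at the adapter boundary means each provider can throw whatever its SDK produces, while callers always see the provider name and the original message together.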
Graceful Degradation
If NEXT_PUBLIC_STORAGE_PROVIDER is not set, the system defaults to "cloud" mode, so existing deployments see no breaking change.
The SeaweedFS Docker service uses the local-storage profile, so it only starts when explicitly requested (docker compose --profile local-storage up).
The fetchFile function checks the document's storageProvider field to determine retrieval strategy, so mixed-provider libraries work without configuration changes.
Testing Strategy
Property-Based Testing
Library: fast-check (already in devDependencies)
Each correctness property maps to a single property-based test with minimum 100 iterations. Tests are tagged with the format:
Env validation properties (Properties 1, 2): Generate random strings for NEXT_PUBLIC_STORAGE_PROVIDER and random presence/absence of S3 variables. Assert Zod schema accepts/rejects correctly.
Presigned URL structure (Property 3): Generate random object keys and content types. Assert URL contains expected components.
Upload result consistency (Property 5): Generate random filenames, data buffers, content types. Mock both providers. Assert result shape and DB record correctness.
Error propagation (Property 6): Generate random error messages and provider names. Assert propagated error contains both.
Auth enforcement (Property 7): Generate random request bodies. Assert 401 without auth.
Presign response completeness (Property 8): Generate random filenames and content types. Assert response contains all required fields.
Mixed-provider retrieval (Property 9): Generate random document lists with mixed providers. Assert correct URL resolution per provider.
Design Document: Local S3 Migration
Overview
This design introduces SeaweedFS as a self-hosted S3-compatible storage backend for the PDR AI platform, alongside the existing Vercel Blob and UploadThing cloud providers. The core change is a unified Storage Adapter (src/lib/storage.ts) that abstracts all storage operations behind a single interface, routing to the correct backend based on the NEXT_PUBLIC_STORAGE_PROVIDER environment variable.
The architecture is designed around three parallel workstreams:
Key Design Decisions
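Strategy Pattern for Storage: Rather than scattering if (provider === 'local') checks throughout the codebase, we use a strategy pattern where each provider implements a common interface. The adapter selects the strategy at initialization time.
Presigned URLs for Local Uploads: Browser uploads go directly to SeaweedFS via presigned URLs, avoiding server-side proxying and keeping the upload flow consistent with cloud providers.
fetchBlob Abstraction Extension: The existing fetchBlob function in vercel-blob.ts is the de facto fetch wrapper for all document retrieval. We extend the Storage Adapter to provide a unified fetchFile that handles SeaweedFS URLs alongside Vercel Blob URLs.
No Schema Migration Required: The fileUploads.storageProvider column is varchar(64), so adding seaweedfs as a value requires no DDL changes.
Architecture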
graph TB
subgraph Frontend
UC[Upload Component]
DC[Document Viewer]
end
subgraph API Layer
BA[Bootstrap API<br>/api/employer/upload/bootstrap]
PS[Presign API<br>/api/storage/presign]
UL[Upload-Local API<br>/api/upload-local]
UD[Upload Document API<br>/api/uploadDocument]
end
subgraph Storage Adapter Layer
SA[Storage Adapter<br>src/lib/storage.ts]
CP[Cloud Provider<br>Vercel Blob / UploadThing]
LP[Local Provider<br>S3 Client → SeaweedFS]
end
subgraph Infrastructure
SW[SeaweedFS Container<br>S3 Gateway :8333]
PG[(PostgreSQL<br>pdr_ai_v2 + seaweedfs DBs)]
VOL[Docker Volume<br>seaweedfs_data]
end
UC -->|cloud mode| CP
UC -->|local mode: get presigned URL| PS
UC -->|local mode: PUT to S3| SW
UC -->|both modes: trigger OCR| UD
DC -->|fetch| SA
BA -->|storageProvider config| UC
PS -->|generate presigned URL| LP
UL -->|upload via adapter| SA
SA -->|cloud| CP
SA -->|local| LP
LP --> SW
SW --> PG
SW --> VOL
Request Flow: Local Upload
sequenceDiagram
participant Browser
participant BootstrapAPI
participant PresignAPI
participant SeaweedFS
participant UploadDocAPI
participant DB
Browser->>BootstrapAPI: GET /api/employer/upload/bootstrap
BootstrapAPI-->>Browser: { storageProvider: "local", s3Endpoint: "..." }
Browser->>PresignAPI: POST /api/storage/presign { filename, contentType }
PresignAPI-->>Browser: { presignedUrl, objectKey, bucket }
Browser->>SeaweedFS: PUT presignedUrl (file binary)
SeaweedFS-->>Browser: 200 OK
Browser->>UploadDocAPI: POST /api/uploadDocument { documentUrl, storageType: "local" }
UploadDocAPI->>DB: Insert fileUploads (storageProvider: "seaweedfs")
UploadDocAPI-->>Browser: { jobId, success: true }
Components and Interfaces
1. S3 Client Module (src/server/storage/s3-client.ts)
Singleton S3 client configured for SeaweedFS. Only instantiated when NEXT_PUBLIC_STORAGE_PROVIDER === 'local'.
2. Storage Adapter (src/lib/storage.ts)
Unified interface that delegates to the correct provider.
3. Presigned URL API Route (src/app/api/storage/presign/route.ts)
4. Bootstrap API Extension (src/app/api/employer/upload/bootstrap/route.ts)
Extends the existing UploadBootstrapResponse type:
5. Upload Component Changes (src/app/employer/upload/UploadForm.tsx)
The existing UploadForm already has a storageMethod concept ("cloud" vs "database"). We extend this to support a third mode: "local" (S3/SeaweedFS). When storageProvider === "local" from bootstrap:
The component fetches a presigned URL from /api/storage/presign
Uploads the file directly to SeaweedFS via fetch(presignedUrl, { method: 'PUT', body: file })
Calls /api/uploadDocument with the S3 URL to trigger the OCR pipeline
Progress tracking uses XMLHttpRequest for upload progress events
6. Document Retrieval Extension
The fetchFile function in the Storage Adapter replaces direct fetchBlob calls. For SeaweedFS documents, it constructs the URL from NEXT_PUBLIC_S3_ENDPOINT + storagePathname. Existing fetchBlob calls in OCR adapters and ingestion router are updated to use fetchFile.
7. Docker Infrastructure
New additions to docker-compose.yml:
Key command flags:
-dir=/data: explicit volume storage directory (required, otherwise volumes can't be allocated)
-volume.max=0: unlimited volume count (low values like 5 get exhausted by internal metadata volumes)
-master.volumeSizeLimitMB=1024: 1GB volumes instead of default 30GB (dev-friendly)
-s3.config=/etc/seaweedfs/s3-config.json: S3 credential config (required, newer SeaweedFS denies anonymous access by default)
-volume.port=8080: explicit volume server port to avoid conflicts
New files:
docker/filer.toml: SeaweedFS filer config pointing to PostgreSQL
docker/s3-config.json: S3 identity/credential config for SeaweedFS (default: accessKey pdr_local_key, secretKey pdr_local_secret)
docker/init-db.sql: Extended to create seaweedfs database with filer schema tables
Data Models
Environment Variables (Storage_Config)
NEXT_PUBLIC_STORAGE_PROVIDER | cloud
NEXT_PUBLIC_S3_ENDPOINT | local
S3_REGION | local | us-east-1
S3_ACCESS_KEY | local
S3_SECRET_KEY | local
S3_BUCKET_NAME | local
Zod Schema Extension (src/env.ts)
Server schema additions:
Client schema additions:
A superRefine is added to validate that when NEXT_PUBLIC_STORAGE_PROVIDER === "local", the S3 variables are all present.
fileUploads Table (Existing, No Migration)
| Column | Type | Notes |
| --- | --- | --- |
| storageProvider | varchar(64) | Existing values: database, vercel_blob. New value: seaweedfs |
| storageUrl | varchar(1024) | Full S3 URL: http://<endpoint>:8333/<bucket>/<key> |
| storagePathname | varchar(1024) | S3 object key: documents/<uuid>-<filename> |
S3 Object Key Format
documents/{uuid}-{sanitized_filename}
Same pattern as the existing Vercel Blob key format in putFile(), ensuring consistency.
Docker: SeaweedFS Filer Database (init-db.sql)
The init-db.sql script creates the seaweedfs database and the filemeta table required by the [postgres] filer store driver.
This only affects SeaweedFS's own metadata storage; the app's pdr_ai_v2 database and its Drizzle ORM connection are completely unaffected.
Docker: SeaweedFS Filer Configuration (docker/filer.toml)
The filer config uses the [postgres] section. (The [postgres2] driver is the newer recommended driver but has a SQL formatting bug in current SeaweedFS releases; see the init-db.sql note above.)
Docker: SeaweedFS S3 Credentials (docker/s3-config.json)
S3 access credentials for the SeaweedFS S3 gateway. Newer SeaweedFS versions deny anonymous write access by default, so this config is required.
{
  "identities": [
    {
      "name": "pdr_admin",
      "credentials": [
        { "accessKey": "pdr_local_key", "secretKey": "pdr_local_secret" }
      ],
      "actions": ["Admin", "Read", "List", "Tagging", "Write", "WriteAcp", "ReadAcp"]
    },
    {
      "name": "anonymous",
      "actions": ["Read"]
    }
  ]
}
The S3_ACCESS_KEY and S3_SECRET_KEY env vars used by the app's S3 client must match the credentials defined here.
Correctness Properties
A property is a characteristic or behavior that should hold true across all valid executions of a system — essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Storage provider enum validation
For any string value assigned to NEXT_PUBLIC_STORAGE_PROVIDER, the Zod schema should accept only "cloud" and "local", and reject all other strings. When the variable is absent, it should default to "cloud".
Validates: Requirements 2.1
Property 2: Conditional S3 variable requirement
For any environment configuration where NEXT_PUBLIC_STORAGE_PROVIDER is "local", if any of NEXT_PUBLIC_S3_ENDPOINT, S3_REGION, S3_ACCESS_KEY, S3_SECRET_KEY, or S3_BUCKET_NAME is missing or empty, the Zod validation should fail. When provider is "cloud", these variables should not be required.
Validates: Requirements 2.2, 2.4
Property 3: Presigned URL structure
For any valid S3 object key and content type, the generated presigned upload URL should contain the configured endpoint, bucket name, object key, and a signature query parameter (X-Amz-Signature).
Validates: Requirements 3.3
Property 4: S3 client connection error descriptiveness
For any endpoint string, when the S3 client fails to connect, the thrown error message should contain the endpoint URL so operators can diagnose the issue.
Validates: Requirements 3.4
Property 5: Upload result shape and persistence consistency
For any valid upload input (filename, data, contentType) and for any active storage provider, the uploadFile function should return an UploadResult containing non-empty url, pathname, and provider fields, and the corresponding fileUploads DB record should have storageProvider matching the active provider, storageUrl matching the returned url, and storagePathname matching the returned pathname.
Validates: Requirements 4.4, 4.5, 10.2
Property 6: Upload error propagation
For any error thrown by any storage provider during upload, the error propagated by the Storage Adapter should contain both the provider name (e.g., "seaweedfs", "vercel_blob") and the original error message.
Validates: Requirements 4.6
Property 7: Presign endpoint authentication enforcement
For any request to POST /api/storage/presign without a valid Clerk authentication token, the endpoint should return a 401 status code regardless of the request body content.
Validates: Requirements 5.2
Property 8: Presign response completeness
For any valid presign request (with valid filename and contentType) when storage provider is "local", the response body should contain presignedUrl (non-empty string), objectKey (non-empty string), and bucket (non-empty string).
Validates: Requirements 5.5
Property 9: Mixed-provider document retrieval
For any list of documents with mixed storageProvider values (seaweedfs, vercel_blob, uploadthing), the fetchFile function should correctly resolve each document's URL based on its provider: SeaweedFS documents use NEXT_PUBLIC_S3_ENDPOINT + storagePathname, while cloud documents use their existing retrieval logic.
Validates: Requirements 7.1, 7.3
Property 10: SeaweedFS retrieval error descriptiveness
For any document with storageProvider === "seaweedfs", when the SeaweedFS service is unreachable, the error should indicate that the local storage service is unavailable and include the configured endpoint.
Validates: Requirements 7.4
Property 11: Bootstrap API storage provider reporting
For any value of NEXT_PUBLIC_STORAGE_PROVIDER (including unset), the bootstrap API response should include a storageProvider field that equals the configured value or defaults to "cloud", and should always include the isUploadThingConfigured field for backward compatibility.
Validates: Requirements 9.1, 9.3
Error Handling
Storage Adapter Errors
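| Scenario | Behavior |
| --- | --- |
| SeaweedFS unreachable on upload | Throw StorageError with provider name, endpoint, and original connection error |
| SeaweedFS unreachable on retrieval | Return descriptive error: "Local storage service unavailable at {endpoint}" |
| Presigned URL generation fails | Throw with S3 client error details and configured endpoint |
| Invalid storage provider in env | Zod validation fails at startup with clear message about valid values |
| Missing S3 variables for local mode | Zod superRefine fails with list of missing required variables |
| Cloud provider upload fails | Propagate original UploadThing/Vercel Blob error with provider name prefix |
| Presign endpoint called in cloud mode | Return 400 with message: "Presigned URLs are not applicable for cloud storage" |
| Unauthenticated presign request | Return 401 with message: "Authentication required" |
Error Class
Graceful Degradation
If NEXT_PUBLIC_STORAGE_PROVIDER is not set, the system defaults to "cloud" mode, so existing deployments see no breaking change.
The SeaweedFS Docker service uses the local-storage profile, so it only starts when explicitly requested (docker compose --profile local-storage up).
The fetchFile function checks the document's storageProvider field to determine retrieval strategy, so mixed-provider libraries work without configuration changes.
Testing Strategy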
Property-Based Testing
Library: fast-check (already in devDependencies)
Each correctness property maps to a single property-based test with minimum 100 iterations. Tests are tagged with the format:
Property tests to implement:
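Env validation properties (Properties 1, 2): Generate random strings for NEXT_PUBLIC_STORAGE_PROVIDER and random presence/absence of S3 variables. Assert Zod schema accepts/rejects correctly.
Presigned URL structure (Property 3): Generate random object keys and content types. Assert URL contains expected components.
Upload result consistency (Property 5): Generate random filenames, data buffers, content types. Mock both providers. Assert result shape and DB record correctness.
Error propagation (Property 6): Generate random error messages and provider names. Assert propagated error contains both.
Auth enforcement (Property 7): Generate random request bodies. Assert 401 without auth.
Presign response completeness (Property 8): Generate random filenames and content types. Assert response contains all required fields.
Mixed-provider retrieval (Property 9): Generate random document lists with mixed providers. Assert correct URL resolution per provider.
Unit Tests
Unit tests complement property tests for specific examples and edge cases:
[…] s3Endpoint only when provider is "local" (Req 9.2)
[…] "cloud" (Req 5.4)
[…] putFile (Req 4.2)
[…] /api/uploadDocument after successful upload in both modes (Req 6.5)
[…] fileUploads table accepts "seaweedfs" as storageProvider value (Req 10.1)
Integration Tests (Manual / CI)
[…] --profile local-storage and SeaweedFS is reachable on port 8333
Parallel Workstream Testing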
The three workstreams can be tested independently: