Overview
This RFC proposes migrating our API from the current REST-ish design to an RPC-style architecture. After reviewing the current API and our domain requirements, RPC appears to be a better fit.
Feedback welcome - please comment with any concerns or questions.
Current State
The existing API uses a mixed pattern:
- REST-like resource endpoints (/projects, /data-files, /harmonization-rules)
- Action endpoints that don't fit REST cleanly (/data-files/{id}/harmonize, /data-dictionaries/{id}/extract-data-elements)
This inconsistency makes the API harder to learn and use.
Why RPC?
Our Domain is Action-Oriented
The core of what we do is operations on data:
- Harmonize a dataset
- Validate rules
- Preview transformations
- Replay a transformation log
- Extract elements from a dictionary
These are verbs, not nouns. REST's resource-centric model requires awkward mappings like POST /data-files/{id}/harmonize - which is really just RPC with extra steps.
Simpler Client Code
With RPC, every operation follows the same pattern:
client.call("createProject", name="My Study")
client.call("uploadFile", file=data, project_id="...")
client.call("harmonize", file_id="...", rule_ids=["..."])
No need to remember which HTTP verb to use or how to construct resource URLs.
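To make the calling convention concrete, here is a minimal sketch of what such a client could look like. The class name `RpcClient`, the injectable `transport` callable, and the echoing stand-in server are assumptions for illustration only; a real client would send the serialized body to the RPC endpoint over HTTP.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical transport: receives the JSON request body and returns the JSON
# response body. A real implementation would send it to the RPC endpoint.
Transport = Callable[[str], str]


class RpcClient:
    """Illustrative client; the name and constructor are assumptions."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def call(self, method: str, **params: Any) -> Dict[str, Any]:
        # Every operation is serialized the same way: a method name plus params.
        body = json.dumps({"method": method, "params": params})
        return json.loads(self._transport(body))


def echo_transport(body: str) -> str:
    """Stand-in server that echoes the request inside a success envelope."""
    return json.dumps({"status": "success", "result": json.loads(body)})


client = RpcClient(echo_transport)
response = client.call("createProject", name="My Study")
print(response["result"])
```

Because the transport is injected, the same `call` method can be exercised in tests and notebooks without a running server.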
Natural Batching
RPC makes batch operations trivial:
client.call("batch", requests=[
    {"method": "uploadFile", "params": {...}},
    {"method": "harmonize", "params": {...}},
])
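On the server side, batching can be little more than a loop over the normal dispatch path. The sketch below assumes sequential execution, one response envelope per sub-request, and no rollback when a sub-request fails; none of these semantics are settled by this RFC. The handlers are in-memory stand-ins, and using `NOT_FOUND` for an unknown method is also an assumption.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[..., Any]


# Hypothetical in-memory handlers standing in for the real operations.
def upload_file(name: str) -> Dict[str, str]:
    return {"file_id": f"file-{name}"}


def harmonize(file_id: str, rule_ids: List[str]) -> Dict[str, Any]:
    return {"file_id": file_id, "rules_applied": len(rule_ids)}


HANDLERS: Dict[str, Handler] = {"uploadFile": upload_file, "harmonize": harmonize}


def dispatch(request: Dict[str, Any]) -> Dict[str, Any]:
    """Run one request and wrap the outcome in a response envelope."""
    handler = HANDLERS.get(request["method"])
    if handler is None:
        # Assumption: unknown methods are reported with NOT_FOUND.
        return {
            "status": "error",
            "error": {
                "code": "NOT_FOUND",
                "message": f"Unknown method: {request['method']}",
                "details": [],
            },
        }
    return {"status": "success", "result": handler(**request.get("params", {}))}


def batch(requests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assumed semantics: run in order, one envelope per request, no rollback."""
    return {"status": "success", "result": [dispatch(r) for r in requests]}
```

Whether a failed sub-request should abort the rest of the batch, and whether later requests can reference earlier results, are design choices this sketch deliberately leaves open.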
Our Consumers Benefit
Our primary API consumers are:
- Internal frontend
- Jupyter notebooks / Python scripts
- Backend services
All of these benefit more from a simple, consistent calling convention than from REST's HTTP semantics.
Proposed Design
Single Endpoint
All methods are served from a single endpoint, /api.
Request Format
{
  "method": "methodName",
  "params": { ... }
}
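A server would likely check the shape of this envelope before dispatching. The sketch below is one way to do that using the error envelope and the `INVALID_FORMAT` and `MISSING_FIELD` codes this RFC defines. Treating `params` as optional, and the choice of which code applies to which failure, are assumptions.

```python
from typing import Any, Dict, Optional


def error(code: str, message: str) -> Dict[str, Any]:
    """Build an error envelope in the RFC's response format."""
    return {"status": "error", "error": {"code": code, "message": message, "details": []}}


def validate_request(payload: Any) -> Optional[Dict[str, Any]]:
    """Return an error envelope if the payload is malformed, else None."""
    if not isinstance(payload, dict):
        return error("INVALID_FORMAT", "Request body must be a JSON object")
    if "method" not in payload:
        return error("MISSING_FIELD", "Missing required field: method")
    if not isinstance(payload["method"], str) or not payload["method"]:
        return error("INVALID_FORMAT", "Field 'method' must be a non-empty string")
    # Assumption: 'params' may be omitted, but if present it must be an object.
    if not isinstance(payload.get("params", {}), dict):
        return error("INVALID_FORMAT", "Field 'params' must be an object")
    return None
```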
Response Format
Success:
{
  "status": "success",
  "result": { ... }
}
Error:
{
  "status": "error",
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Human readable message",
    "details": [...]
  }
}
Async (long-running operations):
{
  "status": "accepted",
  "job_id": "job-123"
}
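Since every response carries a `status` field, client code can branch on it in one place. The following sketch shows one way to unwrap the three shapes above. `RpcError`, `PendingJob`, and `unwrap` are hypothetical names, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Any, Dict


class RpcError(Exception):
    """Raised for responses with status == "error" (hypothetical name)."""

    def __init__(self, code: str, message: str, details: Any = None) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details


@dataclass(frozen=True)
class PendingJob:
    """Handle for an accepted long-running operation (hypothetical name)."""

    job_id: str


def unwrap(response: Dict[str, Any]) -> Any:
    """Return the result, a PendingJob for async operations, or raise RpcError."""
    status = response.get("status")
    if status == "success":
        return response["result"]
    if status == "accepted":
        return PendingJob(response["job_id"])
    if status == "error":
        err = response["error"]
        raise RpcError(err["code"], err["message"], err.get("details"))
    raise ValueError(f"Unexpected status: {status!r}")
```

A caller that receives a `PendingJob` would then follow up with the job methods (`getJob`, `cancelJob`); the shape of their results is not specified in this RFC, so polling is left out of the sketch.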
Proposed Methods
Resource Operations
| Method | Description |
|---|---|
| listProjects, getProject, createProject, updateProject, deleteProject | Project management |
| listFiles, getFile, uploadFile, deleteFile, downloadFile | Data file management |
| listDictionaries, getDictionary, uploadDictionary, deleteDictionary | Dictionary management |
| listElements, getElement, createElement, deleteElement | Element management |
| listRules, getRule, createRule, updateRule, deleteRule | Rule management |
Domain Actions
| Method | Description |
|---|---|
| harmonize | Apply rules to transform a dataset (async) |
| harmonizePreview | Preview transformation without saving |
| validateRules | Check rules for errors |
| validateDictionary | Validate dictionary structure |
| extractElements | Parse dictionary and create element records |
| replayLog | Re-run transformations from a log (async) |
| compareDatasets | Diff two datasets |
Infrastructure
| Method | Description |
|---|---|
| batch | Execute multiple operations in one request |
| getJob, cancelJob, listJobs | Manage async operations |
Error Codes
| Category | Codes |
|---|---|
| Validation | VALIDATION_ERROR, INVALID_FORMAT, MISSING_FIELD |
| Not Found | NOT_FOUND, FILE_NOT_FOUND, RULE_NOT_FOUND |
| Conflict | ALREADY_EXISTS, VERSION_CONFLICT |
| Domain | HARMONIZATION_FAILED, INVALID_RULE, INCOMPATIBLE_TYPES |
| Server | INTERNAL_ERROR, STORAGE_ERROR |
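Clients will probably want to branch on the category rather than on individual codes. The sketch below encodes the table as a lookup. The helper names and the "Unknown" fallback are assumptions, as is the idea that only Server-category errors indicate a problem on our side.

```python
from typing import Dict, Tuple

# Categories and codes copied from the Error Codes table.
ERROR_CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "Validation": ("VALIDATION_ERROR", "INVALID_FORMAT", "MISSING_FIELD"),
    "Not Found": ("NOT_FOUND", "FILE_NOT_FOUND", "RULE_NOT_FOUND"),
    "Conflict": ("ALREADY_EXISTS", "VERSION_CONFLICT"),
    "Domain": ("HARMONIZATION_FAILED", "INVALID_RULE", "INCOMPATIBLE_TYPES"),
    "Server": ("INTERNAL_ERROR", "STORAGE_ERROR"),
}

# Inverted index: error code -> category name.
CATEGORY_BY_CODE: Dict[str, str] = {
    code: category
    for category, codes in ERROR_CATEGORIES.items()
    for code in codes
}


def category_of(code: str) -> str:
    """Return the category for a code, or "Unknown" for unlisted codes."""
    return CATEGORY_BY_CODE.get(code, "Unknown")


def is_server_fault(code: str) -> bool:
    """Assumed policy: only Server-category codes are the server's fault."""
    return category_of(code) == "Server"
```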
Migration Path
- Implement new /api RPC endpoint alongside existing REST endpoints
- Update clients to use new RPC endpoint
- Deprecate old REST endpoints
- Remove old endpoints after transition period
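During the transition, the old REST routes could be kept alive as thin shims that translate each call into an RPC request, so only one implementation of each operation needs maintaining. The sketch below illustrates the idea for three routes. The route table, the HTTP verbs paired with each path, and the parameter names `file_id` and `dictionary_id` are assumptions, not a confirmed mapping.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# (HTTP verb, path pattern, RPC method). Named groups become RPC params;
# the verbs and parameter names here are illustrative assumptions.
LEGACY_ROUTES: List[Tuple[str, re.Pattern, str]] = [
    ("GET", re.compile(r"^/projects$"), "listProjects"),
    ("POST", re.compile(r"^/data-files/(?P<file_id>[^/]+)/harmonize$"), "harmonize"),
    (
        "POST",
        re.compile(r"^/data-dictionaries/(?P<dictionary_id>[^/]+)/extract-data-elements$"),
        "extractElements",
    ),
]


def to_rpc(
    verb: str, path: str, body: Optional[Dict[str, Any]] = None
) -> Optional[Dict[str, Any]]:
    """Translate a legacy REST call into an RPC request, or None if unmapped."""
    for route_verb, pattern, method in LEGACY_ROUTES:
        match = pattern.match(path)
        if route_verb == verb.upper() and match:
            params = dict(body or {})
            # Path parameters take precedence over body fields with the same name.
            params.update(match.groupdict())
            return {"method": method, "params": params}
    return None
```

For example, `to_rpc("POST", "/data-files/42/harmonize", {"rule_ids": ["r1"]})` yields a `harmonize` request with `file_id` set to `"42"`, which can then go through the same dispatch path as native RPC calls.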
Questions for Discussion
- Are there any use cases where the current REST endpoints work better?
- Any concerns about the proposed method naming?
- Preferences on sync vs async for specific operations?
Please share your thoughts in the comments.