
RFC: API Design - Migrate to RPC-style architecture #27

@matthewhorridge

Overview

This RFC proposes migrating our API from the current REST-ish design to an RPC-style architecture. After reviewing the current API and our domain requirements, RPC appears to be a better fit.

Feedback is welcome; please comment with any concerns or questions.

Current State

The existing API uses a mixed pattern:

  • REST-like resource endpoints (/projects, /data-files, /harmonization-rules)
  • Action endpoints that don't fit REST cleanly (/data-files/{id}/harmonize, /data-dictionaries/{id}/extract-data-elements)

This inconsistency makes the API harder to learn and use.

Why RPC?

Our Domain is Action-Oriented

The core of what we do is operations on data:

  • Harmonize a dataset
  • Validate rules
  • Preview transformations
  • Replay a transformation log
  • Extract elements from a dictionary

These are verbs, not nouns. REST's resource-centric model forces awkward mappings like POST /data-files/{id}/harmonize, which is really just RPC with extra steps.

Simpler Client Code

With RPC, every operation follows the same pattern:

client.call("createProject", name="My Study")
client.call("uploadFile", file=data, project_id="...")
client.call("harmonize", file_id="...", rule_ids=["..."])

No need to remember which HTTP verb to use or how to construct resource URLs.
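To make the "same pattern for every operation" point concrete, here is a minimal sketch of what such a client could look like. RpcClient, RpcError and http_transport are hypothetical names, and the URL is a placeholder; the only parts taken from this RFC are the request envelope ({"method", "params"}) and the three response statuses. The transport is injectable so the client can be exercised without a server.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable

# Hypothetical transport: takes the JSON request body and returns the JSON response body.
Transport = Callable[[bytes], bytes]


class RpcError(Exception):
    """Raised when the server returns an error envelope."""

    def __init__(self, code: str, message: str, details: Any = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details


class RpcClient:
    def __init__(self, transport: Transport):
        self._transport = transport

    def call(self, method: str, **params: Any) -> Any:
        # Every operation uses the same envelope: {"method": ..., "params": ...}
        body = json.dumps({"method": method, "params": params}).encode("utf-8")
        response = json.loads(self._transport(body))
        if response["status"] == "error":
            err = response["error"]
            raise RpcError(err["code"], err["message"], err.get("details"))
        if response["status"] == "accepted":
            # Long-running operation: return the job id so the caller can poll it.
            return {"job_id": response["job_id"]}
        return response["result"]


def http_transport(body: bytes, url: str = "https://example.invalid/api") -> bytes:
    """POST the envelope to the single RPC endpoint (the URL is a placeholder)."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # Error envelopes may arrive with a 4xx/5xx status; read the body anyway.
        return exc.read()
```

With this sketch, `RpcClient(http_transport).call("createProject", name="My Study")` matches the calling style shown above.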

Natural Batching

RPC makes batch operations trivial:

client.call("batch", requests=[
    {"method": "uploadFile", "params": {...}},
    {"method": "harmonize", "params": {...}}
])
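On the server side, a batch could be handled by running each sub-request in order and returning one response envelope per item. This is a sketch under two assumptions not settled by this RFC: a failing item does not abort the rest of the batch, and an unknown method is reported with NOT_FOUND. The handler registry and execute_batch are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[..., Any]


def execute_batch(handlers: Dict[str, Handler], requests: List[dict]) -> List[dict]:
    """Run each sub-request in order and return one envelope per item."""
    results = []
    for req in requests:
        method = req.get("method")
        handler = handlers.get(method)
        if handler is None:
            # Assumption: unknown methods reuse the NOT_FOUND code.
            results.append({"status": "error",
                            "error": {"code": "NOT_FOUND", "message": f"Unknown method: {method}"}})
            continue
        try:
            results.append({"status": "success", "result": handler(**req.get("params", {}))})
        except Exception as exc:
            # Report the failure for this item only; later items still run.
            results.append({"status": "error",
                            "error": {"code": "INTERNAL_ERROR", "message": str(exc)}})
    return results
```

Note that the example batch above uploads a file and then harmonizes it; in this sketch items are independent, so passing the new file's id into the harmonize item would need an additional convention (or a separate call).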

Our Consumers Benefit

Our primary API consumers are:

  • Internal frontend
  • Jupyter notebooks / Python scripts
  • Backend services

All of these benefit more from a simple, consistent calling convention than from REST's HTTP semantics.

Proposed Design

Single Endpoint

POST /api

Request Format

{
  "method": "methodName",
  "params": { ... }
}
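Because every request goes through one endpoint, envelope validation can live in one place. A minimal sketch using the INVALID_FORMAT and MISSING_FIELD codes from the error table; treating "params" as optional and requiring it to be an object (named parameters only) are assumptions, and validate_request is a hypothetical name.

```python
from typing import Any, Optional


def validate_request(payload: Any) -> Optional[dict]:
    """Return an error envelope if the request envelope is malformed, else None."""

    def error(code: str, message: str) -> dict:
        return {"status": "error", "error": {"code": code, "message": message, "details": []}}

    if not isinstance(payload, dict):
        return error("INVALID_FORMAT", "Request body must be a JSON object")
    if "method" not in payload:
        return error("MISSING_FIELD", "Missing 'method'")
    if not isinstance(payload["method"], str):
        return error("INVALID_FORMAT", "'method' must be a string")
    # Assumption: params may be omitted, but if present it must be an object.
    if not isinstance(payload.get("params", {}), dict):
        return error("INVALID_FORMAT", "'params' must be an object")
    return None
```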

Response Format

Success:

{
  "status": "success",
  "result": { ... }
}

Error:

{
  "status": "error",
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Human readable message",
    "details": [...]
  }
}

Async (long-running operations):

{
  "status": "accepted",
  "job_id": "job-123"
}
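Since the three response shapes are mutually exclusive, handlers could build them through small helpers so the envelope stays consistent. A sketch with hypothetical helper names; always including an empty "details" list on errors is an assumption.

```python
from typing import Any, List, Optional


def success(result: Any) -> dict:
    return {"status": "success", "result": result}


def error(code: str, message: str, details: Optional[List[Any]] = None) -> dict:
    # Assumption: "details" is always present, defaulting to an empty list.
    return {"status": "error",
            "error": {"code": code, "message": message, "details": details or []}}


def accepted(job_id: str) -> dict:
    # Long-running operations return only the job id; results come from getJob.
    return {"status": "accepted", "job_id": job_id}
```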

Proposed Methods

Resource Operations

| Method | Description |
|---|---|
| listProjects, getProject, createProject, updateProject, deleteProject | Project management |
| listFiles, getFile, uploadFile, deleteFile, downloadFile | Data file management |
| listDictionaries, getDictionary, uploadDictionary, deleteDictionary | Dictionary management |
| listElements, getElement, createElement, deleteElement | Element management |
| listRules, getRule, createRule, updateRule, deleteRule | Rule management |
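One way to keep the public camelCase method names separate from server-side function names is a registry populated by a decorator. This is a sketch only: rpc_method, dispatch and the in-memory project store are hypothetical, and the generated id format is made up for illustration.

```python
from typing import Any, Callable, Dict

METHODS: Dict[str, Callable[..., Any]] = {}


def rpc_method(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a handler under its public RPC method name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in METHODS:
            raise ValueError(f"Duplicate RPC method: {name}")
        METHODS[name] = fn
        return fn
    return register


# Hypothetical in-memory store, for illustration only.
_projects: Dict[str, dict] = {}


@rpc_method("createProject")
def create_project(name: str) -> dict:
    project_id = f"proj-{len(_projects) + 1}"
    _projects[project_id] = {"id": project_id, "name": name}
    return _projects[project_id]


@rpc_method("getProject")
def get_project(project_id: str) -> dict:
    return _projects[project_id]


def dispatch(method: str, params: dict) -> Any:
    # Raises KeyError for unknown methods; a real endpoint would map that to an error envelope.
    return METHODS[method](**params)
```

The registry also gives a single list of supported method names, which could back documentation or a discovery method.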

Domain Actions

| Method | Description |
|---|---|
| harmonize | Apply rules to transform a dataset (async) |
| harmonizePreview | Preview transformation without saving |
| validateRules | Check rules for errors |
| validateDictionary | Validate dictionary structure |
| extractElements | Parse dictionary and create element records |
| replayLog | Re-run transformations from a log (async) |
| compareDatasets | Diff two datasets |

Infrastructure

| Method | Description |
|---|---|
| batch | Execute multiple operations in one request |
| getJob, cancelJob, listJobs | Manage async operations |
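Clients of async methods would need to poll getJob until the job finishes. A sketch of a polling helper with exponential backoff; `call` stands in for something like the `client.call` shown earlier, and the "state" field and its terminal values are assumptions, since this RFC does not yet define the getJob result.

```python
import time
from typing import Any, Callable

# Assumed terminal states; the getJob result shape is not yet specified.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(call: Callable[..., Any], job_id: str, timeout: float = 300.0,
                 interval: float = 1.0, max_interval: float = 10.0,
                 cancel_on_timeout: bool = False) -> dict:
    """Poll getJob until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = call("getJob", job_id=job_id)
        if job["state"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            if cancel_on_timeout:
                call("cancelJob", job_id=job_id)  # free server resources before giving up
            raise TimeoutError(f"Job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```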

Error Codes

| Category | Codes |
|---|---|
| Validation | VALIDATION_ERROR, INVALID_FORMAT, MISSING_FIELD |
| Not Found | NOT_FOUND, FILE_NOT_FOUND, RULE_NOT_FOUND |
| Conflict | ALREADY_EXISTS, VERSION_CONFLICT |
| Domain | HARMONIZATION_FAILED, INVALID_RULE, INCOMPATIBLE_TYPES |
| Server | INTERNAL_ERROR, STORAGE_ERROR |

Migration Path

  1. Implement the new /api RPC endpoint alongside the existing REST endpoints
  2. Update clients to use the new RPC endpoint
  3. Deprecate the old REST endpoints
  4. Remove the old endpoints after a transition period
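During steps 1 to 3, the old REST routes could be served by translating them into RPC envelopes, so both paths share one implementation. A sketch of such a shim: the route table is hypothetical, /projects/{id} is an assumed path, and dictionary_id is an illustrative parameter name (file_id and project_id follow the examples earlier in this RFC).

```python
import re
from typing import Optional

# Hypothetical legacy routes: (HTTP verb, path pattern, RPC method, param name for the path id).
LEGACY_ROUTES = [
    ("POST", re.compile(r"^/data-files/(?P<id>[^/]+)/harmonize$"), "harmonize", "file_id"),
    ("POST", re.compile(r"^/data-dictionaries/(?P<id>[^/]+)/extract-data-elements$"),
     "extractElements", "dictionary_id"),
    ("GET", re.compile(r"^/projects/(?P<id>[^/]+)$"), "getProject", "project_id"),
]


def translate_legacy(http_method: str, path: str, body: Optional[dict] = None) -> Optional[dict]:
    """Convert a legacy REST call into an RPC request envelope, or None if unmapped."""
    for verb, pattern, rpc_method, id_param in LEGACY_ROUTES:
        match = pattern.match(path)
        if verb == http_method and match:
            params = dict(body or {})
            params[id_param] = match.group("id")  # path id becomes a named param
            return {"method": rpc_method, "params": params}
    return None
```

Keeping the mapping in one table also gives a checklist of routes to remove in step 4.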

Questions for Discussion

  1. Are there any use cases where the current REST endpoints work better?
  2. Any concerns about the proposed method naming?
  3. Preferences on sync vs async for specific operations?

Please share your thoughts in the comments.
