| 1 | +--- |
| 2 | +title: "API Contract" |
| 3 | +description: "Request and response schemas for the topic modeling worker — field definitions, types, and examples." |
| 4 | +--- |
| 5 | + |
| 6 | +**Source of truth:** `api.faculytics/src/modules/analysis/dto/topic-model-worker.dto.ts` (Zod schemas) |
| 7 | + |
| 8 | +**Worker schemas:** `src/models.py` (Pydantic, must stay in sync with Zod) |
| 9 | + |
| 10 | +## Endpoint |
| 11 | + |
| 12 | +`POST {TOPIC_MODEL_WORKER_URL}` |
| 13 | + |
| 14 | +When deployed on RunPod, the actual endpoint is: |
| 15 | + |
| 16 | +``` |
| 17 | +POST https://api.runpod.ai/v2/<endpoint-id>/runsync |
| 18 | +Headers: { Authorization: Bearer <RUNPOD_API_KEY> } |
| 19 | +Body: { input: <request payload> } |
| 20 | +``` |
| 21 | + |
| 22 | +The RunPod envelope (`input` wrapper, `output` unwrapping) is handled by the API's `RunPodBatchProcessor`. |
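The envelope described above can be sketched with the standard library only. This is an illustration, not the API's `RunPodBatchProcessor`: the helper names `build_runsync_request` and `unwrap_output` are hypothetical, and the `Content-Type` header is an assumption not stated in this document.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Wrap the request payload in RunPod's `input` envelope (hypothetical helper)."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def unwrap_output(envelope: dict) -> dict:
    """Return the worker response carried in the envelope's `output` field."""
    if "output" not in envelope:
        raise ValueError("RunPod response has no 'output' field")
    return envelope["output"]
```

The request object is only constructed here; sending it (for example with `urllib.request.urlopen`) is left to the caller.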
| 23 | + |
| 24 | +## Request |
| 25 | + |
| 26 | +```json |
| 27 | +{ |
| 28 | + "items": [ |
| 29 | + { |
| 30 | + "submissionId": "uuid-string", |
| 31 | + "text": "The pace was too fast, couldn't follow along.", |
| 32 | + "embedding": [0.123, -0.456, 0.789, "... (768 floats)"] |
| 33 | + } |
| 34 | + ], |
| 35 | + "params": { |
| 36 | + "min_topic_size": 15, |
| 37 | + "nr_topics": 20, |
| 38 | + "umap_n_neighbors": 20, |
| 39 | + "umap_n_components": 10 |
| 40 | + } |
| 41 | +} |
| 42 | +``` |
| 43 | + |
| 44 | +### Request Fields |
| 45 | + |
| 46 | +| Field | Type | Required | Default | Description | |
| 47 | +| --- | --- | --- | --- | --- | |
| 48 | +| `items` | array | Yes | — | Submissions that passed the sentiment gate | |
| 49 | +| `items[].submissionId` | string | Yes | — | Unique submission identifier | |
| 50 | +| `items[].text` | string | Yes | — | Pre-cleaned qualitative comment (`cleanedComment`) | |
| 51 | +| `items[].embedding` | number[768] | Yes | — | Pre-computed LaBSE 768-dim embedding | |
| 52 | +| `params` | object | No | RUN 012 defaults | BERTopic hyperparameters | |
| 53 | +| `params.min_topic_size` | int | No | 15 | Minimum documents per topic cluster | |
| 54 | +| `params.nr_topics` | int | No | 20 | Target topic count; BERTopic merges similar topics until this count is reached |
| 55 | +| `params.umap_n_neighbors` | int | No | 20 | UMAP local neighborhood size | |
| 56 | +| `params.umap_n_components` | int | No | 10 | UMAP output dimensions | |
| 57 | + |
| 58 | +The worker uses `ConfigDict(extra="ignore")` on all Pydantic models, so additional envelope fields sent by the API (`jobId`, `version`, `type`, `metadata`, `publishedAt`) are silently ignored during validation. |
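The validation rules in the table can be mirrored in a dependency-free sketch. The worker itself uses Pydantic models in `src/models.py`, which are not shown in this document, so the `parse_request` function and the dataclasses below are hypothetical stand-ins that only reproduce the documented behavior: required item fields, a 768-dim embedding, per-field defaults for `params`, and unknown fields ignored.

```python
from dataclasses import dataclass

EMBEDDING_DIM = 768  # LaBSE vector size from the Request Fields table

# Defaults copied from the Request Fields table.
DEFAULT_PARAMS = {
    "min_topic_size": 15,
    "nr_topics": 20,
    "umap_n_neighbors": 20,
    "umap_n_components": 10,
}


@dataclass
class Item:
    submission_id: str
    text: str
    embedding: list


@dataclass
class TopicModelRequest:
    items: list
    params: dict


def parse_request(payload: dict) -> TopicModelRequest:
    """Validate a request payload; keys not read here (jobId, metadata, ...) are ignored."""
    items = []
    for raw in payload["items"]:  # KeyError if a required field is missing
        embedding = raw["embedding"]
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"embedding for {raw['submissionId']} has {len(embedding)} values, "
                f"expected {EMBEDDING_DIM}"
            )
        items.append(Item(raw["submissionId"], raw["text"], [float(x) for x in embedding]))
    supplied = payload.get("params") or {}
    # Fill in any hyperparameter the caller omitted.
    params = {name: int(supplied.get(name, default)) for name, default in DEFAULT_PARAMS.items()}
    return TopicModelRequest(items, params)
```

Because the parser only reads the keys it knows about, envelope fields such as `jobId` or `publishedAt` pass through without causing an error, which matches the `extra="ignore"` behavior described above.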
| 59 | + |
| 60 | +## Response — Success |
| 61 | + |
| 62 | +```json |
| 63 | +{ |
| 64 | + "version": "1.0.0", |
| 65 | + "status": "completed", |
| 66 | + "topics": [ |
| 67 | + { |
| 68 | + "topicIndex": 0, |
| 69 | + "rawLabel": "0_fast_rushed_pace", |
| 70 | + "keywords": ["fast", "rushed", "pace", "speed", "hurry", "quick", "follow", "slow", "behind", "catch"], |
| 71 | + "docCount": 45 |
| 72 | + } |
| 73 | + ], |
| 74 | + "assignments": [ |
| 75 | + { |
| 76 | + "submissionId": "uuid-string", |
| 77 | + "topicIndex": 0, |
| 78 | + "probability": 0.7234 |
| 79 | + } |
| 80 | + ], |
| 81 | + "metrics": { |
| 82 | + "npmi_coherence": 0.1523, |
| 83 | + "topic_diversity": 0.8200, |
| 84 | + "outlier_ratio": 0.1150, |
| 85 | + "silhouette_score": 0.2341, |
| 86 | + "embedding_coherence": 0.6102 |
| 87 | + }, |
| 88 | + "outlierCount": 12, |
| 89 | + "completedAt": "2026-03-21T10:35:00.000Z" |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +## Response — Failure |
| 94 | + |
| 95 | +```json |
| 96 | +{ |
| 97 | + "version": "1.0.0", |
| 98 | + "status": "failed", |
| 99 | + "error": "Received 8 items, need at least 15 (min_topic_size) for topic modeling", |
| 100 | + "completedAt": "2026-03-21T10:35:00.000Z" |
| 101 | +} |
| 102 | +``` |
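The failure example above suggests a guard on the item count. The following is a minimal sketch of such a guard, assuming the worker rejects a batch when it has fewer items than `min_topic_size`; the function name `check_min_items` and the default version string are hypothetical, and only the error wording and response shape come from the example.

```python
from datetime import datetime, timezone


def check_min_items(item_count: int, min_topic_size: int, version: str = "1.0.0"):
    """Return a failure response if there are too few items, else None (hypothetical helper)."""
    if item_count >= min_topic_size:
        return None
    # UTC timestamp with millisecond precision and a trailing "Z", as in the examples.
    completed_at = (
        datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    )
    return {
        "version": version,
        "status": "failed",
        "error": (
            f"Received {item_count} items, need at least {min_topic_size} "
            "(min_topic_size) for topic modeling"
        ),
        "completedAt": completed_at,
    }
```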
| 103 | + |
| 104 | +### Response Fields |
| 105 | + |
| 106 | +| Field | Type | Present | Description | |
| 107 | +| --- | --- | --- | --- | |
| 108 | +| `version` | string | Always | Worker version (from `config.WORKER_VERSION`) | |
| 109 | +| `status` | `"completed"` \| `"failed"` | Always | Outcome status | |
| 110 | +| `topics` | array | On success | Discovered topic clusters | |
| 111 | +| `topics[].topicIndex` | int | — | BERTopic topic ID (0, 1, 2, ...) | |
| 112 | +| `topics[].rawLabel` | string | — | Auto-generated label (e.g., `"0_fast_rushed_pace"`) | |
| 113 | +| `topics[].keywords` | string[] | — | Top 10 keywords from KeyBERTInspired | |
| 114 | +| `topics[].docCount` | int | — | Documents in this cluster | |
| 115 | +| `assignments` | array | On success | Per-document topic assignments | |
| 116 | +| `assignments[].submissionId` | string | — | Matches input `submissionId` | |
| 117 | +| `assignments[].topicIndex` | int | — | Assigned topic index | |
| 118 | +| `assignments[].probability` | number (0-1) | — | Assignment confidence (4 decimal places) | |
| 119 | +| `metrics` | object | On success | Model quality metrics (see [Metrics](/docs/metrics)) | |
| 120 | +| `outlierCount` | int | On success | Documents assigned to topic -1 | |
| 121 | +| `error` | string | On failure | Human-readable error message | |
| 122 | +| `completedAt` | ISO datetime | Always | Processing completion timestamp | |
| 123 | + |
| 124 | +## API-Side Processing |
| 125 | + |
| 126 | +After receiving the response, the `TopicModelProcessor` in the API: |
| 127 | + |
| 128 | +1. Validates the response against `topicModelWorkerResponseSchema` (Zod) |
| 129 | +2. Creates `Topic` entities for each topic (with `rawLabel`, `keywords`, `docCount`) |
| 130 | +3. Creates `TopicAssignment` entities — filters out assignments with probability ≤ 0.01 |
| 131 | +4. Marks the highest-probability assignment per submission as `isDominant` |
| 132 | +5. Persists metrics on the `TopicModelRun` entity |
| 133 | +6. Calls the orchestrator to advance the pipeline to topic labeling |
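Steps 3 and 4 above can be sketched as follows. The real `TopicModelProcessor` lives in the API and is not shown in this document, so this Python version is only an illustration of the filtering and dominance rules; `select_assignments` is a hypothetical name, and ties are resolved in favor of the first assignment seen, which is an assumption.

```python
MIN_PROBABILITY = 0.01  # assignments at or below this value are dropped


def select_assignments(assignments: list) -> list:
    """Drop low-probability assignments and flag the best one per submission."""
    kept = [
        dict(a, isDominant=False)
        for a in assignments
        if a["probability"] > MIN_PROBABILITY
    ]
    best = {}
    for a in kept:
        current = best.get(a["submissionId"])
        # Strict comparison: on a tie the earlier assignment stays dominant.
        if current is None or a["probability"] > current["probability"]:
            best[a["submissionId"]] = a
    for a in best.values():
        a["isDominant"] = True
    return kept
```

A submission whose assignments are all filtered out ends up with no rows and therefore no dominant topic.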
| 134 | + |
| 135 | +## Notes |
| 136 | + |
| 137 | +- Outlier documents (topic -1) are **not** included in the `assignments` array |
| 138 | +- The `rawLabel` is later enriched with a human-readable `label` by the topic labeling stage (LLM) |
| 139 | +- Embeddings must be 768-dim LaBSE vectors — the same model used by the embedding worker |