Embedding Worker — Faculytics

LaBSE embedding worker for the Faculytics analysis pipeline. It receives text over HTTP and returns 768-dimensional, L2-normalized embeddings, using sentence-transformers with the ONNX backend for CPU-optimized inference.

API Contract

POST /embeddings

Request:

{
  "jobId": "uuid",
  "version": "1.0",
  "type": "embedding",
  "text": "The professor explains concepts clearly.",
  "metadata": {
    "submissionId": "uuid",
    "facultyId": "faculty-001",
    "versionId": "version-001"
  },
  "publishedAt": "2026-03-14T00:00:00.000Z"
}
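
A request body like the one above could be built and sent from Python roughly as follows. This is a minimal, stdlib-only sketch: the helper names `build_request` and `post_embedding` are hypothetical, and the base URL is assumed to be wherever the worker is running (for example `http://localhost:8000` with the default port).

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone


def build_request(text: str, submission_id: str, faculty_id: str, version_id: str) -> dict:
    """Assemble a request body with the fields shown in the contract (hypothetical helper)."""
    published_at = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")  # match the "...000Z" style used in the examples
    )
    return {
        "jobId": str(uuid.uuid4()),
        "version": "1.0",
        "type": "embedding",
        "text": text,
        "metadata": {
            "submissionId": submission_id,
            "facultyId": faculty_id,
            "versionId": version_id,
        },
        "publishedAt": published_at,
    }


def post_embedding(base_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST the payload to /embeddings and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a worker running locally, `post_embedding("http://localhost:8000", build_request(...))` would return one of the response bodies described in this section.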

Success response (HTTP 200):

{
  "jobId": "uuid",
  "version": "1.0",
  "status": "completed",
  "result": {
    "embedding": [0.01, 0.02, "... (768 floats)"],
    "modelName": "LaBSE"
  },
  "completedAt": "2026-03-14T00:01:00.000Z"
}

Error response (HTTP 200; domain errors are returned with a 200 status so that BullMQ does not retry them):

{
  "jobId": "uuid",
  "version": "1.0",
  "status": "failed",
  "error": "description",
  "completedAt": "2026-03-14T00:01:00.000Z"
}
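
Because both outcomes arrive as HTTP 200, a caller has to branch on the `status` field rather than on the HTTP status code. A minimal sketch of that check is shown here; `EmbeddingFailed` and `extract_embedding` are hypothetical names, and the dimension check simply reflects the 768-float embedding described above.

```python
class EmbeddingFailed(Exception):
    """Raised when the worker reports a domain error (status == "failed")."""


def extract_embedding(body: dict, expected_dim: int = 768) -> list:
    """Return the embedding from a response body, or raise on failure."""
    status = body.get("status")
    if status == "failed":
        # Domain error: retrying the same input is not expected to help.
        raise EmbeddingFailed(body.get("error", "unknown error"))
    if status != "completed":
        raise ValueError(f"unexpected status: {status!r}")

    embedding = body["result"]["embedding"]
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} floats, got {len(embedding)}")
    return embedding
```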

GET /health

Returns 200 {"status": "ok", "model": "LaBSE"} when ready, 503 otherwise.
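
Since the endpoint returns 503 until the model is ready, a caller may want to poll it before sending work. The sketch below is one way to do that with the standard library; `http_probe` and `wait_until_ready` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> int:
    """Return the HTTP status code for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the worker is not ready
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def wait_until_ready(probe: Callable[[], int], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:8000/health"))` would block for up to about 30 seconds with these defaults.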

Quick Start

Local development

# Install dependencies
uv sync

# Run dev server
uv run uvicorn src.main:app --reload

# Run tests
uv run pytest

# Lint & format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

Docker

docker build -t embedding-worker .
docker run -p 8000:8000 embedding-worker

Configuration

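| Variable      | Default                     | Description                           |
|---------------|-----------------------------|---------------------------------------|
| HOST          | 0.0.0.0                     | Server bind address                   |
| PORT          | 8000                        | Server port                           |
| MODEL_NAME    | sentence-transformers/LaBSE | Hugging Face model ID                 |
| MODEL_BACKEND | onnx                        | Inference backend (onnx or torch)     |
| LOG_LEVEL     | INFO                        | Python log level                      |
| OPENAPI_MODE  | false                       | Enable Swagger UI at /docs            |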

Copy .env.sample to .env to get started.
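
To illustrate how these variables and defaults fit together, here is a stdlib-only stand-in for the settings object. The project itself uses pydantic-settings in `src/config.py`, whose contents are not shown here, so the `Settings` dataclass, `load_settings`, and the accepted boolean spellings below are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table."""
    host: str = "0.0.0.0"
    port: int = 8000
    model_name: str = "sentence-transformers/LaBSE"
    model_backend: str = "onnx"
    log_level: str = "INFO"
    openapi_mode: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to the defaults."""
    backend = env.get("MODEL_BACKEND", "onnx").lower()
    if backend not in ("onnx", "torch"):
        raise ValueError(f"MODEL_BACKEND must be 'onnx' or 'torch', got {backend!r}")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        model_name=env.get("MODEL_NAME", "sentence-transformers/LaBSE"),
        model_backend=backend,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        # Assumed truthy spellings; pydantic-settings has its own parsing rules.
        openapi_mode=env.get("OPENAPI_MODE", "false").strip().lower() in ("1", "true", "yes"),
    )
```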

Architecture

src/
├── config.py       # pydantic-settings configuration
├── models.py       # Pydantic request/response schemas (camelCase aliases)
├── embedding.py    # EmbeddingService: model loading and inference
└── main.py         # FastAPI app, lifespan, routes

• Model loading happens once at startup via FastAPI's lifespan context manager.
• The ONNX backend provides 2-4x faster CPU inference compared to PyTorch.
• Domain errors return HTTP 200 with status: "failed" to prevent BullMQ from retrying bad input. Only unexpected server failures return 5xx.
• Contract compliance: Pydantic schemas use camelCase field aliases matching the Zod schemas in the NestJS API.
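
The domain-error convention and the L2 normalization can be sketched without any framework. The code below is not the project's implementation: `l2_normalize` and `handle_job` are hypothetical names, `embed` stands in for the real model call, and treating empty text as a domain error is an assumption made for the example.

```python
import math
from datetime import datetime, timezone
from typing import Callable, Sequence


def l2_normalize(vec: Sequence[float]) -> list:
    """Scale vec to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def handle_job(job: dict, embed: Callable[[str], Sequence[float]]) -> dict:
    """Build the response body; the route would return it with HTTP 200 either way."""
    base = {"jobId": job.get("jobId"), "version": job.get("version")}
    completed_at = datetime.now(timezone.utc).isoformat()
    try:
        text = job.get("text")
        if not isinstance(text, str) or not text.strip():
            # Assumed example of bad input that should not be retried.
            raise ValueError("text must be a non-empty string")
        embedding = l2_normalize(embed(text))
    except ValueError as exc:
        # Domain error: report it in the body instead of raising a 5xx.
        return {**base, "status": "failed", "error": str(exc), "completedAt": completed_at}
    return {
        **base,
        "status": "completed",
        "result": {"embedding": embedding, "modelName": "LaBSE"},
        "completedAt": completed_at,
    }
```

Exceptions other than `ValueError` are deliberately left uncaught in this sketch, so an unexpected failure would still surface as a server error and remain eligible for a retry.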