Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
860 changes: 22 additions & 838 deletions README.md

Large diffs are not rendered by default.

253 changes: 253 additions & 0 deletions docs/contributing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,253 @@
# Contributing to vCon Server

This guide covers everything you need to set up a local development environment, run the test suite, understand the project layout, and submit changes.

---

## Prerequisites

| Tool | Minimum Version | Notes |
|---|---|---|
| Python | 3.12 | Required for local development outside Docker |
| Poetry | latest stable | Dependency and virtualenv management |
| Docker | latest stable | Required for running the full stack |
| Docker Compose | v2 (bundled with Docker Desktop) | Used for all service orchestration |

Verify you have these installed:

```bash
python --version # 3.12+
poetry --version
docker --version
docker compose version
```

---

## Cloning and Initial Setup

```bash
# 1. Clone the repository
git clone https://github.com/vcon-dev/vcon-server.git
cd vcon-server

# 2. Copy and configure the environment file
cp .env.example .env
# Edit .env — at minimum set REDIS_URL and CONSERVER_API_TOKEN

# 3. Copy and customise the Docker Compose file
cp example_docker-compose.yml docker-compose.yml

# 4. Create the shared Docker network
docker network create conserver

# 5. Build the containers
docker compose build
```

To start all services:

```bash
docker compose up -d
```

---

## Running the Test Suite

Tests are run inside the Docker environment so that all service dependencies (Redis, etc.) are available.

```bash
# Run the full test suite
docker compose run --rm api poetry run pytest server/links/analyze/tests/ server/storage/milvus/test_milvus.py -v
```

To suppress OpenTelemetry export errors when Datadog is not configured, unset the OTLP endpoint:

```bash
docker compose run --rm -e OTEL_EXPORTER_OTLP_ENDPOINT= api poetry run pytest server/links/analyze/tests/ -v
```

Run a specific test file:

```bash
docker compose run --rm api poetry run pytest server/links/my_link/tests/test_my_link.py -v
```

---

## Project Structure

```
vcon-server/
├── server/ # All application source code
│ ├── main.py # vCon processing pipeline — chain management,
│ │ # worker processes, Redis queue consumption
│ ├── api.py # FastAPI REST API — CRUD for vCons, chain
│ │ # ingress/egress, config and DLQ endpoints
│ ├── vcon.py # vCon data model — Vcon class with all
│ │ # builder methods and property accessors
│ ├── config.py # Config loading — reads CONSERVER_CONFIG_FILE
│ │ # (YAML) and exposes worker/storage settings
│ ├── settings.py # Environment variables — all os.getenv() calls
│ │ # with defaults live here
│ ├── redis_mgr.py # Redis connection management — single shared
│ │ # connection pool used by API and pipeline
│ ├── follower.py # Ingress follower — BLPOP loop that drives
│ │ # vCons from queue into a chain
│ ├── hook.py # Pre/post-link hook system
│ ├── dlq_utils.py # Dead-letter queue helpers
│ ├── version.py # Version string utilities
│ │
│ ├── links/ # Processing link modules (one subdirectory each)
│ │ ├── analyze/ # LLM-based analysis (summary, labels, etc.)
│ │ ├── deepgram_link/ # Deepgram speech-to-text transcription
│ │ ├── tag/ # Applies configurable tags to vCons
│ │ ├── webhook/ # HTTP webhook delivery
│ │ └── ... # Additional built-in links
│ │
│ ├── storage/ # Storage adapter modules (one subdirectory each)
│ │ ├── base.py # Abstract Storage base class
│ │ ├── file/ # Local filesystem storage
│ │ ├── mongo/ # MongoDB storage
│ │ ├── postgres/ # PostgreSQL storage
│ │ ├── s3/ # AWS S3 (and S3-compatible) storage
│ │ ├── milvus/ # Milvus vector database storage
│ │ └── ... # Additional storage adapters
│ │
│ ├── tracers/ # Audit tracer modules
│ │ └── jlinc/ # JLINC zero-knowledge auditing tracer
│ │
│ └── lib/ # Shared utilities
│ ├── logging_utils.py # init_logger() — standard logger factory
│ ├── vcon_redis.py # VconRedis — high-level Redis get/store for vCons
│ ├── context_utils.py # OpenTelemetry trace context propagation
│ ├── metrics.py # Counter and histogram helpers
│ ├── error_tracking.py # Error tracker initialisation
│ └── ...
├── docs/ # MkDocs documentation source
├── example_config.yml # Annotated reference configuration
├── example_docker-compose.yml # Starting point for docker-compose.yml
├── pyproject.toml # Poetry project and dependency manifest
└── .env.example # Template for the required environment file
```

---

## Environment Variables

All environment variables are declared in `server/settings.py`. Key variables:

| Variable | Default | Description |
|---|---|---|
| `REDIS_URL` | `redis://localhost` | Redis connection string |
| `CONSERVER_API_TOKEN` | _(none)_ | Bearer token for the REST API |
| `CONSERVER_CONFIG_FILE` | `./example_config.yml` | Path to the YAML config file |
| `CONSERVER_WORKERS` | `1` | Number of parallel worker processes |
| `CONSERVER_PARALLEL_STORAGE` | `true` | Write to storage backends concurrently |
| `CONSERVER_START_METHOD` | _(platform default)_ | Multiprocessing start method: `fork`, `spawn`, or `forkserver` |
| `LOG_LEVEL` | `DEBUG` | Python logging level |
| `ENV` | `dev` | Runtime environment label |
| `VCON_REDIS_EXPIRY` | `3600` | TTL (seconds) for vCons cached back in Redis |
| `VCON_INDEX_EXPIRY` | `86400` | TTL (seconds) for the vCon sorted-set index |
| `VCON_DLQ_EXPIRY` | `604800` | TTL (seconds) for dead-letter queue entries |
| `UUID8_DOMAIN_NAME` | `strolid.com` | DNS domain used when generating UUID v8 identifiers |
| `OPENAI_API_KEY` | _(none)_ | OpenAI key (used by analysis links) |
| `DEEPGRAM_KEY` | _(none)_ | Deepgram key (used by transcription links) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | _(none)_ | OTLP endpoint for OpenTelemetry traces/metrics |

---

## Coding Conventions

### Logger initialisation

Every module that needs logging calls `init_logger` from `lib/logging_utils.py` rather than using `logging.getLogger` directly:

```python
from lib.logging_utils import init_logger
logger = init_logger(__name__)
```

### The VconRedis pattern

Links and other components that need to read or write vCons use `VconRedis` from `lib/vcon_redis.py`. This provides a consistent interface and ensures the correct Redis key format (`vcon:<uuid>`):

```python
from lib.vcon_redis import VconRedis

vcon_redis = VconRedis()
vcon = vcon_redis.get_vcon(vcon_uuid) # returns a Vcon instance or None
# ... modify vcon ...
vcon_redis.store_vcon(vcon) # serialises and saves back to Redis
```

### default_options dict

Every link module exposes a module-level `default_options` dictionary that declares all supported configuration keys and their defaults. The `run` function merges caller-supplied options on top of the defaults:

```python
default_options = {
"threshold": 0.5,
"model": "gpt-4",
}

def run(vcon_uuid, link_name, opts=default_options):
options = {**default_options, **opts}
...
```

This convention makes options self-documenting and ensures backward compatibility when new options are added.

### Error handling

- Raise an exception (letting the pipeline move the vCon to the DLQ) for permanent or unknown failures.
- Return `None` to silently filter a vCon out of the chain without moving it to the DLQ.
- Return `vcon_uuid` to continue processing normally.

---

## Creating a New Processing Link

See [docs/extending/creating-links.md](extending/creating-links.md) for the complete guide, including the required `run()` signature, directory layout, configuration registration, testing patterns, and best practices.

Quick reference — minimum viable link:

```
server/links/my_link/
__init__.py # must contain run() and default_options
tests/
__init__.py
test_my_link.py
```

---

## Creating a New Storage Adapter

See [docs/extending/creating-storage-adapters.md](extending/creating-storage-adapters.md) for the full guide, including how to subclass `storage.base.Storage`, the required method signatures, and how to register the adapter in `config.yml`.

---

## Branch and PR Workflow

1. **Branch from `main`**:

```bash
git checkout main
git pull origin main
git checkout -b your-feature-branch
```

2. **Make your changes**, following the coding conventions above. Add or update tests.

3. **Run the tests** inside Docker to confirm nothing is broken:

```bash
docker compose run --rm api poetry run pytest server/links/analyze/tests/ -v
```

4. **Open a Pull Request against `main`** on GitHub at [https://github.com/vcon-dev/vcon-server](https://github.com/vcon-dev/vcon-server). Provide a clear description of what the PR does and why, and link any related issues.

5. Maintainers will review the PR. Address any requested changes by pushing additional commits to the same branch — do not force-push.
79 changes: 79 additions & 0 deletions docs/reference/links/analyze-and-label.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
# analyze_and_label

Analyzes vCon dialog content with an OpenAI model and applies the returned labels as tags on the vCon. The model is prompted to return a JSON object with a `labels` array; each label is then added as both a structured analysis entry and a vCon tag.

## Prerequisites

- `OPENAI_API_KEY` environment variable (or provided in options)

## Configuration

```yaml
links:
analyze_and_label:
module: links.analyze_and_label
options:
model: gpt-4-turbo
prompt: "Analyze this transcript and provide a list of relevant labels..."
```

## Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `prompt` | string | `"Analyze this transcript and provide a list of relevant labels for categorization. Return your response as a JSON object with a single key 'labels' containing an array of strings."` | Prompt sent to the model; must instruct the model to return `{"labels": [...]}` |
| `analysis_type` | string | `labeled_analysis` | Type name stored on the resulting analysis entry |
| `model` | string | `gpt-4-turbo` | OpenAI model to use |
| `sampling_rate` | float | `1` | Fraction of vCons to process (1 = 100 %, 0.5 = 50 %) |
| `temperature` | float | `0.2` | Model temperature (0–1) |
| `source.analysis_type` | string | `transcript` | Type of analysis to read as input text |
| `source.text_location` | string | `body.paragraphs.transcript` | Dot-separated path to the text field within the source analysis |
| `response_format` | dict | `{"type": "json_object"}` | OpenAI response format parameter |
| `OPENAI_API_KEY` | string | — | OpenAI API key (overrides environment variable) |

## Example

```yaml
chains:
label_calls:
links:
- analyze_and_label:
model: gpt-4-turbo
prompt: |
Identify key topics, sentiments, and issues in this conversation.
Return your response as a JSON object with a single key 'labels'
containing an array of strings.
sampling_rate: 1
temperature: 0.2
storages:
- postgres
ingress_lists:
- transcribed
enabled: 1
```

## Output

Adds a `labeled_analysis` entry to the vCon and applies each returned label as a tag:

```json
{
"analysis": [
{
"type": "labeled_analysis",
"vendor": "openai",
"dialog": 0,
"body": "{\"labels\": [\"billing\", \"refund\", \"escalation\"]}"
}
]
}
```

The vCon will also have tags `billing`, `refund`, and `escalation` applied.

## Behavior

- Skips dialogs that have no source analysis of the configured type
- Skips dialogs that already have a `labeled_analysis` (or the configured `analysis_type`)
- Respects `sampling_rate` to process only a fraction of vCons
- Retries API calls with exponential backoff (up to 6 attempts)
Loading
Loading