Commit 46c996a

feat: add keepalive and TTL support to service registration (#5)
1 parent 91e397c commit 46c996a

12 files changed

Lines changed: 760 additions & 73 deletions

.codecov.yml

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+coverage:
+  status:
+    project:
+      default:
+        target: auto
+        threshold: 1%
+    patch:
+      default:
+        target: 70%
+        threshold: 5%
+
+comment:
+  layout: "condensed_header, condensed_files, condensed_footer"
+  behavior: default
+
+ignore:
+  - "examples/**"
+  - "tests/**"
+  - "docs/**"

docs/guides/registration.md

Lines changed: 73 additions & 1 deletion
@@ -91,6 +91,9 @@ The `.with_registration()` method accepts these parameters:
9191
retry_delay=2.0, # Seconds between retries
9292
fail_on_error=False, # Abort startup on failure
9393
timeout=10.0, # HTTP request timeout
94+
enable_keepalive=True, # Enable periodic ping to keep service alive
95+
keepalive_interval=10.0, # Seconds between keepalive pings
96+
auto_deregister=True, # Automatically deregister on shutdown
9497
)
9598
```
9699

@@ -106,6 +109,9 @@ The `.with_registration()` method accepts these parameters:
106109
- **retry_delay** (`float`): Delay in seconds between retry attempts. Default: 2.0.
107110
- **fail_on_error** (`bool`): If True, raise exception and abort startup on registration failure. If False, log warning and continue. Default: False.
108111
- **timeout** (`float`): HTTP request timeout in seconds. Default: 10.0.
112+
- **enable_keepalive** (`bool`): Enable periodic pings to keep service registered. Default: True.
113+
- **keepalive_interval** (`float`): Seconds between keepalive pings. Default: 10.0.
114+
- **auto_deregister** (`bool`): Automatically deregister service on shutdown. Default: True.
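
As a rough illustration of how `keepalive_interval` relates to the orchestrator's TTL (30 seconds by default, per the Keepalive and TTL section), the sketch below counts how many pings are scheduled strictly before a registration would expire. `KeepaliveSettings` and `pings_before_expiry` are hypothetical names invented for this example and are not part of the servicekit API; only the three parameter names and their defaults come from the list above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveSettings:
    """Hypothetical container mirroring the three new parameters."""

    enable_keepalive: bool = True
    keepalive_interval: float = 10.0
    auto_deregister: bool = True


def pings_before_expiry(settings: KeepaliveSettings, ttl_seconds: float) -> int:
    """Count pings scheduled strictly before the TTL would run out.

    Assumes pings fire exactly every `keepalive_interval` seconds after
    the last successful ping. A result of 0 means the registration would
    expire before (or exactly when) the next ping arrives.
    """
    if settings.keepalive_interval <= 0:
        raise ValueError("keepalive_interval must be positive")
    if not settings.enable_keepalive:
        return 0
    return max(0, math.ceil(ttl_seconds / settings.keepalive_interval) - 1)


if __name__ == "__main__":
    # Defaults: pings at +10s and +20s land inside a 30s TTL window.
    print(pings_before_expiry(KeepaliveSettings(), ttl_seconds=30))
```

With the defaults, two pings fall inside each TTL window, so a single lost ping should not cause the service to expire. An interval equal to or longer than the TTL leaves no such margin.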
109115

110116
---
111117

@@ -120,7 +126,51 @@ The `.with_registration()` method accepts these parameters:
120126
5. **Payload Creation**: Serializes ServiceInfo to JSON (supports custom subclasses)
121127
6. **Registration Request**: Sends POST to orchestrator endpoint
122128
7. **Retry on Failure**: Retries with delay if request fails
123-
8. **Logging**: Logs all attempts and final outcome
129+
8. **Keepalive Started**: If enabled, background task starts pinging orchestrator
130+
9. **Service Runs**: Service handles requests while staying alive via pings
131+
10. **Shutdown**: On graceful shutdown, stops keepalive and optionally deregisters
132+
11. **Logging**: Logs all registration, ping, and deregistration events
133+
134+
### Keepalive and TTL
135+
136+
Services can be configured to send periodic "ping" requests to the orchestrator to indicate they're still alive. The orchestrator tracks a Time-To-Live (TTL) for each service and automatically removes services that haven't pinged within the TTL window.
137+
138+
**How it works:**
139+
140+
1. **Initial Registration**: Service registers and receives response with:
141+
- `id`: Unique ULID identifier for this service
142+
- `ttl_seconds`: How long until service expires (default: 30 seconds)
143+
- `ping_url`: Endpoint to send keepalive pings (automatically provided by orchestrator)
144+
145+
2. **Keepalive Loop**: Background task automatically sends PUT requests to `ping_url` every N seconds:
146+
- Default interval: 10 seconds (configurable via `keepalive_interval`)
147+
- Each ping resets the service's expiration time
148+
- Failures are logged but don't crash the service
149+
150+
3. **TTL Expiration**: Orchestrator runs cleanup every 5 seconds:
151+
- Removes services that haven't pinged within TTL window
152+
- Logs expired services for monitoring
153+
154+
4. **Graceful Shutdown**: On service shutdown:
155+
- Keepalive task stops (no more pings)
156+
- Service explicitly deregisters (if `auto_deregister=True`)
157+
- Immediate removal from registry
158+
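
The keepalive loop described in step 2 can be sketched roughly as follows. This is a minimal illustration and not the servicekit implementation: `keepalive_loop`, `send_ping`, and `stop` are hypothetical names, the real task sends a PUT request to `ping_url`, and whether it pings immediately or waits one interval first is an assumption here.

```python
import asyncio
from typing import Awaitable, Callable


async def keepalive_loop(
    send_ping: Callable[[], Awaitable[None]],
    interval: float,
    stop: asyncio.Event,
) -> int:
    """Call `send_ping` every `interval` seconds until `stop` is set.

    Returns the number of successful pings. Failures are reported and
    swallowed so that an unreachable orchestrator cannot crash the service.
    """
    sent = 0
    while not stop.is_set():
        try:
            await send_ping()
            sent += 1
        except Exception as exc:  # report the failure and keep looping
            print(f"keepalive ping failed: {exc}")
        try:
            # Sleep for one interval, but wake early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return sent
```

Waiting on the event instead of calling `asyncio.sleep` lets a graceful shutdown stop the loop without waiting out the rest of the interval, which matters when deregistration should happen promptly.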
159+
**Configuration examples:**
160+
161+
```python
162+
# Default: keepalive enabled, auto-deregister on shutdown
163+
.with_registration()
164+
165+
# Disable keepalive (registration expires after the TTL unless pinged by other means)
166+
.with_registration(enable_keepalive=False)
167+
168+
# Custom keepalive interval (faster pings)
169+
.with_registration(keepalive_interval=5.0)
170+
171+
# Don't deregister on shutdown (let TTL expire naturally)
172+
.with_registration(auto_deregister=False)
173+
```
124174

125175
### Registration Payload
126176

@@ -154,6 +204,28 @@ For custom ServiceInfo subclasses:
154204
}
155205
```
156206

207+
### Registration Response
208+
209+
The orchestrator responds with registration details, including the ping endpoint:
210+
211+
```json
212+
{
213+
"id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
214+
"status": "registered",
215+
"service_url": "http://my-service:8000",
216+
"message": "Service registered successfully",
217+
"ttl_seconds": 30,
218+
"ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
219+
}
220+
```
221+
222+
**Key fields:**
223+
- `id`: Unique ULID identifier assigned by orchestrator
224+
- `ttl_seconds`: Time-to-live in seconds (service must ping within this window)
225+
- `ping_url`: Endpoint for keepalive pings (automatically used by the service)
226+
227+
**Important**: The `ping_url` is provided by the orchestrator - services don't need to configure it. The service automatically uses this URL for keepalive pings when `enable_keepalive=True`.
228+
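
For illustration, a client could read the response fields like this. The sketch uses only the standard library; `RegistrationResult` and `parse_registration_response` are hypothetical names, and the assumption that every field shown above is always present is mine, not something the orchestrator guarantees.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RegistrationResult:
    """Hypothetical typed view of the registration response."""

    id: str
    status: str
    service_url: str
    message: str
    ttl_seconds: int
    ping_url: str


def parse_registration_response(body: str) -> RegistrationResult:
    """Parse the JSON body, ignoring any fields not listed above."""
    data = json.loads(body)
    wanted = {f.name: data[f.name] for f in fields(RegistrationResult)}
    return RegistrationResult(**wanted)


SAMPLE = """{
  "id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
  "status": "registered",
  "service_url": "http://my-service:8000",
  "message": "Service registered successfully",
  "ttl_seconds": 30,
  "ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
}"""

if __name__ == "__main__":
    result = parse_registration_response(SAMPLE)
    print(result.ping_url, result.ttl_seconds)
```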
157229
### Hostname Resolution
158230

159231
Priority order:

examples/registration/Dockerfile

Lines changed: 16 additions & 25 deletions
@@ -1,35 +1,28 @@
 # Registration Demo Dockerfile
 FROM ghcr.io/astral-sh/uv:0.9-python3.13-bookworm-slim AS builder

-WORKDIR /app
-
 ARG USER=servicekit UID=10001
 RUN useradd -u ${UID} -m -s /bin/bash ${USER}

 # UV configuration for better build performance
 ENV UV_COMPILE_BYTECODE=1
 ENV UV_LINK_MODE=copy

-# Copy and build parent servicekit project
-COPY pyproject.toml uv.lock README.md /servicekit/
-COPY src /servicekit/src/
-WORKDIR /servicekit
-RUN uv build
+# Copy parent servicekit project (needed as path dependency)
+COPY pyproject.toml uv.lock README.md /app/
+COPY src /app/src/

-# Install servicekit wheel in app directory
-WORKDIR /app
-RUN --mount=type=cache,target=/root/.cache/uv \
-    uv venv && \
-    uv pip install /servicekit/dist/*.whl
+# Copy registration example
+COPY examples/registration /app/examples/registration

-# Copy demo files
-COPY examples/registration/main.py ./
-COPY examples/registration/main_custom.py ./
-COPY examples/registration/orchestrator.py ./
+# Install dependencies from registration example directly in /app
+WORKDIR /app/examples/registration
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen

 # Cleanup Python cache files
-RUN find /app/.venv -type d -name '__pycache__' -prune -exec rm -rf {} + && \
-    find /app/.venv -type f -name '*.py[co]' -delete || true
+RUN find .venv -type d -name '__pycache__' -prune -exec rm -rf {} + && \
+    find .venv -type f -name '*.py[co]' -delete || true

 # ---- runtime ----
 FROM python:3.13-slim AS runtime
@@ -51,14 +44,12 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
     apt-get install -y --no-install-recommends ca-certificates tini && \
     apt-get clean && rm -rf /var/lib/apt/lists/*

-# Copy venv and application from builder
-COPY --from=builder --chown=${USER}:${USER} /app/.venv /app/.venv
-COPY --from=builder --chown=${USER}:${USER} /app/main.py /app/main.py
-COPY --from=builder --chown=${USER}:${USER} /app/main_custom.py /app/main_custom.py
-COPY --from=builder --chown=${USER}:${USER} /app/orchestrator.py /app/orchestrator.py
+# Copy entire app directory including venv and source from builder
+COPY --from=builder --chown=${USER}:${USER} /app /app

-ENV VIRTUAL_ENV=/app/.venv
-ENV PATH=/app/.venv/bin:${PATH}
+ENV VIRTUAL_ENV=/app/examples/registration/.venv
+ENV PATH=/app/examples/registration/.venv/bin:${PATH}
+WORKDIR /app/examples/registration
 ENV PYTHONDONTWRITEBYTECODE=1
 ENV PYTHONUNBUFFERED=1
 ENV PYTHONFAULTHANDLER=1

examples/registration/README.md

Lines changed: 77 additions & 8 deletions
@@ -10,15 +10,24 @@ Demonstrates automatic service registration with an orchestrator for service dis
1010
- **Custom Metadata**: Support for ServiceInfo subclasses with additional fields
1111
- **Mock Orchestrator**: Simple orchestrator for testing and development
1212
- **Multi-Service Setup**: Example with multiple services (svca, svcb)
13+
- **Keepalive & TTL**: Services send periodic pings to stay registered (30s TTL, 10s interval)
14+
- **Auto-Deregistration**: Services gracefully deregister on shutdown
15+
- **Valkey-based Storage**: TTL and expiration handled by Valkey (no manual cleanup needed)
1316

1417
## Quick Start
1518

1619
### Local Development
1720

21+
**Prerequisites**: Valkey or Redis running on localhost:6379
22+
1823
#### Run Orchestrator
1924

2025
```bash
26+
# Start Valkey (using Docker)
27+
docker run -d -p 6379:6379 valkey/valkey:8
28+
2129
# Install dependencies
30+
cd examples/registration
2231
uv sync
2332

2433
# Run the mock orchestrator
@@ -63,10 +72,10 @@ docker compose down
6372
## Architecture
6473

6574
```
66-
┌─────────────┐
67-
│ Orchestrator│ ← Registration endpoint at :9000/services/$register
68-
│ (port 9000)│
69-
└──────▲──────┘
75+
┌─────────────┐ ┌────────┐
76+
│ Orchestrator│────→│ Valkey │ TTL-based service expiration
77+
│ (port 9000)│ │ :6379 │
78+
└──────▲──────┘ └────────┘
7079
7180
│ HTTP POST on startup
7281
@@ -90,7 +99,8 @@ docker compose down
9099

91100
### Orchestrator Endpoints
92101

93-
- `POST /services/$register` - Register a service (called by services on startup)
102+
- `POST /services/$register` - Register a service (returns `id`, `ttl_seconds`, `ping_url`)
103+
- `PUT /services/{id}/$ping` - Send keepalive ping to extend TTL
94104
- `GET /services` - List all registered services
95105
- `GET /services/{id}` - Get specific service details by ULID
96106
- `DELETE /services/{id}` - Deregister a service by ULID
@@ -119,11 +129,70 @@ docker compose down
119129
"id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
120130
"status": "registered",
121131
"service_url": "http://svca:8000",
122-
"message": "..."
132+
"message": "...",
133+
"ttl_seconds": 30,
134+
"ping_url": "http://orchestrator:9000/services/01K83B5V85PQZ1HTH4DQ7NC9JM/$ping"
123135
}
124136
```
125-
7. **Retry on Failure**: Retries up to 5 times with 2-second delay
126-
8. **Success/Failure Logging**: Logs outcome with service ID via structured logging
137+
7. **Keepalive Started**: Background task starts sending pings every 10 seconds to `ping_url`
138+
8. **Service Runs**: Service handles requests while keepalive maintains registration
139+
9. **Retry on Failure**: Initial registration retries up to 5 times with 2-second delay
140+
10. **Graceful Shutdown**: On shutdown, service stops keepalive and deregisters explicitly
141+
11. **Success/Failure Logging**: Logs all registration, ping, and deregistration events
142+
143+
## Keepalive and TTL
144+
145+
### How It Works
146+
147+
The orchestrator uses Valkey's built-in TTL mechanism for automatic service expiration:
148+
149+
- **TTL**: 30 seconds (configurable in `orchestrator.py`)
150+
- **Ping Interval**: 10 seconds (services send keepalive every 10s)
151+
- **Expiration**: Handled automatically by Valkey (no manual cleanup task needed)
152+
153+
**Timeline Example:**
154+
- `T+0s`: Service registers, Valkey stores with `EX 30` (expires at T+30s)
155+
- `T+10s`: Service pings, Valkey resets TTL to 30s (expires at T+40s)
156+
- `T+20s`: Service pings, Valkey resets TTL to 30s (expires at T+50s)
157+
- `T+30s`: Service pings, Valkey resets TTL to 30s (expires at T+60s)
158+
- If service crashes at `T+35s` and stops pinging:
159+
- `T+60s`: Valkey automatically removes the key (30s after the last ping at `T+30s`)
160+
- Service no longer appears in registry
161+
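
The expiry arithmetic in the timeline can be replayed with a small in-memory stand-in for the registry. This is a sketch for illustration only: `TTLRegistry` and `FakeClock` are hypothetical names, and the real orchestrator delegates expiration to Valkey instead of comparing timestamps itself.

```python
from typing import Callable, Dict, Optional


class FakeClock:
    """Manually advanced clock so the timeline can be replayed instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class TTLRegistry:
    """In-memory approximation of a key store with per-key TTL."""

    def __init__(self, ttl: float, clock: Callable[[], float]) -> None:
        self._ttl = ttl
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def register(self, service_id: str) -> float:
        """Store the service and return its expiry time."""
        expires = self._clock() + self._ttl
        self._expires_at[service_id] = expires
        return expires

    def is_registered(self, service_id: str) -> bool:
        expires = self._expires_at.get(service_id)
        return expires is not None and self._clock() < expires

    def ping(self, service_id: str) -> Optional[float]:
        """Reset the TTL, or return None if the service already expired."""
        if not self.is_registered(service_id):
            return None
        return self.register(service_id)


if __name__ == "__main__":
    clock = FakeClock()
    registry = TTLRegistry(ttl=30.0, clock=clock)
    print(0.0, registry.register("svca"))
    for t in (10.0, 20.0, 30.0):
        clock.now = t
        print(t, registry.ping("svca"))
```

Each ping moves the expiry to 30 seconds after the ping itself, so a service whose last ping landed at `T+30s` drops out of the registry at `T+60s` regardless of when it actually crashed.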
162+
### Ping Endpoint
163+
164+
**Request:**
165+
```bash
166+
PUT /services/{service_id}/$ping
167+
```
168+
169+
**Response:**
170+
```json
171+
{
172+
"id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
173+
"status": "alive",
174+
"last_ping_at": "2025-10-27T12:00:30.000Z",
175+
"expires_at": "2025-10-27T12:01:00.000Z"
176+
}
177+
```
178+
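
A client that wants to check how much time a ping bought can compare the two timestamps in the response. The helper names below (`parse_timestamp`, `granted_ttl`) are hypothetical, and the sketch assumes both fields are always present in the format shown above.

```python
import json
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing "Z".

    datetime.fromisoformat only accepts "Z" on Python 3.11+, so it is
    rewritten as an explicit UTC offset first.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def granted_ttl(ping_response: str) -> timedelta:
    """Return the window between the last ping and the new expiry."""
    data = json.loads(ping_response)
    return parse_timestamp(data["expires_at"]) - parse_timestamp(data["last_ping_at"])


SAMPLE_PING = """{
  "id": "01K83B5V85PQZ1HTH4DQ7NC9JM",
  "status": "alive",
  "last_ping_at": "2025-10-27T12:00:30.000Z",
  "expires_at": "2025-10-27T12:01:00.000Z"
}"""

if __name__ == "__main__":
    print(granted_ttl(SAMPLE_PING).total_seconds())
```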
179+
### Configuration Options
180+
181+
Services can configure keepalive behavior:
182+
183+
```python
184+
# Default: keepalive enabled, 10s interval, auto-deregister on shutdown
185+
.with_registration()
186+
187+
# Disable keepalive (service expires after 30s if not manually pinged)
188+
.with_registration(enable_keepalive=False)
189+
190+
# Custom ping interval (faster keepalive)
191+
.with_registration(keepalive_interval=5.0)
192+
193+
# Don't deregister on shutdown (let TTL expire naturally)
194+
.with_registration(auto_deregister=False)
195+
```
127196

128197
## Configuration
129198

examples/registration/compose.yml

Lines changed: 16 additions & 0 deletions
@@ -1,4 +1,16 @@
 services:
+  valkey:
+    image: valkey/valkey:8
+    ports:
+      - "6379:6379"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "valkey-cli", "ping"]
+      interval: 5s
+      timeout: 3s
+      retries: 5
+      start_period: 10s
+
   orchestrator:
     build:
       context: ../..
@@ -10,6 +22,10 @@ services:
       LOG_FORMAT: json
       LOG_LEVEL: INFO
       WORKERS: 1
+      VALKEY_URL: redis://valkey:6379
+    depends_on:
+      valkey:
+        condition: service_healthy
     restart: unless-stopped
     healthcheck:
       test:
