Sentinel is an open-source election-safety moderation API built to help protect Kenya's 2027 general election from ethnic incitement and election disinformation. It currently supports code-switched moderation across English (en), Swahili (sw), and Sheng (sh), and returns deterministic moderation decisions (ALLOW, REVIEW, BLOCK) with full audit evidence.
The moderation pipeline, appeals system, transparency reporting, and safety controls are fully implemented. However:
- The seed lexicon contains only 7 demonstration terms. Production use requires building a comprehensive lexicon with domain-expert annotation covering the specific hate speech, incitement, and disinformation patterns relevant to your context.
- The multi-label classifier runs in shadow mode (observability only).
- The claim-likeness scorer is active for REVIEW-only disinformation signals (cannot produce
BLOCK). - Deterministic lexicon matches remain the only direct path to
BLOCK.
Production rollout should follow the go-live readiness gate (python scripts/check_go_live_readiness.py) and the SHADOW -> ADVISORY -> SUPERVISED deployment stage progression.
Sentinel currently supports three language codes in the moderation pipeline:
- English (en) — baseline deterministic support
- Swahili (sw) — baseline deterministic support with hint-word detection
- Sheng (sh) — baseline deterministic support with hint-word detection
Language routing is token-informed and returns span-level language_spans, so code-switched text (e.g., English + Sheng in one sentence) is handled natively.
Sentinel also includes language-pack artifacts for additional languages (currently Luo and Kalenjin), but these are for staged/operator rollout and are not active in the default moderation hot path.
| Label | Meaning |
|---|---|
ETHNIC_CONTEMPT |
Ethnic slurs, dehumanizing language targeting ethnic groups |
INCITEMENT_VIOLENCE |
Direct calls to harm, kill, or attack |
HARASSMENT_THREAT |
Targeted personal threats |
DOGWHISTLE_WATCH |
Ambiguous language that may carry coded meaning — flagged for human review |
DISINFO_RISK |
Content resembling known disinformation narratives |
BENIGN_POLITICAL_SPEECH |
Normal political discourse, no policy concern detected |
No. Sentinel is a server-side API. Your backend should call Sentinel and apply the enforcement decision. Never expose the API key to frontend code.
Default to REVIEW for safety-critical content. Never default to ALLOW for content that hasn't been moderated. Set a request timeout of 500-1000ms and implement a circuit breaker pattern. See the Integration Guide for detailed failure handling recommendations.
The action field is your primary enforcement signal:
ALLOW— publish the contentREVIEW— hold for human moderator reviewBLOCK— reject publication
Labels and reason codes provide detail for moderators reviewing flagged content and for users who want to understand why their content was actioned.
No. This is an intentional safety constraint. Only deterministic lexicon matches (exact normalized regex against known terms) can produce a BLOCK action. Vector similarity and claim-likeness paths are REVIEW-only. Multi-label classifier predictions operate in shadow mode and do not change enforcement. See Security for the full safety architecture.
1 to 5,000 characters. Empty strings are rejected (400 error). Text exceeding 5,000 characters is rejected.
Yes. Integrate server-to-server with POST /v1/moderate and map the three actions to your moderation workflow. Sentinel runs without Postgres or Redis in development mode (file-based lexicon, in-memory rate limiting), but production use should include Postgres for the full feature set.
Set the SENTINEL_ELECTORAL_PHASE environment variable and restart the API:
export SENTINEL_ELECTORAL_PHASE=voting_dayThe five phases are pre_campaign, campaign, silence_period, voting_day, and results_period. Each phase adjusts vector match thresholds and no-match behavior. During silence_period, voting_day, and results_period, unmatched content defaults to REVIEW instead of ALLOW. See the Deployment Guide for the full phase table.
The lexicon follows a governed lifecycle: Draft -> Active -> Deprecated. Use the make targets:
make release-create # Create a new draft release
make release-ingest # Add terms to the draft
make release-validate # Validate against quality gates
make release-activate # Activate (deactivates previous active)
make release-deprecate # Deprecate old releasesOnly one release can be active at a time. The active release is what the moderation endpoint uses for matching. See the Deployment Guide for the full lexicon lifecycle.
- SHADOW — Deploy and route traffic. All decisions are downgraded to ALLOW. Log everything, analyze decision quality, tune the lexicon.
- ADVISORY — BLOCK decisions are downgraded to REVIEW. Human moderators see what Sentinel would block. Build confidence in the lexicon.
- SUPERVISED — Full enforcement. BLOCK actions are applied automatically.
Set the stage with SENTINEL_DEPLOYMENT_STAGE=shadow|advisory|supervised.
For development, use a static token registry:
export SENTINEL_OAUTH_TOKENS_JSON='{"my-token": {"client_id": "admin", "scopes": ["admin:appeal:read", "admin:appeal:write"]}}'For production, configure JWT validation:
export SENTINEL_OAUTH_JWT_SECRET='your-secret'See the Deployment Guide for the full OAuth setup including all scopes.
Sentinel exposes three monitoring endpoints:
GET /health— basic health check (no auth)GET /health/live— liveness probe (no auth)GET /health/ready— readiness probe (no auth)GET /metrics— JSON metrics: action counts, HTTP status counts, latency buckets (no auth)GET /metrics/prometheus— Prometheus text format for scraping (no auth)
For internal queue monitoring, use GET /internal/monitoring/queue/metrics with the internal:queue:read OAuth scope.
All requests carry an X-Request-ID header for log correlation.
The go-live gate validates that all required artifacts are present and consistent before production deployment:
python scripts/check_go_live_readiness.py --bundle-dir releases/go-live/<release-id>It checks for a valid lexicon release, policy config, migration state, and launch profile. Use the template bundle in templates/go-live/ as a starting point.