Production: https://sentinel-center.web.app/live/
Sentinel is an AI-powered incident intelligence platform. It turns raw logs and incident narratives into structured analysis—summaries, severity, likely root cause, evidence-grounded remediation, and exportable reports—so teams can respond faster with less manual triage.
Sentinel uses an event-driven, serverless layout on AWS. Operators use the CloudFront-hosted Next.js dashboard; traffic goes through API Gateway to the Lambda API, which persists incidents, jobs, and analysis in Aurora Serverless and enqueues work on SQS. The Planner Lambda runs the pipeline (normalizer → summarizer → investigator → remediator). Amazon Bedrock powers the heavy reasoning steps, and the App Runner Intel service adds supporting analysis. Results are stored and surfaced back in the dashboard for visualization and reporting.
For day-to-day work, the UI and API run on your machine; the database is typically SQLite unless you point the app at Aurora.
graph LR
subgraph Local["Local dev"]
UI[Next.js]
API[FastAPI]
DB[(SQLite / Aurora)]
end
UI --> API
API --> DB
Deeper reference: guides/architecture.md, guides/agent_architecture.md, and intel.md.
Agent roles (see AGENTS.md):
| Module | Role |
|---|---|
| Planner | Orchestrates the incident analysis flow |
| Normalizer | Cleans input, guardrails, evidence snippets |
| Summarizer | Short narrative + severity |
| Investigator | Root-cause analysis (strong model; Nova Pro in AWS guidance) |
| Remediator | Remediation plan and next steps (strong model) |
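The stage order in the table can be sketched as a simple sequential pipeline. This is a minimal illustration, not the repository's Planner: `IncidentState`, `make_stage`, `run_pipeline`, and the placeholder stage bodies are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncidentState:
    """Hypothetical container handed from stage to stage."""
    raw_text: str
    results: dict[str, str] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


Stage = Callable[[IncidentState], IncidentState]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: IncidentState) -> IncidentState:
        state.trace.append(name)
        state.results[name] = f"<{name} output placeholder>"
        return state
    return stage


# Order taken from the README: normalizer -> summarizer -> investigator -> remediator.
PIPELINE: list[Stage] = [
    make_stage("normalizer"),
    make_stage("summarizer"),
    make_stage("investigator"),
    make_stage("remediator"),
]


def run_pipeline(raw_text: str) -> IncidentState:
    """Run every stage in order, threading one state object through."""
    state = IncidentState(raw_text=raw_text)
    for stage in PIPELINE:
        state = stage(state)
    return state
```

In a real stage the placeholder body would be replaced by a model call; the point of the sketch is only that each stage consumes the output of the one before it.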
- Architecture
- Why Sentinel
- What you get
- Repository layout
- Prerequisites
- Quick start
- Configuration
- Running locally
- API overview
- Frontend
- Bulk ZIP upload
- Slack and generic webhooks
- Tests
- AWS deployment
- Documentation
- Squad contributions
- Tech stack
Production incidents rarely arrive as clean stories. Operators paste logs, paste Slack threads, and work under time pressure. Sentinel runs a modular agent pipeline (normalize → summarize → investigate → remediate) with guardrails (prompt-injection handling, evidence extraction, confidence-aware behavior) and surfaces results in a Next.js dashboard backed by a FastAPI service.
- Incident analysis: Automated summary, severity, root-cause hypotheses, and prioritized remediation actions.
- Operational UI: Submit incidents, review jobs, charts, and deep-dive reports (frontend).
- Real-time feedback: Server-Sent Events for pipeline stages and investigation streaming (see API overview).
- Remediation workflow: Track actions, per-action chat for guidance, follow-ups, and clarification Q&A.
- Reporting: JSON/PDF exports, audit PDFs, periodic digests, post-incident review (PIR) helpers.
- Integrations & webhooks: Alertmanager / CloudWatch-style ingestion hooks, optional email reminders (Resend), and outbound notifications (Slack incoming webhooks and generic HTTP webhooks) when analysis completes at high or critical severity (see Slack and generic webhooks).
- Bulk ZIP upload: Upload a `.zip` of log-like files from Analyze to create many incidents and jobs in one step, with archive-wide guardrails (see Bulk ZIP upload).
- Auth: Clerk for production-style sign-in; local bypass when Clerk is not configured.
| Path | Purpose |
|---|---|
| backend/ | FastAPI app, agents, pipeline, store, reports, scheduler, ingest |
| backend/tests/ | Consolidated backend pytest suite |
| frontend/ | Next.js 14 (Pages Router) dashboard |
| terraform/ | Stage-based AWS IaC (see guides 1–8) |
| scripts/ | Local orchestration (run_local.py), utilities |
| guides/ | Permissions, SageMaker, ingestion, DB, agents, frontend, enterprise |
| intel.md | Deep-dive: files, APIs, frontend, infra |
| gameplan.md | Delivery sequence and guardrail strategy |
- Python ≥ 3.12
- uv for installing and running Python tools
- Node.js and npm (for the frontend)
- Clone the repository and enter the project root.
- Environment file: run `cp .env.example .env`, then edit `.env` so exactly one of `USE_OPEN_ROUTER` or `USE_BEDROCK` is `true` for LLM calls (see Configuration).
- Install frontend dependencies (first run only): `cd frontend && npm install && cd ..`
- Start backend + frontend (recommended): `cd scripts && uv run run_local.py`
  - Backend: http://localhost:8000
  - Frontend: http://localhost:3000
If Clerk JWKS / issuer URLs are not set, the orchestrator sets `AUTH_DISABLED=true` so you can develop without signing in.
Copy .env.example to .env at the repo root. Important groups:
| Mode | When to use | Key variables |
|---|---|---|
| OpenRouter | Easiest for local development | USE_OPEN_ROUTER=true, USE_BEDROCK=false, OPENROUTER_API_KEY, optional OPENROUTER_MODEL (default openai/gpt-4o-mini) |
| AWS Bedrock | Production / AWS-aligned setup | USE_BEDROCK=true, USE_OPEN_ROUTER=false, AWS credentials, BEDROCK_MODEL_ID, region |
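The "exactly one provider" rule can be checked at startup. The following is a minimal sketch, not the backend's actual configuration code: `env_flag`, `select_llm_provider`, and the set of accepted truthy spellings are assumptions made for illustration.

```python
import os
from typing import Mapping

# Assumption: which spellings count as "true"; the README only shows `true`.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Read a boolean-like environment variable; unset means False."""
    return env.get(name, "").strip().lower() in TRUE_VALUES


def select_llm_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return 'openrouter' or 'bedrock', rejecting both-on and both-off."""
    use_open_router = env_flag(env, "USE_OPEN_ROUTER")
    use_bedrock = env_flag(env, "USE_BEDROCK")
    if use_open_router == use_bedrock:
        raise ValueError("Set exactly one of USE_OPEN_ROUTER or USE_BEDROCK to true")
    return "openrouter" if use_open_router else "bedrock"
```

Failing fast here surfaces a misconfigured `.env` before the first LLM call rather than in the middle of a job.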
- `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY`, `CLERK_SECRET_KEY`, and `CLERK_JWKS_URL` or `CLERK_ISSUER` (as used by the backend) enable full auth.
- Omitting Clerk config triggers auth-disabled mode when using `scripts/run_local.py`, as described above.
- `RESEND_API_KEY` and `RESEND_FROM` configure follow-up emails (test sender supported).
- `S3_BUCKET` is used by uploads / PDF flows that rely on S3.
- `DEFAULT_AWS_REGION`, account, and access keys are needed for Bedrock or S3.
These control when and where completed analysis is pushed after a job finishes (not ingestion of external alerts):
- `INTEGRATION_NOTIFY_SEVERITIES`: Comma-separated list of severities that trigger dispatch (default: `high,critical`). Example: `high,critical` or `critical` only.
- `SENTINEL_PUBLIC_URL` or `NEXT_PUBLIC_APP_URL`: Public dashboard base URL (no trailing slash). When set, generic webhook payloads include a `dashboard_url` pointing at the job; when unset, `dashboard_url` is `null`.
Integrations themselves are stored per user in the database and configured in the UI under Settings (see Slack and generic webhooks).
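As a rough sketch of how these two settings could be interpreted, the helpers below parse the severity list and resolve the public base URL. The function names, the case-insensitive comparison, and the precedence of `SENTINEL_PUBLIC_URL` over `NEXT_PUBLIC_APP_URL` are assumptions, not confirmed backend behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_SEVERITIES = "high,critical"  # default stated in the README


def notify_severities(env: Mapping[str, str] = os.environ) -> set[str]:
    """Parse INTEGRATION_NOTIFY_SEVERITIES into a set of lowercase names."""
    raw = env.get("INTEGRATION_NOTIFY_SEVERITIES") or DEFAULT_SEVERITIES
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def should_notify(severity: str, env: Mapping[str, str] = os.environ) -> bool:
    """True when a finished job's severity is in the configured list."""
    return severity.strip().lower() in notify_severities(env)


def public_base_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the dashboard base URL without a trailing slash, or None if unset.

    Assumption: SENTINEL_PUBLIC_URL wins when both variables are set.
    """
    base = env.get("SENTINEL_PUBLIC_URL") or env.get("NEXT_PUBLIC_APP_URL")
    return base.rstrip("/") if base else None
```

A `None` result from `public_base_url` corresponds to the documented case where `dashboard_url` is `null` in the payload.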
For narrative setup instructions, see the Local Development section in intel.md.
- Orchestrator (backend + frontend): `cd scripts && uv run run_local.py`
- Backend only: `cd backend && uv run uvicorn api.main:app --host 0.0.0.0 --port 8000`
- Frontend only: `cd frontend && npm install && npm run dev`
Health check: GET http://localhost:8000/health
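The health check can also be scripted, for example to wait for the backend before running other tooling. This is a small sketch using only the standard library; `health_request` and `is_healthy` are hypothetical helper names, and treating HTTP 200 as "healthy" is an assumption because the response body is not documented here.

```python
import urllib.request

DEFAULT_BASE_URL = "http://localhost:8000"  # local backend address from the README


def health_request(base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build the GET /health request without sending it."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/health", method="GET")


def is_healthy(base_url: str = DEFAULT_BASE_URL, timeout: float = 3.0) -> bool:
    """Return True when GET /health answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are OSError subclasses
        return False
```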
Base URL in local development: http://localhost:8000
| Area | Examples |
|---|---|
| Core | GET /health, GET /api/me, GET /api/team/members |
| Incidents & jobs | POST /api/incidents, POST /api/incidents/analyze-sync, POST /api/incidents/bulk-zip (multipart field archive, optional query title_prefix, source, max_files), GET /api/jobs, GET /api/jobs/{job_id}, POST /api/jobs/{job_id}/run, GET /api/jobs/{job_id}/workflow |
| Streaming | GET /api/jobs/{job_id}/stream, POST /api/stream/investigate |
| Exports | GET /api/jobs/{job_id}/export, GET /api/jobs/{job_id}/audit/pdf |
| Remediation | GET/PATCH .../actions, chat GET/POST .../actions/{action_id}/chat, POST .../actions/{action_id}/evaluate |
| Follow-ups & clarify | follow-ups under /api/jobs/{job_id}/follow-ups, clarifications under /api/jobs/{job_id}/clarify |
| Integrations | GET/POST/DELETE /api/integrations (Slack, generic webhook, Jira, PagerDuty configs); ingestion webhooks under /api/ingest/webhook* |
| Analytics & reports | GET /api/analytics/mttr, POST /api/reports/digest, PIR routes under /api/jobs/{job_id}/pir |
Interactive docs: when the API is running, OpenAPI is available at /docs (Swagger UI) unless disabled in your build.
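The streaming routes use Server-Sent Events, so a script that consumes them needs to split the response into events. The parser below follows the standard SSE framing (`event:` and `data:` fields, blank line as terminator, `:` lines as comments). The event names and JSON shape in the usage note are invented for illustration; the real ones are defined by the backend.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive line
            continue
        name, _, value = line.partition(":")
        if value.startswith(" "):  # one optional space after the colon
            value = value[1:]
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)  # multiple data lines are joined with newlines
```

For example, feeding it the lines `event: stage`, `data: {"stage": "summarizer"}`, and a blank line yields one `("stage", '{"stage": "summarizer"}')` pair. Against a live server, the same function can be applied to the decoded lines of the response from `GET /api/jobs/{job_id}/stream`.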
Next.js Pages Router app (frontend/pages/):
| Route | Purpose |
|---|---|
| / | Analyze: paste incident text or Upload ZIP (bulk) for many jobs from one archive |
| /dashboard | Jobs, stats, analysis detail |
| /audit | Audit-oriented views |
| /settings | Integrations and preferences |
| /sign-in, /sign-up | Clerk auth |
The UI calls the API at NEXT_PUBLIC_API_URL when set; otherwise it defaults to http://localhost:8000 (see frontend/lib/api.js). With run_local.py, you can put NEXT_PUBLIC_* and Clerk keys in the root .env. If you run npm run dev alone, you can instead use frontend/.env.local (see frontend/README.md).
Use this when you have many small log files (for example per-service .txt or .log exports) and want one incident + job per member file without pasting each by hand.
On / (Analyze), choose Upload ZIP (bulk). The client sends multipart/form-data with the file in field archive to POST /api/incidents/bulk-zip. The page shows Bulk Upload Results (created jobs, skipped members) and lets you open each job in the analysis panel.
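For scripts, the endpoint also accepts the raw ZIP bytes as the request body (see the Raw body and Query rows in the table below). The sketch that follows only builds such a request with the standard library. `bulk_zip_request` is a hypothetical helper, the lower bound of 1 for `max_files` is an assumption, and no auth header is added, which presumes local auth-disabled mode.

```python
import urllib.parse
import urllib.request
from typing import Optional


def bulk_zip_request(
    zip_bytes: bytes,
    base_url: str = "http://localhost:8000",
    source: str = "upload",
    title_prefix: Optional[str] = None,
    max_files: int = 25,
) -> urllib.request.Request:
    """Build a raw-body POST for /api/incidents/bulk-zip without sending it."""
    if not 1 <= max_files <= 100:  # documented maximum is 100; minimum is assumed
        raise ValueError("max_files must be between 1 and 100")
    query = {"source": source, "max_files": str(max_files)}
    if title_prefix:
        query["title_prefix"] = title_prefix
    url = f"{base_url.rstrip('/')}/api/incidents/bulk-zip?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(
        url,
        data=zip_bytes,
        method="POST",
        headers={"Content-Type": "application/zip"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; a 400 response is expected when the archive fails preflight.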
| Topic | Detail |
|---|---|
| Endpoint | POST /api/incidents/bulk-zip |
| Multipart | Form field name must be archive (see backend/test_bulk_zip_api.py). |
| Raw body | You may instead POST the raw ZIP bytes with Content-Type: application/zip (or application/octet-stream) for scripts. |
| Query | source (default upload), optional title_prefix (prepended to each incident title), max_files (default 25, max 100). |
| Archive size (entries) | Archives with more than 400 file entries are rejected with HTTP 400 before any member bodies are read, to bound CPU/memory on pathological zips. |
| Ingested extensions | .txt, .log, .json, .ndjson, .md, .csv |
| Per-file size | Members larger than 500 KB are skipped (not ingested). |
| Finder metadata | Paths under __MACOSX and AppleDouble ._* files are not ingested as incidents but are still scanned for hidden threats; a bad payload there fails the whole archive. |
| Preflight | If any scanned member fails guardrails (for example prompt-injection-like content in a log line), the API returns 400 with detail.error === "bulk_zip_validation_failed" and a failures list — no incidents are created (all-or-nothing). |
Automated coverage: backend/common/test_bulk_zip_preflight.py, backend/test_bulk_zip_api.py.
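To see how the size and extension rules interact, here is a local sketch that classifies archive members before upload. It is not the backend's preflight: it does not run guardrail scanning or apply `max_files`, `classify_members` and `is_finder_metadata` are hypothetical names, and reading "500 KB" as 500 * 1024 bytes is an assumption.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_ENTRIES = 400
MAX_MEMBER_BYTES = 500 * 1024  # assumption: "500 KB" means 500 * 1024 bytes
INGESTED_EXTENSIONS = {".txt", ".log", ".json", ".ndjson", ".md", ".csv"}


def is_finder_metadata(name: str) -> bool:
    """True for __MACOSX paths and AppleDouble ._* files."""
    path = PurePosixPath(name)
    return "__MACOSX" in path.parts or path.name.startswith("._")


def classify_members(zip_bytes: bytes) -> dict[str, list[str]]:
    """Sort archive members into 'ingest' and 'skipped' by the documented rules."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
        if len(files) > MAX_ENTRIES:
            raise ValueError(f"archive has {len(files)} file entries (limit {MAX_ENTRIES})")
        plan: dict[str, list[str]] = {"ingest": [], "skipped": []}
        for info in files:
            suffix = PurePosixPath(info.filename).suffix.lower()
            ok = (
                not is_finder_metadata(info.filename)
                and suffix in INGESTED_EXTENSIONS
                and info.file_size <= MAX_MEMBER_BYTES
            )
            plan["ingest" if ok else "skipped"].append(info.filename)
        return plan
```

Running this before upload gives a quick preview of which members would become incidents, but only the server decides whether the archive passes guardrails.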
Sentinel can notify external systems when a job’s analysis completes, for severities configured by INTEGRATION_NOTIFY_SEVERITIES (see Configuration).
- Open `/settings` while signed in (or in local auth-disabled mode as your dev user).
- Under Add Integration, pick Slack or Generic Webhook and paste the webhook URL.
- Save. Other types (Jira, PagerDuty) may be present in the same list; this section focuses on Slack and HTTP JSON webhooks.
Slack: Use a real Incoming Webhook URL (https://hooks.slack.com/services/... with three path segments). Do not paste documentation placeholders that use a Unicode ellipsis (…) in the path — the API rejects those because Slack responds with redirects, not a successful post.
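A quick client-side check for the URL shape described above might look like this. It is a sketch of the stated rule (HTTPS, `hooks.slack.com`, `/services/` followed by three path segments, no Unicode ellipsis), not the API's own validation; `looks_like_slack_webhook` is a hypothetical name.

```python
from urllib.parse import urlparse


def looks_like_slack_webhook(url: str) -> bool:
    """Check the documented shape of a Slack Incoming Webhook URL."""
    if "\u2026" in url:  # Unicode ellipsis copied from documentation placeholders
        return False
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "hooks.slack.com":
        return False
    segments = [part for part in parsed.path.split("/") if part]
    # "services" plus the three path segments that follow it
    return len(segments) == 4 and segments[0] == "services"
```

The check is purely structural; it cannot tell whether the webhook is still active on the Slack side.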
The server POSTs JSON to your URL. Top-level fields include:
- `event`: `sentinel.analysis.completed`
- `incident_id`, `job_id`
- `incident_title`, `incident_source`: Title is typically the incident label (for bulk ZIP, often `title_prefix` + member file name); source is usually `upload` or `manual`.
- `severity`, `summary`, `severity_reason`, root cause fields, `recommended_actions`, `next_checks`, `risk_if_unresolved`
- `dashboard_url`: Absolute link when `SENTINEL_PUBLIC_URL` / `NEXT_PUBLIC_APP_URL` is set; otherwise `null`.
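On the receiving side, a generic webhook consumer might parse the payload along these lines. This is a minimal sketch: `parse_sentinel_webhook` is a hypothetical helper, the set of fields treated as required is a choice made for this example, and the sample values in the usage note are invented.

```python
import json
from typing import Any, Optional

EXPECTED_EVENT = "sentinel.analysis.completed"
# Subset of the documented fields that this sketch chooses to require.
REQUIRED_FIELDS = ("event", "incident_id", "job_id", "severity", "summary")


def parse_sentinel_webhook(body: bytes) -> Optional[dict[str, Any]]:
    """Return a trimmed view of the payload, or None for other event types."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("event") != EXPECTED_EVENT:
        return None
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return {
        "job_id": payload["job_id"],
        "incident_title": payload.get("incident_title"),
        "severity": payload["severity"],
        "summary": payload["summary"],
        # null in JSON (None here) when no public dashboard URL is configured
        "dashboard_url": payload.get("dashboard_url"),
    }
```

A receiver would call this with the raw request body and, for instance, skip rendering a link whenever `dashboard_url` comes back as `None`.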
Manual smoke test (from backend/): WEBHOOK_URL=https://… uv run python -m integrations.manual_dispatch (optional INTEGRATION_TYPE=slack, INCIDENT_TITLE, INCIDENT_SOURCE).
Dispatch runs from the same run_job path used locally and on Lambda. Packaging scripts include common/ and integrations/ but skip test_*.py / *_test.py so pytest modules are not shipped in deployment zips. After changing integration code, redeploy the Lambdas that run the pipeline (see AWS deployment).
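The exclusion rule for deployment zips can be expressed as a filename filter. The sketch below illustrates the two patterns named above and is not the packaging script itself; `is_shippable` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TEST_PATTERNS = ("test_*.py", "*_test.py")  # patterns named in the README


def is_shippable(path: str) -> bool:
    """True when the file name matches neither pytest naming pattern."""
    name = PurePosixPath(path).name
    return not any(fnmatchcase(name, pattern) for pattern in TEST_PATTERNS)
```

Matching on the file name rather than the full path means a test module is excluded no matter which package directory it sits in.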
Backend tests use pytest. From the repo root:
`uv run --project backend pytest backend/tests`
Test directory: `backend/tests/`
Infrastructure is organized as independent Terraform stages under terraform/, aligned with guides/1_permissions.md through guides/8_enterprise.md. Use each stage’s terraform.tfvars.example (where present) as a template.
Suggested order is documented in gameplan.md: permissions → SageMaker → ingestion → intel → database → agents → frontend → enterprise monitoring.
| Document | Contents |
|---|---|
| intel.md | End-to-end intelligence: env, agents, API, frontend, Terraform |
| gameplan.md | MVP outcomes, guide order, guardrails, delivery focus |
| SENTINEL_HANDOVER.md | Scaffold / handover status |
| guides/agent_architecture.md | Agent sequence and data flow |
| AGENTS.md | Tooling and model conventions for this repo |
| Name | Role |
|---|---|
| Eben and Michael | Backend and frontend: APIs, agent pipeline, and dashboard UI |
| Joshua | Deployment: AWS, Terraform stages, and end-to-end infrastructure delivery |
| Tunde | Demo presentation |
| Ayesha | Base codebase setup and repository documentation; bulk ZIP upload (API preflight guardrails, Analyze UI, job table); Slack and generic webhook outbound integrations (payload fields, severity-based dispatch, dashboard links); test engineering including bulk ZIP and dispatcher coverage |
| Oluwagbamila | End-to-end application: full product flow from incident intake through analysis to dashboard delivery |
| Category | Stack |
|---|---|
| Language | Python 3.12+ |
| Backend | FastAPI, Uvicorn, Pydantic v2, httpx, PyJWT, fpdf2, python-dotenv, Mangum (Lambda adapter) |
| Frontend | Next.js 14, React 18, Pages Router, Clerk, Recharts |
| LLM | OpenRouter or Amazon Bedrock (via boto3) |
| Data | SQLite (local dev); Aurora Serverless v2 + RDS Data API (AWS deployment) |
| Cloud | API Gateway, Lambda, SQS, CloudFront, S3, App Runner (Intel), EventBridge, CloudWatch |
| IaC | Terraform — staged modules in terraform/ |
| Auth | Clerk (JWT); optional AUTH_DISABLED for local development |
| Tooling | uv, npm, pytest |
Application packages in this repo are currently versioned at 0.3.0 (see backend/pyproject.toml and frontend/package.json).
