Sentinel

Production: https://sentinel-center.web.app/live/

Sentinel is an AI-powered incident intelligence platform. It turns raw logs and incident narratives into structured analysis—summaries, severity, likely root cause, evidence-grounded remediation, and exportable reports—so teams can respond faster with less manual triage.

Architecture

Architecture overview

Sentinel uses an event-driven, serverless layout on AWS. Operators use the CloudFront-hosted Next.js dashboard; traffic goes through API Gateway to the Lambda API, which persists incidents, jobs, and analysis in Aurora Serverless and enqueues work on SQS. The Planner Lambda runs the pipeline (normalizer → summarizer → investigator → remediator). Amazon Bedrock powers the heavy reasoning steps, and the App Runner Intel service adds supporting analysis. Results are stored and surfaced back in the dashboard for visualization and reporting.

Diagram

Local development

For day-to-day work, the UI and API run on your machine; the database is typically SQLite unless you point the app at Aurora.

```mermaid
graph LR
  subgraph Local["Local dev"]
    UI[Next.js]
    API[FastAPI]
    DB[(SQLite / Aurora)]
  end
  UI --> API
  API --> DB
```

Deeper reference: guides/architecture.md, guides/agent_architecture.md, and intel.md.

Agent roles (see AGENTS.md):

| Module | Role |
| --- | --- |
| Planner | Orchestrates the incident analysis flow |
| Normalizer | Cleans input, guardrails, evidence snippets |
| Summarizer | Short narrative + severity |
| Investigator | Root-cause analysis (strong model; Nova Pro in AWS guidance) |
| Remediator | Remediation plan and next steps (strong model) |
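
The flow in the table can be pictured as a chain of stage functions that each enrich a shared state. The sketch below is illustrative only: `run_pipeline`, the four stage functions, and the dict-shaped state are hypothetical names, not the repository's actual interfaces, and every model call is replaced by a stub.

```python
from typing import Callable

# Hypothetical state passed between stages; the real data model may differ.
State = dict[str, object]
Stage = Callable[[State], State]


def normalizer(state: State) -> State:
    # Stand-in for input cleaning and evidence extraction.
    text = str(state["raw_text"]).strip()
    return {**state, "clean_text": text, "evidence": text.splitlines()[:3]}


def summarizer(state: State) -> State:
    # Stand-in for the LLM call that produces a summary and severity.
    lines = str(state["clean_text"]).splitlines()
    return {**state, "summary": lines[0] if lines else "", "severity": "unknown"}


def investigator(state: State) -> State:
    # Stand-in for root-cause analysis.
    return {**state, "root_cause": None}


def remediator(state: State) -> State:
    # Stand-in for remediation planning.
    return {**state, "recommended_actions": []}


# Order mirrors the documented flow: normalize, summarize, investigate, remediate.
PIPELINE: list[tuple[str, Stage]] = [
    ("normalizer", normalizer),
    ("summarizer", summarizer),
    ("investigator", investigator),
    ("remediator", remediator),
]


def run_pipeline(raw_text: str) -> State:
    """Play the Planner role: run each stage in order and record progress."""
    state: State = {"raw_text": raw_text}
    completed: list[str] = []
    for name, stage in PIPELINE:
        state = stage(state)
        completed.append(name)
    return {**state, "stages_completed": completed}
```

Keeping each stage a plain function of the state makes it straightforward to test stages in isolation or to swap the stubbed model calls for real ones.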

Why Sentinel

Production incidents rarely arrive as clean stories. Operators paste logs and Slack threads while working under time pressure. Sentinel runs a modular agent pipeline (normalize → summarize → investigate → remediate) with guardrails (prompt-injection handling, evidence extraction, confidence-aware behavior) and surfaces results in a Next.js dashboard backed by a FastAPI service.

What you get

Incident analysis: Automated summary, severity, root-cause hypotheses, and prioritized remediation actions.
Operational UI: Submit incidents, review jobs, charts, and deep-dive reports (frontend).
Real-time feedback: Server-Sent Events for pipeline stages and investigation streaming (see API overview).
Remediation workflow: Track actions, per-action chat for guidance, follow-ups, and clarification Q&A.
Reporting: JSON/PDF exports, audit PDFs, periodic digests, post-incident review (PIR) helpers.
Integrations & webhooks: Alertmanager / CloudWatch-style ingestion hooks, optional email reminders (Resend), and outbound notifications (Slack incoming webhooks and generic HTTP webhooks) when analysis completes at high or critical severity (see Slack and generic webhooks).
Bulk ZIP upload: Upload a .zip of log-like files from Analyze to create many incidents and jobs in one step, with archive-wide guardrails (see Bulk ZIP upload).
Auth: Clerk for production-style sign-in; local bypass when Clerk is not configured.

Repository layout

| Path | Purpose |
| --- | --- |
| backend/ | FastAPI app, agents, pipeline, store, reports, scheduler, ingest |
| backend/tests/ | Consolidated backend pytest suite |
| frontend/ | Next.js 14 (Pages Router) dashboard |
| terraform/ | Stage-based AWS IaC (see guides 1–8) |
| scripts/ | Local orchestration (`run_local.py`), utilities |
| guides/ | Permissions, SageMaker, ingestion, DB, agents, frontend, enterprise |
| intel.md | Deep-dive: files, APIs, frontend, infra |
| gameplan.md | Delivery sequence and guardrail strategy |

Prerequisites

Python ≥ 3.12
uv for installing and running Python tools
Node.js and npm (for the frontend)

Quick start

1. Clone the repository and enter the project root.
2. Environment file

   ```
   cp .env.example .env
   ```

   Edit .env so exactly one of USE_OPEN_ROUTER or USE_BEDROCK is true for LLM calls (see Configuration).
3. Install frontend dependencies (first run only)

   ```
   cd frontend && npm install && cd ..
   ```

4. Start backend + frontend (recommended)

   ```
   cd scripts && uv run run_local.py
   ```

   - Backend: http://localhost:8000
   - Frontend: http://localhost:3000

   If Clerk JWKS / issuer URLs are not set, the orchestrator sets AUTH_DISABLED=true so you can develop without signing in.

Configuration

Copy .env.example to .env at the repo root. Important groups:

LLM provider (pick one)

| Mode | When to use | Key variables |
| --- | --- | --- |
| OpenRouter | Easiest for local development | `USE_OPEN_ROUTER=true`, `USE_BEDROCK=false`, `OPENROUTER_API_KEY`, optional `OPENROUTER_MODEL` (default `openai/gpt-4o-mini`) |
| AWS Bedrock | Production / AWS-aligned setup | `USE_BEDROCK=true`, `USE_OPEN_ROUTER=false`, AWS credentials, `BEDROCK_MODEL_ID`, region |
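
The "pick one" rule can be checked before the app starts. The following is a minimal sketch, not the backend's actual startup code: `select_llm_provider` is a hypothetical helper, and the set of strings treated as true is an assumption about how boolean flags are parsed.

```python
import os
from typing import Mapping

# Assumption: these spellings count as "true"; the backend may parse flags differently.
TRUTHY = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "").strip().lower() in TRUTHY


def select_llm_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return "openrouter" or "bedrock", or raise if the config is ambiguous."""
    use_openrouter = _flag(env, "USE_OPEN_ROUTER")
    use_bedrock = _flag(env, "USE_BEDROCK")
    if use_openrouter == use_bedrock:
        # Both true or both false: the provider choice is not well defined.
        raise ValueError("Set exactly one of USE_OPEN_ROUTER or USE_BEDROCK to true")
    if use_openrouter:
        if not env.get("OPENROUTER_API_KEY"):
            raise ValueError("OPENROUTER_API_KEY is required when USE_OPEN_ROUTER=true")
        return "openrouter"
    if not env.get("BEDROCK_MODEL_ID"):
        raise ValueError("BEDROCK_MODEL_ID is required when USE_BEDROCK=true")
    return "bedrock"
```

Failing fast on an ambiguous configuration tends to be easier to debug than discovering the wrong provider at the first LLM call.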

Authentication (Clerk)

Setting NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY, CLERK_SECRET_KEY, and either CLERK_JWKS_URL or CLERK_ISSUER (as used by the backend) enables full auth.
Omitting the Clerk config triggers auth-disabled mode when using scripts/run_local.py, as described above.

Notifications (Resend)

RESEND_API_KEY, RESEND_FROM for follow-up emails (test sender supported).

Optional AWS

S3_BUCKET for uploads / PDF flows that use S3.
DEFAULT_AWS_REGION, account and access keys as needed for Bedrock or S3.

Outbound integrations (Slack / webhooks)

These control when and where completed analysis is pushed after a job finishes (not ingestion of external alerts):

INTEGRATION_NOTIFY_SEVERITIES — Comma-separated list of severities that trigger dispatch (default: high,critical). Example: high,critical or critical only.
SENTINEL_PUBLIC_URL or NEXT_PUBLIC_APP_URL — Public dashboard base URL (no trailing slash). When set, generic webhook payloads include a dashboard_url pointing at the job; when unset, dashboard_url is null.

Integrations themselves are stored per user in the database and configured in the UI under Settings (see Slack and generic webhooks).

For narrative setup instructions, see the Local Development section in intel.md.

Running locally

Both services

```
cd scripts
uv run run_local.py
```

Backend only

```
cd backend
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000
```

Frontend only

```
cd frontend
npm install
npm run dev
```

Health check: GET http://localhost:8000/health

API overview

Base URL in local development: http://localhost:8000

| Area | Examples |
| --- | --- |
| Core | `GET /health`, `GET /api/me`, `GET /api/team/members` |
| Incidents & jobs | `POST /api/incidents`, `POST /api/incidents/analyze-sync`, `POST /api/incidents/bulk-zip` (multipart field `archive`, optional query `title_prefix`, `source`, `max_files`), `GET /api/jobs`, `GET /api/jobs/{job_id}`, `POST /api/jobs/{job_id}/run`, `GET /api/jobs/{job_id}/workflow` |
| Streaming | `GET /api/jobs/{job_id}/stream`, `POST /api/stream/investigate` |
| Exports | `GET /api/jobs/{job_id}/export`, `GET /api/jobs/{job_id}/audit/pdf` |
| Remediation | `GET/PATCH .../actions`, chat `GET/POST .../actions/{action_id}/chat`, `POST .../actions/{action_id}/evaluate` |
| Follow-ups & clarify | follow-ups under `/api/jobs/{job_id}/follow-ups`, clarifications under `/api/jobs/{job_id}/clarify` |
| Integrations | `GET/POST/DELETE /api/integrations` (Slack, generic webhook, Jira, PagerDuty configs); ingestion webhooks under `/api/ingest/webhook*` |
| Analytics & reports | `GET /api/analytics/mttr`, `POST /api/reports/digest`, PIR routes under `/api/jobs/{job_id}/pir` |

Interactive docs: when the API is running, OpenAPI is available at /docs (Swagger UI) unless disabled in your build.
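
The streaming endpoints use Server-Sent Events, so a script consuming them needs to split the response into events. Below is a minimal parser for the standard SSE wire format; `iter_sse_events` is a hypothetical helper, and the event names and payloads Sentinel actually emits are not specified here, so treat any sample values as made up.

```python
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield {"event", "data"} dicts from the lines of an SSE response."""
    event_name = "message"  # default event type in the SSE format
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield {"event": event_name, "data": "\n".join(data_parts)}
            event_name, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional leading space
        if field == "event":
            event_name = value
        elif field == "data":
            data_parts.append(value)
```

In practice the `lines` argument would be the decoded lines of a response from `GET /api/jobs/{job_id}/stream`, read incrementally with whichever HTTP client you prefer.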

Frontend

Next.js Pages Router app (frontend/pages/):

| Route | Purpose |
| --- | --- |
| `/` | Analyze: paste incident text or Upload ZIP (bulk) for many jobs from one archive |
| `/dashboard` | Jobs, stats, analysis detail |
| `/audit` | Audit-oriented views |
| `/settings` | Integrations and preferences |
| `/sign-in`, `/sign-up` | Clerk auth |

The UI calls the API at NEXT_PUBLIC_API_URL when set; otherwise it defaults to http://localhost:8000 (see frontend/lib/api.js). With run_local.py, you can put NEXT_PUBLIC_* and Clerk keys in the root .env. If you run npm run dev alone, you can instead use frontend/.env.local (see frontend/README.md).

Bulk ZIP upload

Use this when you have many small log files (for example per-service .txt or .log exports) and want one incident + job per member file without pasting each by hand.

In the UI

On / (Analyze), choose Upload ZIP (bulk). The client sends multipart/form-data with the file in field archive to POST /api/incidents/bulk-zip. The page shows Bulk Upload Results (created jobs, skipped members) and lets you open each job in the analysis panel.

API behavior

| Topic | Detail |
| --- | --- |
| Endpoint | `POST /api/incidents/bulk-zip` |
| Multipart | Form field name must be `archive` (see backend/test_bulk_zip_api.py). |
| Raw body | You may instead POST the raw ZIP bytes with `Content-Type: application/zip` (or `application/octet-stream`) for scripts. |
| Query | `source` (default `upload`), optional `title_prefix` (prepended to each incident title), `max_files` (default 25, max 100). |
| Archive size (entries) | Archives with more than 400 file entries are rejected with HTTP 400 before member bodies are read, to bound CPU/memory on pathological zips. |
| Ingested extensions | `.txt`, `.log`, `.json`, `.ndjson`, `.md`, `.csv` |
| Per-file size | Members larger than 500 KB are skipped (not ingested). |
| Finder metadata | Paths under `__MACOSX` and AppleDouble `._` files are not ingested as incidents but are still scanned for hidden threats; a bad payload there fails the whole archive. |
| Preflight | If any scanned member fails guardrails (for example prompt-injection-like content in a log line), the API returns 400 with `detail.error === "bulk_zip_validation_failed"` and a `failures` list; no incidents are created (all-or-nothing). |

Automated coverage: backend/common/test_bulk_zip_preflight.py, backend/test_bulk_zip_api.py.
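
Before uploading, a script can apply the same documented limits locally to predict which members would be ingested. This is a client-side sketch, not the server's implementation: `plan_bulk_zip` is a hypothetical helper, "500 KB" is read as 500 × 1024 bytes, directory entries are not counted toward the 400-entry limit, and members beyond `max_files` are simply listed as skipped, all of which are assumptions. It does not reproduce the guardrail scan.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_ENTRIES = 400                   # documented archive-wide entry limit
MAX_MEMBER_BYTES = 500 * 1024       # assumption: "500 KB" means 500 KiB
ALLOWED_SUFFIXES = {".txt", ".log", ".json", ".ndjson", ".md", ".csv"}


def plan_bulk_zip(data: bytes, max_files: int = 25) -> dict[str, list[str]]:
    """Classify ZIP members as likely ingested or skipped, using header metadata only."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        members = [m for m in archive.infolist() if not m.is_dir()]
    if len(members) > MAX_ENTRIES:
        raise ValueError(f"archive has {len(members)} file entries (limit {MAX_ENTRIES})")

    ingest: list[str] = []
    skipped: list[str] = []
    for member in members:
        path = PurePosixPath(member.filename)
        # Finder metadata: __MACOSX folders and AppleDouble "._" files.
        is_finder_metadata = "__MACOSX" in path.parts or path.name.startswith("._")
        if (
            is_finder_metadata
            or path.suffix.lower() not in ALLOWED_SUFFIXES
            or member.file_size > MAX_MEMBER_BYTES
            or len(ingest) >= max_files
        ):
            skipped.append(member.filename)
        else:
            ingest.append(member.filename)
    return {"ingest": ingest, "skipped": skipped}
```

Because only the central directory is read, the check stays cheap even for large archives; the server remains the source of truth for what is accepted.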

Slack and generic webhooks

Sentinel can notify external systems when a job’s analysis completes, for severities configured by INTEGRATION_NOTIFY_SEVERITIES (see Configuration).

Configure in the app

Open /settings while signed in (or in local auth-disabled mode as your dev user).
Under Add Integration, pick Slack or Generic Webhook and paste the webhook URL.
Save. Other types (Jira, PagerDuty) may be present in the same list; this section focuses on Slack and HTTP JSON webhooks.

Slack: Use a real Incoming Webhook URL (https://hooks.slack.com/services/... with three path segments). Do not paste documentation placeholders that use a Unicode ellipsis (…) in the path — the API rejects those because Slack responds with redirects, not a successful post.
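
A quick local check can catch a placeholder URL before it is saved. The sketch below encodes only what is stated above (HTTPS, the `hooks.slack.com` host, `/services/` followed by three path segments, no Unicode ellipsis); `looks_like_slack_webhook` is a hypothetical helper and the API's own validation may be stricter or different.

```python
from urllib.parse import urlsplit

ELLIPSIS = "\u2026"  # the single-character "…" found in documentation placeholders


def looks_like_slack_webhook(url: str) -> bool:
    """Heuristic shape check for a Slack Incoming Webhook URL."""
    if ELLIPSIS in url:
        return False
    parts = urlsplit(url.strip())
    if parts.scheme != "https" or parts.hostname != "hooks.slack.com":
        return False
    segments = [segment for segment in parts.path.split("/") if segment]
    # Expect "services" plus the three segments that identify the webhook.
    return len(segments) == 4 and segments[0] == "services"
```

The check is about shape only; it cannot tell whether the webhook still exists, so a real post is still the definitive test.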

Generic webhook payload

The server POSTs JSON to your URL. Top-level fields include:

event: sentinel.analysis.completed
incident_id, job_id
incident_title, incident_source — Title is typically the incident label (for bulk ZIP, often title_prefix + member file name); source is usually upload or manual.
severity, summary, severity_reason, root cause fields, recommended_actions, next_checks, risk_if_unresolved
dashboard_url — Absolute link when SENTINEL_PUBLIC_URL / NEXT_PUBLIC_APP_URL is set; otherwise null.
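
On the receiving side, a handler might validate the event name and pull out the fields it needs. The sketch below uses only field names listed above; `parse_sentinel_webhook` and `format_alert` are hypothetical helpers, and the sample values in any test payload are invented for illustration.

```python
import json

EXPECTED_EVENT = "sentinel.analysis.completed"


def parse_sentinel_webhook(body: bytes) -> dict[str, object]:
    """Validate the event type and keep the fields an alert needs."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("event") != EXPECTED_EVENT:
        raise ValueError("not a Sentinel analysis-completed payload")
    return {
        "job_id": payload.get("job_id"),
        "incident_title": payload.get("incident_title"),
        "severity": payload.get("severity"),
        "summary": payload.get("summary"),
        # Null when no public dashboard URL is configured on the server.
        "dashboard_url": payload.get("dashboard_url"),
    }


def format_alert(alert: dict[str, object]) -> str:
    """Render a one-line message, tolerating a missing dashboard link."""
    link = alert["dashboard_url"] or "(no dashboard link configured)"
    return f"[{alert['severity']}] {alert['incident_title']}: {alert['summary']} {link}"
```

Handling a null `dashboard_url` explicitly avoids broken links in downstream messages when the server has no public URL configured.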

Manual smoke test (from backend/): WEBHOOK_URL=https://… uv run python -m integrations.manual_dispatch (optional INTEGRATION_TYPE=slack, INCIDENT_TITLE, INCIDENT_SOURCE).

Dispatch runs from the same run_job path used locally and on Lambda. Packaging scripts include common/ and integrations/ but skip test_*.py / *_test.py so pytest modules are not shipped in deployment zips. After changing integration code, redeploy the Lambdas that run the pipeline (see AWS deployment).

Tests

Backend tests use pytest. From the repo root:

```
uv run --project backend pytest backend/tests
```

Test directory:

```
backend/tests/
```

AWS deployment

Infrastructure is organized as independent Terraform stages under terraform/, aligned with guides/1_permissions.md through guides/8_enterprise.md. Use each stage’s terraform.tfvars.example (where present) as a template.

Suggested order is documented in gameplan.md: permissions → SageMaker → ingestion → intel → database → agents → frontend → enterprise monitoring.

Documentation

| Document | Contents |
| --- | --- |
| intel.md | End-to-end intelligence: env, agents, API, frontend, Terraform |
| gameplan.md | MVP outcomes, guide order, guardrails, delivery focus |
| SENTINEL_HANDOVER.md | Scaffold / handover status |
| guides/agent_architecture.md | Agent sequence and data flow |
| AGENTS.md | Tooling and model conventions for this repo |

Squad contributions

| Name | Role |
| --- | --- |
| Eben and Michael | Backend and frontend: APIs, agent pipeline, and dashboard UI |
| Joshua | Deployment: AWS, Terraform stages, and end-to-end infrastructure delivery |
| Tunde | Demo presentation |
| Ayesha | Base codebase setup and repository documentation; bulk ZIP upload (API preflight guardrails, Analyze UI, job table); Slack and generic webhook outbound integrations (payload fields, severity-based dispatch, dashboard links); test engineering including bulk ZIP and dispatcher coverage |
| Oluwagbamila | End-to-end application: full product flow from incident intake through analysis to dashboard delivery |

Tech stack

| Category | Stack |
| --- | --- |
| Language | Python 3.12+ |
| Backend | FastAPI, Uvicorn, Pydantic v2, httpx, PyJWT, fpdf2, python-dotenv, Mangum (Lambda adapter) |
| Frontend | Next.js 14, React 18, Pages Router, Clerk, Recharts |
| LLM | OpenRouter or Amazon Bedrock (via boto3) |
| Data | SQLite (local dev); Aurora Serverless v2 + RDS Data API (AWS deployment) |
| Cloud | API Gateway, Lambda, SQS, CloudFront, S3, App Runner (Intel), EventBridge, CloudWatch |
| IaC | Terraform (staged modules in `terraform/`) |
| Auth | Clerk (JWT); optional `AUTH_DISABLED` for local development |
| Tooling | uv, npm, pytest |

Version

Application packages in this repo are currently versioned at 0.3.0 (see backend/pyproject.toml and frontend/package.json).
