This folder contains a script and config to extract unique error signatures from Observe log datasets across multiple services and regions. Output is printed to the terminal and written to error_report.txt (IST timestamps, counts, error messages, and deep links to the Observe log explorer).
New here? → See Setup (step-by-step) to run locally, or Running with Docker to run in a container.
| Item | Description |
|---|---|
| extract_errors.py | Main script: calls Observe API, runs OPAL pipelines, prints/writes results. |
| env.sample | Sample environment variables. Copy to .env and set values (do not commit real keys). |
| config/services.sample.json | List of services (name, workspace_id, dataset_id, pipeline_file) for multi-service runs. |
| pipelines/ | OPAL pipeline files (one per service or shared). Use {{REGION}} for cluster filter; script replaces it at runtime. |
| error_report.txt | Written on each run: table(s) of unique errors (and links). |
| app.py | Flask web app: dashboard check, hostname lookup, Fix (single-error analysis), and Send to Slack (formats report via Gemini and POSTs to webhook). |
| test.sh | Runs Cursor agent to analyze errors and suggest fixes. Used by the Fix button or manually with --error-file. |
| docs/RUNBOOK.md | Known-error runbook: maps common errors (e.g. DeploymentControllerRMQ) to root causes and fixes. |
- Launch Management
- Launch Management Background Jobs Service
- Launch Logs service
- Launch Logs bg service
- Launch telemetry service
- Launch logs-bg-exporter-service
- Launch Nginx service
- Launch Deployment Agent
Each entry can override workspace_id, dataset_id, and pipeline_file. Pipeline files live under pipelines/ and must output columns: latest_timestamp, total_occurrences, error_msg, context.
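For reference, a services config entry might look like the following (the IDs and filename are taken from examples elsewhere in this README; check config/services.sample.json for the authoritative schema):

```json
[
  {
    "name": "Launch Nginx service",
    "workspace_id": "41096433",
    "dataset_id": "41250854",
    "pipeline_file": "pipelines/launch_nginx_errors.opal"
  }
]
```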
Observe/
├── app.py # Flask web app (dashboard check, Fix, Slack)
├── extract_errors.py # Error extraction script
├── test.sh # Fix flow: runs Cursor agent on errors
├── config/
│ └── services.sample.json # Service definitions for multi-service runs
├── docs/
│ └── RUNBOOK.md # Known-error runbook
├── output/ # Generated files (gitignored)
│ ├── error_report.txt # Full report from extract_errors
│ ├── error_to_fix.txt # Single error for Fix button
│ └── agent_analysis.md # Cursor agent analysis output
├── pipelines/ # OPAL pipeline files per service
├── static/
│ └── index.html # Web UI
├── env.sample # Environment template (copy to .env)
├── requirements.txt
├── Dockerfile # Web app
├── Dockerfile.cli # CLI (extract_errors)
└── README.md
- Open your Observe workspace in the browser, e.g. `https://143110822295.eu-1.observeinc.com/workspace/41096433/home?tab=Favorites`.
- The Customer ID is the first segment after `https://`, i.e. the subdomain before `.observeinc.com`. From `https://143110822295.eu-1.observeinc.com/workspace/...`, the Customer ID is `143110822295`.
- Set this in .env as OBSERVE_CUSTOMER_ID.
- Go to the API tokens page in your Observe instance: `https://143110822295.eu-1.observeinc.com/settings/my-api-tokens` (replace `143110822295` and `eu-1` with your own customer ID and cluster if different).
- Create a new API token (or use an existing one). Copy the token value once; it may not be shown again.
- Set it in your environment or `.env` as OBSERVE_API_KEY (see Environment variables).
The script sends: Authorization: Bearer <OBSERVE_CUSTOMER_ID> <OBSERVE_API_KEY>.
Ensure the token’s user has dataset:view (or equivalent) on the datasets you query.
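For illustration, building that header in Python looks like this (a hypothetical helper — extract_errors.py constructs the header internally):

```python
import os

def observe_headers() -> dict:
    """Build the Authorization header sent to the Observe API:
    Bearer <OBSERVE_CUSTOMER_ID> <OBSERVE_API_KEY>."""
    customer_id = os.environ["OBSERVE_CUSTOMER_ID"]
    api_key = os.environ["OBSERVE_API_KEY"]
    return {"Authorization": f"Bearer {customer_id} {api_key}"}

# Example values for demonstration only
os.environ.setdefault("OBSERVE_CUSTOMER_ID", "143110822295")
os.environ.setdefault("OBSERVE_API_KEY", "example-token")
print(observe_headers()["Authorization"])
```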
Follow this flow to get running from scratch.
- Python 3 (3.8+ recommended)
- Access to your Observe instance (Customer ID and API token)
Clone or open the repo and go to the project root (the folder that contains the Observe directory):

```bash
cd /path/to/Observe-automation
```

From the project root (where requirements.txt lives):

```bash
pip install -r requirements.txt
```

Or from inside Observe/:

```bash
cd Observe
pip install -r ../requirements.txt
```

(Optional) Use a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

- Go into the Observe folder (if not already there): `cd Observe`
- Copy the sample env file and edit it with your values:

```bash
cp env.sample .env
```
- In .env, set at least:
  - OBSERVE_CUSTOMER_ID – from your Observe URL (see How to get your Customer ID and API token)
  - OBSERVE_API_KEY – from Observe → Settings → My API tokens
Optionally set OBSERVE_CLUSTER, OBSERVE_WORKSPACE_ID, OBSERVE_DATASET_ID, and REGION as needed.
Do not commit .env (it contains secrets).
From the Observe folder, load your env and run the script:
```bash
# Load environment variables (choose one)
export $(grep -v '^#' .env | xargs)
# Or: set -a && source .env && set +a

# Single service (default workspace/dataset from .env)
python3 extract_errors.py

# All services from config/services.sample.json
python3 extract_errors.py --all-services

# All services × all regions (full report)
python3 extract_errors.py --auto
```

Results are printed to the terminal and written to error_report.txt in the same folder.
A simple frontend lets you set env vars and run the same checks from the browser:
```bash
cd Observe
pip install -r requirements.txt  # includes Flask
python3 app.py
```

Open http://localhost:5000 (or http://localhost:5001 if 5000 is in use). Enter OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY (required), and optionally expand and set cluster, workspace, dataset, region, and time range. Choose a run mode (Single service, All services, All regions, or Auto) and click Run dashboard check. The report appears on the page; you can copy or download it.
After running a dashboard check, you can format the report and send it to Slack:
- Set SLACK_WEBHOOK_URL in .env (e.g. a Slack Incoming Webhook URL or Contentstack Automations API URL).
- Set GEMINI_API_KEY in .env — the app uses Gemini to format the report for Slack. Get a free key at Google AI Studio.
- Run a dashboard check, then click Send me a Slack.
- Optionally enter a Channel ID (e.g. C1234567890 or D086ZCDT6B0) in the UI to override the webhook’s default channel.
The formatted message is POSTed to your webhook as JSON (text, blocks, and optionally channel). If no webhook is configured, the formatted message is copied to your clipboard instead.
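A minimal sketch of assembling that payload (illustrative only — the app's actual message content is formatted by Gemini first):

```python
import json
from typing import Optional

def build_slack_payload(text: str, blocks: list, channel: Optional[str] = None) -> dict:
    """Assemble the JSON body POSTed to the webhook: text, blocks,
    and optionally channel (overrides the webhook's default channel)."""
    payload = {"text": text, "blocks": blocks}
    if channel:
        payload["channel"] = channel
    return payload

body = build_slack_payload(
    "Daily error report",
    [{"type": "section", "text": {"type": "mrkdwn", "text": "*3 new errors*"}}],
    channel="C1234567890",
)
print(json.dumps(body))
```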
The Fix button runs the Cursor agent to analyze a single error and suggest concrete fixes in your codebase.
- Cursor CLI – Install the Cursor agent CLI and ensure `agent` is in your PATH.
- Workspace setup – The agent can only modify code it can see. Set AGENT_WORKSPACE to a directory that contains both this Observe app and the repos that produce the errors (e.g. contentfly-management-background-jobs-service).

- Run a dashboard check and wait for the report.
- Click Fix next to the error you want to analyze.
- The app writes output/error_to_fix.txt and starts test.sh automatically.
- Output appears in your terminal and in output/agent_analysis.md.
| Variable | Purpose | Default |
|---|---|---|
| AGENT_MODEL | Cursor agent model. Run agent models to list options. | composer-1.5 |
| AGENT_WORKSPACE | Root directory the agent can read and modify. Must include the repos with the code that throws the errors. | ../ (parent of Observe) |
Example: If your layout is:
/Users/you/
├── Observe-automation/Observe/ ← this app
└── contentfly-management-background-jobs-service/ ← service that produces errors
Set in .env:
AGENT_WORKSPACE=/Users/you
Then the agent can suggest and apply fixes in both repos.
You can also run the fix flow manually:
```bash
# After clicking Fix, or if you have output/error_to_fix.txt:
./test.sh --error-file output/error_to_fix.txt

# With a different model:
./test.sh --error-file output/error_to_fix.txt --model <model-name>

# Full report (runs extract_errors.py first, then agent):
./test.sh
```

| Step | What to do |
|---|---|
| Install | pip install -r requirements.txt (from project root) |
| Config | cd Observe → cp env.sample .env → set OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY |
| Run | Load .env, then python3 extract_errors.py --all-services (or --auto) |
| Output | Terminal + Observe/error_report.txt |
| Web UI | cd Observe && python3 app.py → open http://localhost:5000 (details) |
| Send to Slack | Set SLACK_WEBHOOK_URL + GEMINI_API_KEY in .env → run dashboard check → click Send me a Slack (details) |
| Fix flow | Set AGENT_WORKSPACE (and optionally AGENT_MODEL) in .env → run dashboard check → click Fix next to an error (details) |
| Deploy on Render | Push to GitHub → connect repo at Render → deploy (details) |
| Docker | Web app: docker build -t observe . then docker run -p 5000:5000 --env-file .env observe. CLI: docker build -f Dockerfile.cli -t observe-cli . (details) |
You can host the web UI on Render for free (with limits).
1. Push your code to a GitHub (or GitLab) repository. Ensure the repo root contains the Observe folder and the root render.yaml.
2. Create a Web Service on Render:
   - Go to dashboard.render.com → New → Web Service.
   - Connect your repository.
   - If you use the repo’s Blueprint (render.yaml), Render will create the service from it. Otherwise set:
     - Root Directory: Observe
     - Runtime: Python 3
     - Build Command: pip install -r requirements.txt
     - Start Command: gunicorn --bind 0.0.0.0:$PORT app:app
3. Deploy. Render will build and run the app. Your URL will be like https://observe-dashboard-check.onrender.com.
4. Credentials: The app does not store Observe credentials on the server. Users enter OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY in the browser (and can save them in localStorage).
5. Slack (optional): To enable "Send me a Slack", add SLACK_WEBHOOK_URL and GEMINI_API_KEY as environment variables in Render’s dashboard.
Note: On the free tier, requests may time out after ~30–60 seconds. For long “Run dashboard check” runs (e.g. All services × All regions), use a single service or fewer regions, or consider a paid plan for longer timeouts.
Two Dockerfiles: Dockerfile (web app) and Dockerfile.cli (CLI only).
```bash
cd Observe
docker build -t observe .
docker run -p 5000:5000 --env-file .env observe
```

Open http://localhost:5000. Credentials are entered in the browser (not stored on the server).
Note: The Fix flow requires Cursor CLI and runs outside the container, so it does not work when the app runs in Docker. Use the app locally for Fix.
```bash
cd Observe
docker build -f Dockerfile.cli -t observe-cli .
docker run --rm --env-file .env observe-cli --all-services
docker run --rm --env-file .env observe-cli --auto
```

Mount a directory to get output/error_report.txt on your machine:

```bash
docker run --rm --env-file .env -v "$(pwd)/output:/app/output" observe-cli --auto
```

Then open ./output/error_report.txt.
Copy env.sample to .env in this folder (or export in the shell). Load before running, e.g.:

```bash
set -a && source .env && set +a && python3 extract_errors.py --all-services
# or
export $(grep -v '^#' .env | xargs) && python3 extract_errors.py --all-services
```

| Variable | Purpose | Default (if any) |
|---|---|---|
| OBSERVE_CUSTOMER_ID | Your Observe customer ID (in the URL). | 143110822295 |
| OBSERVE_API_KEY | API token for authentication. | (none – set this) |
| OBSERVE_CLUSTER | Regional cluster (e.g. eu-1). Base URL: https://<customer>.<cluster>.observeinc.com/... | eu-1 |
| OBSERVE_WORKSPACE_ID | Default workspace for single-service or fallback in config. | 41096433 |
| OBSERVE_DATASET_ID | Default dataset when running a single service. | (e.g. 41249174) |
| REGION | Value for {{REGION}} in OPAL (cluster filter, e.g. label(^Cluster) = "{{REGION}}"). | aws-na |
| START_IST | Start of time window in IST (YYYY-MM-DD HH:MM:SS). Optional; with END_IST overrides “last 24h”. | — |
| END_IST | End of time window in IST. Optional. | — |
| SLACK_WEBHOOK_URL | Webhook URL for "Send me a Slack" (Slack Incoming Webhook, Contentstack Automations API, or any HTTP endpoint). If set, the formatted report is POSTed here. | — |
| GEMINI_API_KEY | Google Gemini API key for formatting the report as a Slack message. Get a free key at Google AI Studio. | — |
| GEMINI_MODEL | Gemini model name. | gemini-1.5-flash |
| AGENT_MODEL | Cursor agent model for Fix flow. Run agent models to list options. | composer-1.5 |
| AGENT_WORKSPACE | Root directory for Fix flow. Must include repos with the code that produces the errors. | ../ |
| OBSERVE_LOOKUP_DAYS | For hostname lookup: time range = past N days. If unset, uses 15 minutes. | — |
| OBSERVE_LOOKUP_TIMEOUT_SEC | HTTP timeout (seconds) for Observe API calls in hostname lookup. Increase if queries time out with large OBSERVE_LOOKUP_DAYS. | 300 |
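Resolving these variables with their documented defaults can be sketched as follows (a simplified sketch mirroring the table above, not the script's exact code):

```python
import os

# Defaults match the table above
cluster = os.environ.get("OBSERVE_CLUSTER", "eu-1")
region = os.environ.get("REGION", "aws-na")
lookup_timeout = int(os.environ.get("OBSERVE_LOOKUP_TIMEOUT_SEC", "300"))
customer_id = os.environ.get("OBSERVE_CUSTOMER_ID", "143110822295")

# The regional base URL is assembled from customer ID and cluster
base_url = f"https://{customer_id}.{cluster}.observeinc.com"
```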
Valid regions (for REGION or --all-regions):
aws-na, aws-eu, aws-au, azure-na, azure-eu, gcp-na, gcp-eu
Run from the Observe folder (or pass correct paths to --config / --pipeline-file).
```bash
python3 extract_errors.py
```

Uses OBSERVE_WORKSPACE_ID, OBSERVE_DATASET_ID, REGION, and the last 24 hours.

```bash
python3 extract_errors.py -d <dataset_id> -p pipelines/<pipeline>.opal
```

Example (Nginx only):

```bash
python3 extract_errors.py -d 41250854 -p pipelines/launch_nginx_errors.opal
```

```bash
python3 extract_errors.py --all-services
```

Finds config/services.sample.json and runs every service in it.

```bash
python3 extract_errors.py --config path/to/services.json
```

Runs the same run (all services or single) for each region and concatenates the output:

```bash
python3 extract_errors.py --all-services --all-regions
```

Or with a custom config:

```bash
python3 extract_errors.py --config services.json --all-regions
```

Equivalent to --all-services --all-regions:

```bash
python3 extract_errors.py --auto
```

Override the default “last 24 hours” by setting both start and end in IST:

```bash
python3 extract_errors.py --all-services --start "2026-02-13 00:00:00" --end "2026-02-14 12:00:00"
```

Or use env: START_IST and END_IST.
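For reference, converting an IST window to UTC can be sketched with zoneinfo (illustrative only; the script's internal time handling may differ):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

IST = ZoneInfo("Asia/Kolkata")  # IST is UTC+05:30

def ist_window_to_utc(start_ist: str, end_ist: str):
    """Convert 'YYYY-MM-DD HH:MM:SS' IST strings to UTC datetimes."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(start_ist, fmt).replace(tzinfo=IST)
    end = datetime.strptime(end_ist, fmt).replace(tzinfo=IST)
    return start.astimezone(ZoneInfo("UTC")), end.astimezone(ZoneInfo("UTC"))

start_utc, end_utc = ist_window_to_utc("2026-02-13 00:00:00", "2026-02-14 12:00:00")
# Midnight IST is 18:30 the previous day in UTC
```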
```bash
python3 extract_errors.py -w <workspace_id> -d <dataset_id>
```

| Flag | Short | Description |
|---|---|---|
| --workspace | -w | Override workspace ID. |
| --dataset | -d | Override dataset ID (single-service). |
| --start | — | Start time in IST (YYYY-MM-DD HH:MM:SS). Env: START_IST. |
| --end | — | End time in IST. Env: END_IST. |
| --pipeline-file | -p | Path to OPAL pipeline file (single-service). |
| --config | -c | Path to JSON file listing services (name, workspace_id, dataset_id, pipeline_file). |
| --all-services | — | Use config/services.sample.json as config. |
| --all-regions | — | Run for every region (aws-na, aws-eu, …) and print/write combined output. |
| --auto | — | Same as --all-services --all-regions (all services × all regions). |
- Terminal: Progress lines like 🚀 Extracting unique error signatures from <service> [region: <region>]... and one table per service (and per region when using --all-regions).
- output/error_report.txt: Same table(s) in one file (IST timestamp, count, error & context, link to Observe log explorer).
- Pipeline files under pipelines/ define the OPAL (filters, make_col, statsby, etc.).
- The script replaces {{REGION}} in the pipeline with the current region (env REGION or the loop value when using --all-regions).
- Custom pipelines must output: latest_timestamp, total_occurrences, error_msg, context so the script can build the table and links.
- To copy OPAL from the Observe UI: Worksheet → query editor → OPAL tab; or Log Explorer → query builder → OPAL. Use only the pipeline part (no interface "..." line).
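The {{REGION}} replacement is a plain placeholder substitution; a minimal sketch:

```python
def render_pipeline(opal_template: str, region: str) -> str:
    """Replace the {{REGION}} placeholder with the current region."""
    return opal_template.replace("{{REGION}}", region)

opal = 'filter label(^Cluster) = "{{REGION}}"'
print(render_pipeline(opal, "aws-eu"))
# → filter label(^Cluster) = "aws-eu"
```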
For full steps see Setup (step-by-step). Short version:
- From project root: pip install -r requirements.txt
- cd Observe → copy env.sample to .env and set OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY
- Load env: export $(grep -v '^#' .env | xargs) (or set -a && source .env && set +a)
- Run: python3 extract_errors.py --all-services or python3 extract_errors.py --auto
- Check output/error_report.txt for the full report