AryanBansal-launch/Observe-automation

Observe – Error extraction from log datasets

This folder contains a script and config to extract unique error signatures from Observe log datasets across multiple services and regions. Output is printed to the terminal and written to error_report.txt (IST timestamps, counts, error messages, and deep links to the Observe log explorer).

New here? → See Setup (step-by-step) to run locally, or Running with Docker to run in a container.


What’s included

| Item | Description |
|---|---|
| extract_errors.py | Main script: calls the Observe API, runs OPAL pipelines, prints/writes results. |
| env.sample | Sample environment variables. Copy to .env and set values (do not commit real keys). |
| config/services.sample.json | List of services (name, workspace_id, dataset_id, pipeline_file) for multi-service runs. |
| pipelines/ | OPAL pipeline files (one per service, or shared). Use {{REGION}} for the cluster filter; the script replaces it at runtime. |
| error_report.txt | Written on each run: table(s) of unique errors (and links). |
| app.py | Flask web app: dashboard check, hostname lookup, Fix (single-error analysis), and Send to Slack (formats the report via Gemini and POSTs it to a webhook). |
| test.sh | Runs the Cursor agent to analyze errors and suggest fixes. Used by the Fix button, or manually with --error-file. |
| docs/RUNBOOK.md | Known-error runbook: maps common errors (e.g. DeploymentControllerRMQ) to root causes and fixes. |

Services in config/services.sample.json

  • Launch Management
  • Launch Management Background Jobs Service
  • Launch Logs service
  • Launch Logs bg service
  • Launch telemetry service
  • Launch logs-bg-exporter-service
  • Launch Nginx service
  • Launch Deployment Agent

Each entry can override workspace_id, dataset_id, and pipeline_file. Pipeline files live under pipelines/ and must output columns: latest_timestamp, total_occurrences, error_msg, context.
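For illustration, an entry in the services config might look like the following. The IDs below reuse the workspace and Nginx dataset IDs that appear elsewhere in this README; the exact top-level shape of the file may differ, so check services.sample.json for the authoritative format:

```json
[
  {
    "name": "Launch Nginx service",
    "workspace_id": "41096433",
    "dataset_id": "41250854",
    "pipeline_file": "pipelines/launch_nginx_errors.opal"
  }
]
```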

Project structure

Observe/
├── app.py                 # Flask web app (dashboard check, Fix, Slack)
├── extract_errors.py      # Error extraction script
├── test.sh                # Fix flow: runs Cursor agent on errors
├── config/
│   └── services.sample.json   # Service definitions for multi-service runs
├── docs/
│   └── RUNBOOK.md         # Known-error runbook
├── output/                # Generated files (gitignored)
│   ├── error_report.txt   # Full report from extract_errors
│   ├── error_to_fix.txt   # Single error for Fix button
│   └── agent_analysis.md  # Cursor agent analysis output
├── pipelines/             # OPAL pipeline files per service
├── static/
│   └── index.html         # Web UI
├── env.sample             # Environment template (copy to .env)
├── requirements.txt
├── Dockerfile              # Web app
├── Dockerfile.cli         # CLI (extract_errors)
└── README.md

How to get your Customer ID and API token

Customer ID (OBSERVE_CUSTOMER_ID)

  1. Open your Observe workspace in the browser, e.g.:
    https://143110822295.eu-1.observeinc.com/workspace/41096433/home?tab=Favorites
  2. The Customer ID is the first segment after https://, i.e. the numeric subdomain that comes before .<cluster>.observeinc.com.
    • From https://143110822295.eu-1.observeinc.com/workspace/... → the Customer ID is 143110822295.

Set this in .env as OBSERVE_CUSTOMER_ID.

API token (OBSERVE_API_KEY)

  1. Go to the API tokens page in your Observe instance:
    https://143110822295.eu-1.observeinc.com/settings/my-api-tokens
    (Replace 143110822295 and eu-1 with your own customer ID and cluster if different.)
  2. Create a new API token (or use an existing one). Copy the token value once; it may not be shown again.
  3. Set it in your environment or .env as OBSERVE_API_KEY (see Environment variables).

The script sends: Authorization: Bearer <OBSERVE_CUSTOMER_ID> <OBSERVE_API_KEY>.
Ensure the token’s user has dataset:view (or equivalent) on the datasets you query.
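As a sketch (not the script's actual code), the header and base URL described above can be built like this; build_headers and base_url are hypothetical helper names:

```python
# Sketch of how the Observe auth header and base URL are assembled.
# Values come from .env: OBSERVE_CUSTOMER_ID, OBSERVE_API_KEY, OBSERVE_CLUSTER.
def build_headers(customer_id: str, api_key: str) -> dict:
    # Both values go into a single bearer token: "Bearer <customer_id> <api_key>".
    return {
        "Authorization": f"Bearer {customer_id} {api_key}",
        "Content-Type": "application/json",
    }

def base_url(customer_id: str, cluster: str = "eu-1") -> str:
    # e.g. https://143110822295.eu-1.observeinc.com
    return f"https://{customer_id}.{cluster}.observeinc.com"
```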


Setup (step-by-step)

Follow this flow to get running from scratch.

1. Prerequisites

  • Python 3 (3.8+ recommended)
  • Access to your Observe instance (Customer ID and API token)

2. Get the code

Clone or open the repo and go to the project root (the folder that contains the Observe directory):

cd /path/to/Observe-automation

3. Install dependencies

From the project root (where requirements.txt lives):

pip install -r requirements.txt

Or from inside Observe/:

cd Observe
pip install -r ../requirements.txt

(Optional) Use a virtual environment:

python3 -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

4. Configure environment

  1. Go into the Observe folder (if not already there):
    cd Observe
  2. Copy the sample env file and edit it with your values:
    cp env.sample .env
  3. In .env, set at least:
    • OBSERVE_CUSTOMER_ID – from your Observe URL (see How to get your Customer ID and API token)
    • OBSERVE_API_KEY – from Observe → Settings → My API tokens
      Optionally set OBSERVE_CLUSTER, OBSERVE_WORKSPACE_ID, OBSERVE_DATASET_ID, and REGION as needed.

Do not commit .env (it contains secrets).

5. Load env and run

From the Observe folder, load your env and run the script:

# Load environment variables (choose one)
export $(grep -v '^#' .env | xargs)
# Or: set -a && source .env && set +a

# Single service (default workspace/dataset from .env)
python3 extract_errors.py

# All services from config/services.sample.json
python3 extract_errors.py --all-services

# All services × all regions (full report)
python3 extract_errors.py --auto

Results are printed to the terminal and written to error_report.txt in the same folder.

6. (Optional) Run the web UI

A simple frontend lets you set env vars and run the same checks from the browser:

cd Observe
pip install -r requirements.txt   # includes Flask
python3 app.py

Open http://localhost:5000 (or http://localhost:5001 if 5000 is in use). Enter OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY (required), optionally expand and set cluster, workspace, dataset, region, and time range. Choose a run mode (Single service, All services, All regions, or Auto) and click Run dashboard check. The report appears on the page; you can copy or download it.

Send to Slack

After running a dashboard check, you can format the report and send it to Slack:

  1. Set SLACK_WEBHOOK_URL in .env (e.g. a Slack Incoming Webhook URL or Contentstack Automations API URL).
  2. Set GEMINI_API_KEY in .env — the app uses Gemini to format the report for Slack. Get a free key at Google AI Studio.
  3. Run a dashboard check, then click Send me a Slack.
  4. Optionally enter a Channel ID (e.g. C1234567890 or D086ZCDT6B0) in the UI to override the webhook’s default channel.

The formatted message is POSTed to your webhook as JSON (text, blocks, and optionally channel). If no webhook is configured, the formatted message is copied to your clipboard instead.
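A minimal sketch of that POST, assuming the payload keys named above (text, blocks, and optionally channel); build_payload and send_to_slack are hypothetical helper names, not the app's actual functions:

```python
import json
from typing import Optional
from urllib import request

def build_payload(formatted: str, channel: Optional[str] = None) -> dict:
    # Mirrors the JSON shape described above: text, blocks, optional channel.
    payload = {
        "text": formatted,
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": formatted}}
        ],
    }
    if channel:
        payload["channel"] = channel  # overrides the webhook's default channel
    return payload

def send_to_slack(webhook_url: str, formatted: str,
                  channel: Optional[str] = None) -> int:
    req = request.Request(
        webhook_url,
        data=json.dumps(build_payload(formatted, channel)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # raises on HTTP errors
        return resp.status
```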

Fix flow (single-error analysis)

The Fix button runs the Cursor agent to analyze a single error and suggest concrete fixes in your codebase.

Prerequisites

  • Cursor CLI – Install the Cursor agent CLI and ensure agent is in your PATH.
  • Workspace setup – The agent can only modify code it can see. Set AGENT_WORKSPACE to a directory that contains both this Observe app and the repos that produce the errors (e.g. contentfly-management-background-jobs-service).

How to use

  1. Run a dashboard check and wait for the report.
  2. Click Fix next to the error you want to analyze.
  3. The app writes output/error_to_fix.txt and starts test.sh automatically.
  4. Output appears in your terminal and in output/agent_analysis.md.

Configuration (in .env)

| Variable | Purpose | Default |
|---|---|---|
| AGENT_MODEL | Cursor agent model. Run agent models to list options. | composer-1.5 |
| AGENT_WORKSPACE | Root directory the agent can read and modify. Must include the repos with the code that throws the errors. | ../ (parent of Observe) |

Example: If your layout is:

/Users/you/
├── Observe-automation/Observe/     ← this app
└── contentfly-management-background-jobs-service/   ← service that produces errors

Set in .env:

AGENT_WORKSPACE=/Users/you

Then the agent can suggest and apply fixes in both repos.

Manual run

You can also run the fix flow manually:

# After clicking Fix, or if you have output/error_to_fix.txt:
./test.sh --error-file output/error_to_fix.txt

# With a different model:
./test.sh --error-file output/error_to_fix.txt --model <model-name>

# Full report (runs extract_errors.py first, then agent):
./test.sh

Quick reference

| Step | What to do |
|---|---|
| Install | pip install -r requirements.txt (from project root) |
| Config | cd Observe → cp env.sample .env → set OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY |
| Run | Load .env, then python3 extract_errors.py --all-services (or --auto) |
| Output | Terminal + Observe/error_report.txt |
| Web UI | cd Observe && python3 app.py → open http://localhost:5000 |
| Send to Slack | Set SLACK_WEBHOOK_URL + GEMINI_API_KEY in .env → run dashboard check → click Send me a Slack |
| Fix flow | Set AGENT_WORKSPACE (and optionally AGENT_MODEL) in .env → run dashboard check → click Fix next to an error |
| Deploy on Render | Push to GitHub → connect repo at Render → deploy |
| Docker | Web app: docker build -t observe . then docker run -p 5000:5000 --env-file .env observe. CLI: docker build -f Dockerfile.cli -t observe-cli . |

Deploy on Render

You can host the web UI on Render for free (with limits).

  1. Push your code to a GitHub (or GitLab) repository. Ensure the repo root contains the Observe folder and the root render.yaml.

  2. Create a Web Service on Render:

    • Go to dashboard.render.com → New → Web Service.
    • Connect your repository.
    • If you use the repo’s Blueprint (render.yaml), Render will create the service from it. Otherwise set:
      • Root Directory: Observe
      • Runtime: Python 3
      • Build Command: pip install -r requirements.txt
      • Start Command: gunicorn --bind 0.0.0.0:$PORT app:app
  3. Deploy. Render will build and run the app. Your URL will be like https://observe-dashboard-check.onrender.com.

  4. Credentials: The app does not store Observe credentials on the server. Users enter OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY in the browser (and can save them in localStorage).

  5. Slack (optional): To enable "Send me a Slack", add SLACK_WEBHOOK_URL and GEMINI_API_KEY as environment variables in Render’s dashboard.

Note: On the free tier, requests may time out after ~30–60 seconds. For long “Run dashboard check” runs (e.g. All services × All regions), use a single service or fewer regions, or consider a paid plan for longer timeouts.


Running with Docker

Two Dockerfiles: Dockerfile (web app) and Dockerfile.cli (CLI only).

Web app

cd Observe
docker build -t observe .
docker run -p 5000:5000 --env-file .env observe

Open http://localhost:5000. Credentials are entered in the browser (not stored on the server).

Note: The Fix flow requires Cursor CLI and runs outside the container, so it does not work when the app runs in Docker. Use the app locally for Fix.

CLI (extract_errors)

cd Observe
docker build -f Dockerfile.cli -t observe-cli .
docker run --rm --env-file .env observe-cli --all-services
docker run --rm --env-file .env observe-cli --auto

Save the report to your host (CLI)

Mount a directory to get output/error_report.txt on your machine:

docker run --rm --env-file .env -v "$(pwd)/output:/app/output" observe-cli --auto

Then open ./output/error_report.txt.


Environment variables

Copy env.sample to .env in this folder (or export in the shell). Load before running, e.g.:

set -a && source .env && set +a && python3 extract_errors.py --all-services
# or
export $(grep -v '^#' .env | xargs) && python3 extract_errors.py --all-services

| Variable | Purpose | Default (if any) |
|---|---|---|
| OBSERVE_CUSTOMER_ID | Your Observe customer ID (in the URL). | 143110822295 |
| OBSERVE_API_KEY | API token for authentication. | (none – set this) |
| OBSERVE_CLUSTER | Regional cluster (e.g. eu-1). Base URL: https://<customer>.<cluster>.observeinc.com/... | eu-1 |
| OBSERVE_WORKSPACE_ID | Default workspace for single-service runs, or fallback in the config. | 41096433 |
| OBSERVE_DATASET_ID | Default dataset when running a single service. | (e.g. 41249174) |
| REGION | Value for {{REGION}} in OPAL (cluster filter, e.g. label(^Cluster) = "{{REGION}}"). | aws-na |
| START_IST | Start of time window in IST (YYYY-MM-DD HH:MM:SS). Optional; with END_IST, overrides "last 24h". | (none) |
| END_IST | End of time window in IST. Optional. | (none) |
| SLACK_WEBHOOK_URL | Webhook URL for "Send me a Slack" (Slack Incoming Webhook, Contentstack Automations API, or any HTTP endpoint). If set, the formatted report is POSTed here. | (none) |
| GEMINI_API_KEY | Google Gemini API key for formatting the report as a Slack message. Get a free key at Google AI Studio. | (none) |
| GEMINI_MODEL | Gemini model name. | gemini-1.5-flash |
| AGENT_MODEL | Cursor agent model for the Fix flow. Run agent models to list options. | composer-1.5 |
| AGENT_WORKSPACE | Root directory for the Fix flow. Must include the repos with the code that produces the errors. | ../ |
| OBSERVE_LOOKUP_DAYS | For hostname lookup: time range = past N days. If unset, uses 15 minutes. | (none) |
| OBSERVE_LOOKUP_TIMEOUT_SEC | HTTP timeout (seconds) for Observe API calls in hostname lookup. Increase if queries time out with a large OBSERVE_LOOKUP_DAYS. | 300 |

Valid regions (for REGION or --all-regions):
aws-na, aws-eu, aws-au, azure-na, azure-eu, gcp-na, gcp-eu
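Putting the required variables together, a .env might look like the following (values are the defaults and examples from the table above; replace them with your own):

```ini
OBSERVE_CUSTOMER_ID=143110822295
OBSERVE_API_KEY=your-api-token-here
OBSERVE_CLUSTER=eu-1
OBSERVE_WORKSPACE_ID=41096433
OBSERVE_DATASET_ID=41249174
REGION=aws-na
# Optional: fixed time window instead of "last 24h"
# START_IST=2026-02-13 00:00:00
# END_IST=2026-02-14 12:00:00
```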


Commands and flags

Run from the Observe folder (or pass correct paths to --config / --pipeline-file).

Single service (default workspace/dataset from env)

python3 extract_errors.py

Uses OBSERVE_WORKSPACE_ID, OBSERVE_DATASET_ID, REGION, and last 24 hours.

Single service with explicit dataset and pipeline

python3 extract_errors.py -d <dataset_id> -p pipelines/<pipeline>.opal

Example (Nginx only):

python3 extract_errors.py -d 41250854 -p pipelines/launch_nginx_errors.opal

All services (from services.sample.json)

python3 extract_errors.py --all-services

Finds config/services.sample.json and runs every service in it.

Custom services config

python3 extract_errors.py --config path/to/services.json

All regions (one region at a time, with section headers)

Repeats the same run (all services, or a single service) for each region and concatenates the output:

python3 extract_errors.py --all-services --all-regions

Or with a custom config:

python3 extract_errors.py --config services.json --all-regions

Auto (all services × all regions)

Equivalent to --all-services --all-regions:

python3 extract_errors.py --auto

Time window (IST)

Override default “last 24 hours” by setting both start and end in IST:

python3 extract_errors.py --all-services --start "2026-02-13 00:00:00" --end "2026-02-14 12:00:00"

Or use env: START_IST and END_IST.
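Since the Observe API works in UTC, the script presumably converts the IST window before querying. A sketch of that conversion (ist_window_to_utc is a hypothetical helper, not the script's actual function):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

IST = ZoneInfo("Asia/Kolkata")  # IST is UTC+05:30

def ist_window_to_utc(start_ist: str, end_ist: str):
    """Parse 'YYYY-MM-DD HH:MM:SS' strings given in IST and
    return timezone-aware UTC datetimes for the query window."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(start_ist, fmt).replace(tzinfo=IST)
    end = datetime.strptime(end_ist, fmt).replace(tzinfo=IST)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)
```

For example, 2026-02-13 00:00:00 IST becomes 2026-02-12 18:30:00 UTC.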

Override workspace/dataset from CLI

python3 extract_errors.py -w <workspace_id> -d <dataset_id>

Flag reference

| Flag | Short | Description |
|---|---|---|
| --workspace | -w | Override workspace ID. |
| --dataset | -d | Override dataset ID (single-service). |
| --start | | Start time in IST (YYYY-MM-DD HH:MM:SS). Env: START_IST. |
| --end | | End time in IST. Env: END_IST. |
| --pipeline-file | -p | Path to OPAL pipeline file (single-service). |
| --config | -c | Path to a JSON file listing services (name, workspace_id, dataset_id, pipeline_file). |
| --all-services | | Use config/services.sample.json as the config. |
| --all-regions | | Run for every region (aws-na, aws-eu, …) and print/write combined output. |
| --auto | | Same as --all-services --all-regions (all services × all regions). |

Output

  • Terminal: Progress lines like 🚀 Extracting unique error signatures from <service> [region: <region>]... and one table per service (and per region when using --all-regions).
  • output/error_report.txt: Same table(s) in one file (IST timestamp, count, error & context, link to Observe log explorer).

Pipelines and {{REGION}}

  • Pipeline files under pipelines/ define the OPAL (filters, make_col, statsby, etc.).
  • The script replaces {{REGION}} in the pipeline with the current region (env REGION or the loop value when using --all-regions).
  • Custom pipelines must output: latest_timestamp, total_occurrences, error_msg, context so the script can build the table and links.
  • To copy OPAL from the Observe UI: Worksheet → query editor → OPAL tab; or Log Explorer → query builder → OPAL. Use only the pipeline part (no interface "..." line).
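The substitution and column contract described above can be sketched as follows (render_pipeline and missing_columns are hypothetical helper names, not the script's actual code):

```python
# Columns every custom pipeline must output (per the contract above).
REQUIRED_COLUMNS = {"latest_timestamp", "total_occurrences", "error_msg", "context"}

def render_pipeline(opal_text: str, region: str) -> str:
    # Plain string substitution of the {{REGION}} placeholder.
    return opal_text.replace("{{REGION}}", region)

def missing_columns(result_columns) -> set:
    # Returns the required columns absent from a query result,
    # so a misconfigured pipeline can be reported early.
    return REQUIRED_COLUMNS - set(result_columns)
```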

Quick start

For full steps see Setup (step-by-step). Short version:

  1. From project root: pip install -r requirements.txt
  2. cd Observe → copy env.sample to .env and set OBSERVE_CUSTOMER_ID and OBSERVE_API_KEY
  3. Load env: export $(grep -v '^#' .env | xargs) (or set -a && source .env && set +a)
  4. Run: python3 extract_errors.py --all-services or python3 extract_errors.py --auto
  5. Check output/error_report.txt for the full report
