Rules Guide
Adan edited this page Oct 6, 2025 · 1 revision
LogForge's Alert Engine automation service applies rule definitions to container signals and executes guarded remediation. Use the playbooks below as reference implementations. They map directly onto the Rule Builder panels (Trigger Condition, Rule Scope, Actions Chain, and the real-time Rule Preview) and call out when you should visit the global Advanced Settings modal for guardrails.
- Trigger Condition: choose the signal (`Keyword in Logs`, `Container Event`, or `Metric Threshold` for Pro).
- Timeline windows: use the `Count` and `Minutes` fields to require N events within M minutes before the trigger fires; leave them blank for immediate alerts.
- Rule Scope: pick whether the rule inspects every container or specific containers/groups that you already maintain inside LogForge Core. The Alert Engine only lists groups that still contain monitored containers.
- Actions Chain: order remediation steps (`Restart`, `Stop`, `Kill`, `Start`, `Run script`, `Send notification`) and add per-step delays. Script actions are automatically disabled when the scope is global.
- Notifier delivery: every `Send notification` step posts the alert payload to the Notifier service, which fans it out to the destinations configured in LogForge (email, chat, PagerDuty, webhooks, etc.).
- Advanced Settings: open from the `Alert Rules` toolbar to adjust action cooldowns, backoff, per-hour limits, verification delay, and keyword defaults (case sensitivity, ANY/ALL matching, ignore patterns) for the entire engine.
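The `Count`/`Minutes` window described above can be pictured as a sliding-window counter. The following is a minimal sketch of that idea, not LogForge's actual implementation; the `WindowTrigger` class and its method names are hypothetical and exist only for illustration.

```python
from collections import deque


class WindowTrigger:
    """Hypothetical model: fire once `count` events fall inside `minutes`."""

    def __init__(self, count=None, minutes=None):
        # Blank fields (None) model the "immediate alert" behaviour.
        self.count = count
        self.window = None if minutes is None else minutes * 60.0
        self.events = deque()

    def observe(self, timestamp):
        """Record one matching event (in seconds) and report whether to fire."""
        if self.count is None or self.window is None:
            return True
        self.events.append(timestamp)
        # Drop events that have aged out of the window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.count


# Require 3 events within 5 minutes; timestamps are seconds since some epoch.
trigger = WindowTrigger(count=3, minutes=5)
results = [trigger.observe(t) for t in (0, 120, 400, 450, 500)]
print(results)  # [False, False, False, False, True]
```

Only the last event fires, because it is the first moment at which three events (400, 450, 500) sit inside one 5 minute window.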
- Template: Open the `Templates` tab, activate `Restart Loop Detection`, and set the rule name/scope if prompted. No other fields need to change for default behaviour.
- Trigger Condition: Container event `start` with `Count >= 3` inside a 5 minute window.
- Rule Scope: In the activation modal, choose `All containers` or target one of the LogForge Core groups (for example `Payments Services`); only groups with monitored containers appear. You can revisit scope later from the rule detail.
- Actions Chain:
  - `Stop container` (fires immediately on the offending container).
  - `Send notification` (Notifier pushes the alert to your configured destinations) with a 30 second delay so the stop action completes first.
- Advanced Settings: The template inherits global guardrails (default stop cooldown 3 minutes, per-rule max actions 3/hour). Override those system-wide only if you need more frequent remediation.
- Operator notes: For most teams the template defaults suffice: activate it, set the scope, and watch for duplicate-rule warnings before cloning.
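To make the ordering of an Actions Chain with per-step delays concrete, here is a small sketch that turns a chain into a timeline. It assumes each delay is counted from the launch of the previous step, which this guide does not confirm; `Step`, `schedule`, and the action identifiers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    delay_seconds: int = 0  # models the per-step delay field


def schedule(steps):
    """Return (offset_seconds, action) pairs for a chain.

    Assumption: each delay is measured from the previous step's launch.
    """
    offset = 0
    plan = []
    for step in steps:
        offset += step.delay_seconds
        plan.append((offset, step.action))
    return plan


# The chain from the playbook above: stop at once, notify 30 seconds later.
restart_loop_chain = [
    Step("stop_container"),
    Step("send_notification", delay_seconds=30),
]
plan = schedule(restart_loop_chain)
print(plan)  # [(0, 'stop_container'), (30, 'send_notification')]
```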
- Template: Activate `Crash Loop Detection` from the `Templates` tab; adjust only the scope. This template kills instead of stops when a container keeps failing to start.
- Trigger Condition: Container event `stop` with `Count >= 5` within 10 minutes.
- Rule Scope: Point the rule at the LogForge Core group that owns the fragile workload (for example `Core APIs`); the picker only shows groups with monitored containers.
- Actions Chain:
  - `Kill container` immediately to break the crash storm.
  - `Send notification` (Notifier delivers the kill context to the configured destinations) after a 60 second delay.
- Advanced Settings: Defaults enforce a 5 minute kill cooldown and a 30 second verification delay. Increase the cooldown globally if the same app repeatedly relaunches under investigation.
- Operator notes: To tweak timelines or swap the kill action for a stop, convert the rule to a custom rule (Pro) or build a fresh custom rule in the builder.
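The guardrails mentioned in these playbooks (an action cooldown plus a per-hour cap) can be modelled roughly as below. This is an illustrative sketch under stated assumptions, not the engine's real gatekeeper: the `Gatekeeper` class, the key layout, and the exact boundary behaviour are all hypothetical.

```python
class Gatekeeper:
    """Hypothetical guard combining a cooldown with a per-hour action cap."""

    def __init__(self, cooldown_seconds, max_actions_per_hour):
        self.cooldown = cooldown_seconds
        self.max_per_hour = max_actions_per_hour
        self.history = {}  # key -> timestamps of actions that were allowed

    def allow(self, key, now):
        """Return True and record the action if both guardrails permit it."""
        # Only actions from the last hour count against the cap.
        recent = [t for t in self.history.get(key, []) if now - t < 3600]
        if recent and now - recent[-1] < self.cooldown:
            return False  # still cooling down after the last action
        if len(recent) >= self.max_per_hour:
            return False  # hourly budget exhausted
        recent.append(now)
        self.history[key] = recent
        return True


# 5 minute kill cooldown, at most 3 actions per hour (values from the playbooks).
gate = Gatekeeper(cooldown_seconds=300, max_actions_per_hour=3)
key = ("crash-loop", "api-1", "kill")  # hypothetical rule/container/action key
decisions = [gate.allow(key, t) for t in (0, 120, 300, 600, 900, 3700)]
print(decisions)  # [True, False, True, True, False, True]
```

The attempt at 120 seconds is blocked by the cooldown, the one at 900 seconds by the hourly cap, and the attempt at 3700 seconds passes again because the first action has left the one-hour window.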
- Trigger Condition: Select `Keyword in Logs`, enter keywords such as `ERROR` and `panic`, and set `Count >= 5` in `2` minutes. Use the Keyword Settings tab in `Advanced Settings` to flip to case-insensitive matching or add ignore patterns for noisy stack traces.
- Rule Scope: Target the specific containers known for noisy logs (multi-select from the monitored container inventory) or one of your LogForge Core groups such as `Frontend Services`.
- Actions Chain:
  - `Send notification` (Notifier pushes the incident payload to the destinations you configured).
  - Optional `Restart container` with a 60 second delay if rebooting clears the condition.
- Advanced Settings: Keep the default restart cooldown (5 minutes) and verification delay (30 seconds); widen them globally if restart storms occur.
- Operator notes: Start as notify-only, watch the Alerts dashboard, then layer the restart step with a delay.
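The keyword defaults (case sensitivity, ANY/ALL matching, ignore patterns) interact in a way that is easier to see in code. The sketch below is one plausible reading, not the engine's matcher: it assumes ignore patterns are regular expressions that take precedence over keywords, and `line_matches` is a hypothetical helper.

```python
import re


def line_matches(line, keywords, match_all=False, case_sensitive=True,
                 ignore_patterns=()):
    """Return True when a log line satisfies the keyword settings."""
    if not keywords:
        return False
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: a line matching any ignore pattern never counts.
    if any(re.search(pattern, line, flags) for pattern in ignore_patterns):
        return False
    haystack = line if case_sensitive else line.lower()
    hits = [(kw if case_sensitive else kw.lower()) in haystack
            for kw in keywords]
    # ALL requires every keyword; ANY requires at least one.
    return all(hits) if match_all else any(hits)


keywords = ["ERROR", "panic"]
print(line_matches("ERROR: db timeout", keywords))                        # True
print(line_matches("error: db timeout", keywords))                        # False
print(line_matches("error: db timeout", keywords, case_sensitive=False))  # True
print(line_matches("ERROR: db timeout", keywords, match_all=True))        # False
print(line_matches("ERROR in healthcheck probe", keywords,
                   ignore_patterns=[r"healthcheck"]))                     # False
```

Lines that pass this check would then feed the `Count`/`Minutes` window, so an ignore pattern reduces the event count rather than suppressing an alert after the fact.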
- Trigger Condition: `Keyword in Logs`, enable case-insensitive matching, add `unauthorized`, `failed login`, `AWS_SECRET_ACCESS_KEY`, and require `Count >= 2` within `60` seconds.
- Rule Scope: Select the relevant ingress/authentication group that you already maintain in LogForge Core (for example `Edge Gateways`); the Alert Engine surfaces those groups so you can include them while excluding internal tooling.
- Actions Chain:
  - `Send notification` (Notifier delivers to the security destinations configured in LogForge Core / Notifier).
  - `Send notification` via `Webhook` (Notifier calls the configured webhook endpoint with the formatted alert message).
  - Optional `Stop container` as a follow-up action after human review.
- Advanced Settings: Use the Keywords tab to add ignore patterns for known scanners. Lengthen the stop cooldown if security remediations should run at most once per hour.
- Operator notes: Document the response procedure inside the rule description, note which LogForge Core groups the rule covers, and enable `Require approval before restart` in the Notifier workflow if you automate container stops.
- Trigger Condition: `Container event` set to `start` with no additional thresholds.
- Rule Scope: Select the release group that needs priming (e.g., `Frontend Services`) from the Core-managed list. Script actions are only available when the scope is container or group.
- Actions Chain:
  - `Delay` 45 seconds using the step-level delay field before remediation starts.
  - `Run script` to execute the first executable `.sh` in `/logforge-scripts/` inside the triggering container.
  - `Send notification` (Notifier shares the captured `{{ action.output }}`) so deployers see warmup results.
- Advanced Settings: Ensure the `run_script` cooldown (default 10 minutes) aligns with your rollout cadence. The gatekeeper also enforces max actions per container/hour.
- Operator notes: Provision `/logforge-scripts/` with numbered scripts (for example `01_warmup.sh`) and check script validity via the `Validate scripts` API before enabling the rule.
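When provisioning a scripts directory, it can help to check locally which script would be picked as "the first executable `.sh`". The sketch below assumes "first" means first in file-name order (which is why numbered prefixes are useful) and that "executable" means at least one execute bit is set; neither detail is confirmed by this guide, and `first_executable_script` is a hypothetical helper.

```python
import os
import stat
import tempfile

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def first_executable_script(directory):
    """Return the first *.sh file, in name order, with an execute bit set."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (name.endswith(".sh") and os.path.isfile(path)
                and os.stat(path).st_mode & EXEC_BITS):
            return path
    return None


# Demo in a throwaway directory standing in for /logforge-scripts/.
with tempfile.TemporaryDirectory() as scripts_dir:
    for name, mode in (("01_warmup.sh", 0o644), ("02_cache.sh", 0o755)):
        path = os.path.join(scripts_dir, name)
        with open(path, "w") as fh:
            fh.write("#!/bin/sh\necho warm\n")
        os.chmod(path, mode)
    chosen = os.path.basename(first_executable_script(scripts_dir))

print(chosen)  # 02_cache.sh
```

In the demo, `01_warmup.sh` is skipped because it lacks an execute bit, which is the kind of provisioning mistake worth catching before the rule is enabled.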
- Prefer templates when they exist; they already include safe defaults and duplicate detection. Adjust scope first, then customise if needed.
- Keep remediation narrow by scoping rules to the container groups you curate in LogForge Core. The builder prevents wildcard targeting but it is still easy to over-select.
- Use the global `Advanced Settings` modal to align cooldowns, exponential backoff, per-rule/per-container limits, verification delays, and keyword defaults with your operational runbooks.
- Layer remediation gradually: run notification-only rules, review the Alerts dashboard, then append restart/stop/script actions with deliberate delays.
- When you add script actions, confirm `/bin/sh` exists, scripts are executable, and the gatekeeper's verification delay is long enough for the script to finish.
- Keep the Notifier service destinations current so every `Send notification` step lands in the right inboxes, chat channels, or webhooks.