@w7-mgfcode w7-mgfcode commented Dec 4, 2025

🇬🇧 English Version

✅ Completed Tasks

  • Docker Compose stack configuration (docker-compose.yml)
    • Prometheus (port 9090)
    • Grafana (port 3000)
    • Alertmanager (port 9093)
    • Node Exporter (port 9100)
    • Blackbox Exporter (port 9115)
    • cAdvisor (port 8080)
    • Dedicated bridge network and named data volumes
  • Prometheus configuration (prometheus.yml)
    • Scrape configs for all exporters
    • Alert rules reference
    • Blackbox probe configurations (HTTP, ICMP, TCP)
  • Alert rules (alerts/alerts.yml)
    • Host alerts (CPU, memory, disk)
    • Container alerts (Docker)
    • Service alerts (endpoints)
    • Network alerts (connectivity)
    • Prometheus self-monitoring alerts
  • Alertmanager configuration (alertmanager.yml)
    • Route configuration with severity-based routing
    • Receiver templates (email, webhook)
    • Inhibit rules
  • Blackbox Exporter configuration (blackbox.yml)
    • HTTP modules (2xx, SSL, POST, Basic Auth)
    • TCP modules (connect, TLS, banner checks)
    • ICMP modules (IPv4, IPv6)
    • DNS modules (A, SOA records)
  • Grafana datasources provisioning
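For orientation, here is a minimal sketch of the "network and volumes" part of such a Compose file. Only the Prometheus service is shown. The image tag and the mount paths are assumptions, not values copied from this PR.

```yaml
# Hypothetical skeleton of docker-compose.yml (only Prometheus shown)
services:
  prometheus:
    image: prom/prometheus:latest                 # assumption: pin the tag the PR actually uses
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts:/etc/prometheus/alerts:ro
      - prometheus_data:/prometheus               # TSDB persisted in a named volume
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge      # dedicated network; services reach each other by service name

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:
```

Mounting the two paths individually, rather than the whole directory over `/etc/prometheus`, leaves any other files the image ships in that directory visible.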

⏳ In Progress

  • Grafana dashboards provisioning configuration
  • Pre-built dashboard JSON files:
    • network-overview.json
    • server-health.json
    • docker-overview.json

❌ Not Started Yet

  • README.md for the monitoring stack
  • Environment variables documentation
  • Docker Compose validation tests

🐛 Issues Encountered

  • None so far

📁 Key Files Changed

```
2-infra-monitoring/
├── docker-compose.yml              # Main stack definition
├── prometheus/
│   ├── prometheus.yml              # Prometheus configuration
│   └── alerts/
│       └── alerts.yml              # Alert rules (40+ rules)
├── alertmanager/
│   └── alertmanager.yml            # Alert routing & receivers
├── blackbox/
│   └── blackbox.yml                # Probe modules
└── grafana/
    └── provisioning/
        └── datasources/
            └── datasources.yml     # Auto-provisioned datasources
```

🧪 Testing Status

  • Docker Compose syntax validation
  • Prometheus config validation (promtool check config)
  • Alertmanager config validation (amtool check-config)
  • Stack deployment test
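The first three checks could be automated roughly as follows. This is a sketch of a GitHub Actions workflow, and the workflow file name, trigger, and `prom/prometheus:latest` tag are all assumptions. The Alertmanager tag matches the v0.26.0 image used in the stack. `docker compose config` may also need a `.env` file or dummy values if the Compose file requires variables to be set.

```yaml
# Hypothetical .github/workflows/validate-monitoring.yml
name: validate-monitoring

on:
  pull_request:
    paths:
      - "2-infra-monitoring/**"

jobs:
  validate:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: 2-infra-monitoring
    steps:
      - uses: actions/checkout@v4

      # Docker Compose syntax validation: parse and interpolate, start nothing.
      - name: docker compose config
        run: docker compose -f docker-compose.yml config --quiet

      # Prometheus config validation; promtool also checks the referenced rule files.
      - name: promtool check config
        run: >-
          docker run --rm
          -v "$PWD/prometheus:/etc/prometheus:ro"
          --entrypoint promtool
          prom/prometheus:latest
          check config /etc/prometheus/prometheus.yml

      # Alertmanager config validation.
      - name: amtool check-config
        run: >-
          docker run --rm
          -v "$PWD/alertmanager:/etc/alertmanager:ro"
          --entrypoint amtool
          prom/alertmanager:v0.26.0
          check-config /etc/alertmanager/alertmanager.yml
```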

🤖 Generated with Claude Code

Summary by Sourcery

Add a complete Docker Compose–based infrastructure monitoring stack with Prometheus, Alertmanager, Grafana, exporters, and provisioning.

New Features:

  • Introduce a Docker Compose stack that deploys Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, and cAdvisor on a dedicated monitoring network.
  • Add Prometheus configuration for scraping core monitoring components and running alerting rules against exporters and blackbox probes.
  • Define Alertmanager routing, receivers, and inhibition rules for severity-based alert handling with email and webhook integrations.
  • Provide Blackbox Exporter modules for HTTP, TCP, ICMP, DNS, and gRPC endpoint probing.
  • Provision Grafana datasources for Prometheus and Alertmanager and configure filesystem-based dashboard provisioning with bundled dashboards for server, Docker, and network overviews.
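A datasource provisioning file of this kind typically looks like the sketch below. The service names `prometheus` and `alertmanager` are assumed to resolve on the monitoring network, and the `timeInterval` value assumes the 15s scrape interval shown in the diagrams further down.

```yaml
# Hypothetical grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090      # Docker-internal URL
    isDefault: true
    jsonData:
      timeInterval: 15s              # assumption: matches the scrape interval

  - name: Alertmanager
    type: alertmanager
    access: proxy
    url: http://alertmanager:9093
    jsonData:
      implementation: prometheus     # upstream Prometheus Alertmanager
```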

Documentation:

  • Add a bilingual README describing the monitoring stack architecture, usage, configuration, alert catalogue, maintenance, and extension options.

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced complete Infra Monitoring Stack with Prometheus, Grafana, and Alertmanager
    • Added pre-built dashboards for container, network, and server health monitoring
    • Configured automatic alert rules for infrastructure components
    • Enabled multi-protocol endpoint monitoring (HTTP, TCP, ICMP, DNS, gRPC)
    • Included Docker Compose setup for streamlined deployment

✏️ Tip: You can customize this high-level summary in your review settings.

- Docker Compose with Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, cAdvisor
- Prometheus configuration with scrape configs
- Alert rules for host, container, service, and network monitoring
- Alertmanager configuration with routing and receivers
- Blackbox Exporter modules for HTTP, TCP, ICMP, DNS probes
- Grafana datasources provisioning

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
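Here is a condensed sketch of severity-based routing with an inhibit rule in Alertmanager's format. Receiver names, timings, and webhook URLs are placeholders, and the email configs are omitted for brevity. `matchers`, `source_matchers`, and `target_matchers` need Alertmanager 0.22 or newer, which the v0.26.0 image satisfies.

```yaml
# Hypothetical excerpt of alertmanager/alertmanager.yml
route:
  receiver: default
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: ['severity="critical"']
      receiver: critical
    - matchers: ['severity="warning"']
      receiver: warning
    - matchers: ['severity="info"']
      receiver: info

receivers:
  - name: default                    # no integrations: notifications are dropped
  - name: critical
    webhook_configs:
      - url: http://webhook.example.internal/webhook/critical   # placeholder
  - name: warning
    webhook_configs:
      - url: http://webhook.example.internal/webhook/warning    # placeholder
  - name: info
    webhook_configs:
      - url: http://webhook.example.internal/webhook/info       # placeholder

inhibit_rules:
  # A firing critical alert mutes the warning with the same alertname and instance.
  - source_matchers: ['severity="critical"']
    target_matchers: ['severity="warning"']
    equal: ["alertname", "instance"]
```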

coderabbitai bot commented Dec 4, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces a complete Docker Compose-based infrastructure monitoring stack comprising Prometheus for metrics collection, Grafana with three monitoring dashboards, Alertmanager for alert routing, and exporters (Node, Blackbox, cAdvisor) for multi-protocol endpoint and system monitoring. Includes configuration files for all components and comprehensive README documentation.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation & Setup | `2-infra-monitoring/README.md` | Comprehensive bilingual documentation covering architecture, component overview, quick start instructions, configuration guidance for all stack components, troubleshooting, and extension options. |
| Core Infrastructure | `2-infra-monitoring/docker-compose.yml` | Docker Compose configuration defining six services (Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, cAdvisor) with networking, volume provisioning, healthchecks, and environment-based credentials. |
| Prometheus Configuration | `2-infra-monitoring/prometheus/prometheus.yml`, `2-infra-monitoring/prometheus/alerts/alerts.yml` | Prometheus configuration with global settings, multiple scrape jobs (node-exporter, cadvisor, alertmanager, blackbox probes), and alert rules organized by Host, Container, Service, Network, and Prometheus self-monitoring categories. |
| Alertmanager & Exporters | `2-infra-monitoring/alertmanager/alertmanager.yml`, `2-infra-monitoring/blackbox/blackbox.yml` | Alertmanager routing, grouping, and notification receivers (email, webhook) with inhibit rules; Blackbox Exporter probe modules for HTTP, TCP, ICMP, DNS, and gRPC with configurable options. |
| Grafana Provisioning | `2-infra-monitoring/grafana/provisioning/datasources/datasources.yml`, `2-infra-monitoring/grafana/provisioning/dashboards/dashboards.yml` | Grafana datasource provisioning (Prometheus, Alertmanager) and dashboard provisioning configuration. |
| Grafana Dashboards | `2-infra-monitoring/grafana/dashboards/docker-overview.json`, `2-infra-monitoring/grafana/dashboards/network-overview.json`, `2-infra-monitoring/grafana/dashboards/server-health.json` | Three comprehensive Grafana dashboards with multiple panels, PromQL queries, thresholds, templating, and visualizations (time series, stat, bar gauge, table) for container metrics, network health, and server infrastructure monitoring. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Grafana dashboards: Verify PromQL query correctness, thresholds, and transformations across three complex dashboard definitions with multiple panels and data-driven overrides.
  • Prometheus scrape configuration: Validate scrape job definitions, relabeling logic for blackbox probe routing, and timeout settings for different target types.
  • Alert rules: Review alert thresholds, `for` durations, and annotations for accuracy and relevance across Host, Container, Service, and Network categories.
  • Alertmanager routing & inhibition: Confirm alert route matching and grouping behavior, and check that inhibit rules suppress lower-priority alerts correctly.
  • Blackbox probe modules: Validate probe definitions for HTTP variants, TCP/TLS checks, ICMP, DNS queries, and gRPC configurations with expected status/RCode mappings.
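Alert thresholds and `for` durations can also be checked mechanically with a promtool rule unit test. The sketch below targets the `PrometheusTargetScrapeFailed` rule exactly as quoted in the Sourcery review, with `expr: up == 0`, `for: 5m`, and its original annotations. The test file name, its location next to `prometheus.yml`, and the `node-exporter` job and instance labels are assumptions. Run it with `promtool test rules alerts_test.yml`.

```yaml
# Hypothetical prometheus/alerts_test.yml
rule_files:
  - alerts/alerts.yml            # resolved relative to this test file

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # The target reports up=0 for 11 consecutive samples (0m..10m).
      - series: 'up{job="node-exporter", instance="node-exporter:9100"}'
        values: '0x10'
    alert_rule_test:
      # At 4m the alert is only pending (for: 5m), so no firing alert is expected.
      - eval_time: 4m
        alertname: PrometheusTargetScrapeFailed
        exp_alerts: []
      # At 6m it fires with the target's labels and rendered annotations.
      - eval_time: 6m
        alertname: PrometheusTargetScrapeFailed
        exp_alerts:
          - exp_labels:
              severity: warning
              job: node-exporter
              instance: node-exporter:9100
            exp_annotations:
              summary: "Prometheus target node-exporter scrape failed"
              description: "Target node-exporter:9100 has been down for more than 5 minutes"
```

If the rule's `expr` or annotations are changed, for example by adopting the review suggestion, the expected annotations here must change with them.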

Poem

🐰 Hop through logs and metrics bright,
Prometheus shines with data's light,
Grafana dashboards paint the view,
Alerts when something's gone askew,
A monitoring stack, keen and keen,
The finest infra you've e'er seen!


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee3e4a and 27a8d6c.

📒 Files selected for processing (11)
  • 2-infra-monitoring/README.md (1 hunk)
  • 2-infra-monitoring/alertmanager/alertmanager.yml (1 hunk)
  • 2-infra-monitoring/blackbox/blackbox.yml (1 hunk)
  • 2-infra-monitoring/docker-compose.yml (1 hunk)
  • 2-infra-monitoring/grafana/dashboards/docker-overview.json (1 hunk)
  • 2-infra-monitoring/grafana/dashboards/network-overview.json (1 hunk)
  • 2-infra-monitoring/grafana/dashboards/server-health.json (1 hunk)
  • 2-infra-monitoring/grafana/provisioning/dashboards/dashboards.yml (1 hunk)
  • 2-infra-monitoring/grafana/provisioning/datasources/datasources.yml (1 hunk)
  • 2-infra-monitoring/prometheus/alerts/alerts.yml (1 hunk)
  • 2-infra-monitoring/prometheus/prometheus.yml (1 hunk)



sourcery-ai bot commented Dec 4, 2025

Reviewer's Guide

Introduces a complete Docker Compose-based infrastructure monitoring stack (Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, cAdvisor) with wiring, alerting, Alertmanager routing, Blackbox modules, Grafana provisioning, and documentation, plus a comprehensive set of Prometheus alert rules and Grafana dashboards.

Sequence diagram for alert lifecycle from metrics to notifications

```mermaid
sequenceDiagram
  participant NodeExporter
  participant cAdvisor
  participant BlackboxExporter
  participant Prometheus
  participant Alertmanager
  participant EmailReceiver as Email_oncall
  participant WebhookReceiver as Webhook_services

  NodeExporter->>Prometheus: expose_node_metrics
  cAdvisor->>Prometheus: expose_container_metrics
  BlackboxExporter->>Prometheus: expose_probe_metrics

  loop scrape_interval_15s
    Prometheus->>NodeExporter: HTTP_GET_/metrics
    Prometheus->>cAdvisor: HTTP_GET_/metrics
    Prometheus->>BlackboxExporter: HTTP_GET_/probe
    NodeExporter-->>Prometheus: metrics_payload
    cAdvisor-->>Prometheus: metrics_payload
    BlackboxExporter-->>Prometheus: probe_results
  end

  loop evaluation_interval_15s
    Prometheus->>Prometheus: evaluate_alert_rules
  end

  alt alert_fires
    Prometheus->>Alertmanager: push_alerts_via_alerting_config

    Alertmanager->>Alertmanager: route_by_severity_and_labels
    Alertmanager->>Alertmanager: apply_inhibit_rules

    alt severity_critical
      Alertmanager-->>EmailReceiver: send_critical_email
      Alertmanager-->>WebhookReceiver: POST_/webhook/critical
    else severity_warning
      Alertmanager-->>EmailReceiver: send_warning_email
      Alertmanager-->>WebhookReceiver: POST_/webhook/warning
    else severity_info
      Alertmanager-->>WebhookReceiver: POST_/webhook/info
    end
  end

  alt alert_resolved
    Prometheus->>Alertmanager: send_resolved_notification
    Alertmanager-->>EmailReceiver: send_resolved_email_if_configured
    Alertmanager-->>WebhookReceiver: POST_resolved_payload
  end
```
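The Prometheus side of this lifecycle (15s scrape and evaluation loops, rule files, and the push to Alertmanager) corresponds roughly to the prometheus.yml excerpt below. The external label value, the rule file path inside the container, and the exporter service names are assumptions.

```yaml
# Hypothetical excerpt of prometheus/prometheus.yml
global:
  scrape_interval: 15s        # scrape loop in the diagram
  evaluation_interval: 15s    # rule evaluation loop in the diagram
  external_labels:
    monitor: infra-monitoring # assumption: label value not taken from the PR

rule_files:
  - /etc/prometheus/alerts/*.yml   # assumption: mount path of alerts/

alerting:
  alertmanagers:
    - timeout: 10s
      static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
```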

Sequence diagram for Blackbox HTTP and ICMP probing via Prometheus

```mermaid
sequenceDiagram
  participant Prometheus
  participant BlackboxExporter
  participant HTTPService as HTTP_targets
  participant DNSHosts as ICMP_targets

  rect rgb(235,235,235)
    Note over Prometheus,BlackboxExporter: HTTP probes via job blackbox-http
    loop scrape_interval_15s
      Prometheus->>BlackboxExporter: GET_/probe?module=http_2xx&target=http_endpoint
      BlackboxExporter->>HTTPService: HTTP_request
      HTTPService-->>BlackboxExporter: HTTP_response
      BlackboxExporter-->>Prometheus: probe_success_and_latency
    end
  end

  rect rgb(235,235,235)
    Note over Prometheus,BlackboxExporter: ICMP probes via job blackbox-icmp
    loop scrape_interval_15s
      Prometheus->>BlackboxExporter: GET_/probe?module=icmp&target=ip_address
      BlackboxExporter->>DNSHosts: ICMP_echo_request
      DNSHosts-->>BlackboxExporter: ICMP_echo_reply
      BlackboxExporter-->>Prometheus: probe_success_and_rtt
    end
  end
```
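The `?module=…&target=…` request in this diagram is usually produced by the standard Blackbox relabeling pattern, sketched below for the `blackbox-http` job. The probed targets and the `blackbox-exporter` service name are placeholders.

```yaml
# Hypothetical blackbox-http job for prometheus/prometheus.yml
scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com              # placeholder endpoint
          - http://grafana:3000/api/health   # placeholder endpoint
    relabel_configs:
      # Pass the listed target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the Blackbox Exporter, not the target.
      - target_label: __address__
        replacement: blackbox-exporter:9115
```

An ICMP job follows the same shape with `module: [icmp]` and bare hosts or IP addresses as targets.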

File-Level Changes

Change Details Files
Add Docker Compose stack wiring together monitoring services, networks, volumes, and health checks.
  • Define a dedicated monitoring bridge network and named volumes for Prometheus, Grafana, and Alertmanager data persistence.
  • Configure Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, and cAdvisor services with images, ports, volumes, commands, and restart policies.
  • Add container-level healthchecks for all core monitoring services to surface readiness in Docker Compose.
2-infra-monitoring/docker-compose.yml
Configure Prometheus to scrape core components, integrate Blackbox Exporter probes, and send alerts to Alertmanager.
  • Set global scrape and evaluation intervals and attach environment/monitor labels.
  • Wire Prometheus alerting to Alertmanager via static_configs with timeout settings.
  • Define scrape jobs for Prometheus, Node Exporter, cAdvisor, Alertmanager, Grafana, and Blackbox HTTP/ICMP/TCP probes with relabeling to route through the Blackbox Exporter.
2-infra-monitoring/prometheus/prometheus.yml
Introduce a comprehensive set of Prometheus alert rules for hosts, containers, services, network connectivity, and Prometheus self-monitoring.
  • Add CPU, memory, disk, disk I/O, and Node Exporter availability alerts with warning/critical severities.
  • Define container-level alerts for liveness, CPU/memory overuse, and restarts based on cAdvisor metrics.
  • Create service and network alerts around Prometheus/Grafana/Alertmanager availability plus Blackbox HTTP/ICMP/TCP results, and self-monitoring alerts for Prometheus configuration, rule evaluation, target scrapes, and TSDB size.
2-infra-monitoring/prometheus/alerts/alerts.yml
Set up Alertmanager routing, receivers, and inhibition rules for severity-based alert handling.
  • Configure global SMTP settings and HTTP behavior, using environment variables for sensitive values.
  • Define a root route with grouping and timing plus child routes for critical, warning, info, and infra-specific alerts.
  • Configure multiple receivers (default, critical, warning, info, infra) with combinations of email and webhook configs, and inhibition rules so higher-severity alerts suppress lower-severity ones on the same alert/instance.
2-infra-monitoring/alertmanager/alertmanager.yml
Provide Blackbox Exporter module configuration for HTTP, TCP, ICMP, DNS, and gRPC probes.
  • Define multiple HTTP modules for 2xx checks, SSL-required endpoints, POST checks, basic auth, and response-body matching.
  • Add TCP modules for generic connectivity, TLS-enabled checks, and banner/response validation for SSH and SMTP.
  • Configure ICMP modules for IPv4/IPv6, DNS modules for A and SOA lookups, and optional gRPC health check modules with and without TLS.
2-infra-monitoring/blackbox/blackbox.yml
Provision Grafana datasources and dashboard loading from files within the monitoring network.
  • Auto-provision a default Prometheus datasource with tuned query options and an Alertmanager datasource via Docker-internal URLs.
  • Document optional Loki datasource configuration for future log aggregation.
  • Configure a dashboards provider that watches a filesystem path for JSON dashboards and reloads periodically.
2-infra-monitoring/grafana/provisioning/datasources/datasources.yml
2-infra-monitoring/grafana/provisioning/dashboards/dashboards.yml
Add initial Grafana dashboards and thorough bilingual documentation for the monitoring stack.
  • Create placeholder or initial JSON dashboards for server health, Docker overview, and network overview, intended to match the documented provisioning paths.
  • Write a detailed bilingual (Hungarian/English) README describing architecture, components, startup instructions, directory layout, dashboards, alerts, configuration via .env, Alertmanager integration, maintenance, troubleshooting, and extension options.
  • Align README's directory structure and usage examples with the Compose, Prometheus, Alertmanager, Blackbox, and Grafana configurations added in this PR.
2-infra-monitoring/grafana/dashboards/server-health.json
2-infra-monitoring/grafana/dashboards/docker-overview.json
2-infra-monitoring/grafana/dashboards/network-overview.json
2-infra-monitoring/README.md


- Add dashboard provisioning configuration (dashboards.yml)
- Add Server Health dashboard with CPU, memory, disk, network panels
- Add Docker Overview dashboard with container metrics
- Add Network Overview dashboard with HTTP, ICMP, TCP probes
- Add comprehensive bilingual README documentation

Dashboards include:
- Gauge/stat panels for quick status overview
- Time series graphs for trend analysis
- Bar gauges for comparison views
- Table summaries with multiple metrics
- Template variables for filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
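A file-based dashboard provider of the kind described in this commit can be sketched as follows. The provider name, folder, and in-container path are assumptions. The path must match wherever `grafana/dashboards/` is mounted in the Grafana container.

```yaml
# Hypothetical grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: infra-monitoring
    orgId: 1
    folder: Infrastructure                 # assumption: target folder in Grafana
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30              # how often Grafana rescans the path
    allowUiUpdates: true
    options:
      path: /var/lib/grafana/dashboards    # assumption: mount point of the JSON files
      foldersFromFilesStructure: false
```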
@w7-mgfcode w7-mgfcode marked this pull request as ready for review December 4, 2025 13:59
@w7-mgfcode w7-mgfcode merged commit 73d4c45 into main Dec 4, 2025
4 of 12 checks passed
@w7-mgfcode w7-mgfcode deleted the feature/phase-3-infra-monitoring branch December 4, 2025 14:00

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes and found some issues that need to be addressed.

Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `ContainerDown` alert uses `absent(container_last_seen{name!=""})`, which will fire a single alert without a `name` label (breaking `{{ $labels.name }}` in the annotations) rather than per-container; consider a per-container expression like checking `max_over_time(container_last_seen[...])` or a similar metric that preserves container labels.
- All the healthchecks rely on `wget`, but the Prometheus/Grafana/Alertmanager/Exporter images may not have it installed; switching to `curl` (which some images include) or using a simpler TCP-based check (e.g. `CMD-SHELL nc -z localhost 9090`) would make the healthchecks more robust.

## Individual Comments

### Comment 1
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:103-23` </location>
<code_context>
+  - name: container_alerts
+    rules:
+      # Container stopped
+      - alert: ContainerDown
+        expr: absent(container_last_seen{name!=""})
+        for: 1m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Container {{ $labels.name }} is down"
+          description: "Container has not been seen for more than 1 minute"
</code_context>

<issue_to_address>
**issue (bug_risk):** The ContainerDown alert uses `absent()` in a way that likely produces a single, unlabeled alert and makes `$labels.name` unusable.

`absent(container_last_seen{name!=""})` will fire a single alert with no `name` label, but the annotations expect `{{ $labels.name }}`. Please switch to a per-container expression (e.g. `max by (name) (time() - container_last_seen{name!=""}) > <threshold>`) so the alert preserves container labels and the template fields resolve correctly.
</issue_to_address>
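Here is a possible per-container rewrite. The first rule assumes cAdvisor keeps exporting `container_last_seen` for a stopped container for some time, which depends on the cAdvisor version and runtime. The second rule, `ContainerMissing`, is a hypothetical addition for known containers. `absent()` with an equality matcher copies that label onto the alert, so `{{ $labels.name }}` resolves even when the series has vanished entirely. It uses `alertmanager`, the `container_name` set in docker-compose.yml.

```yaml
# Hypothetical rewrite for the container_alerts group in alerts.yml
- alert: ContainerDown
  # One series per container name, so {{ $labels.name }} is populated.
  expr: max by (name) (time() - container_last_seen{name!=""}) > 60
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} is down"
    description: "Container {{ $labels.name }} has not been seen for more than 1 minute"

- alert: ContainerMissing
  # absent() with name="alertmanager" returns {name="alertmanager"} when no series exists.
  expr: absent(container_last_seen{name="alertmanager"})
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} is down"
    description: "No container_last_seen series for {{ $labels.name }} for more than 1 minute"
```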

### Comment 2
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:123` </location>
<code_context>
+          description: "Container CPU usage is above 80% (current: {{ $value | printf \"%.1f\" }}%)"
+
+      # High container memory usage
+      - alert: ContainerHighMemory
+        expr: (container_memory_usage_bytes{name!=""} / container_spec_memory_limit_bytes{name!=""} * 100) > 80
+        for: 5m
</code_context>

<issue_to_address>
**issue (bug_risk):** The ContainerHighMemory alert can divide by zero or behave unexpectedly for containers without a memory limit.

This expression assumes `container_spec_memory_limit_bytes` is always set and > 0. For containers without a memory limit (0 or missing), this can yield divide‑by‑zero or meaningless percentages. Consider excluding unlimited containers (e.g. `> 0` on the denominator) or handling them with a separate alert.
</issue_to_address>
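This sketches the `> 0` fix. In PromQL, division by zero yields `+Inf` rather than an error. With the original expression, a container without a memory limit (cAdvisor typically reports the limit as 0) evaluates as `+Inf > 80` and fires. Filtering the denominator drops those series before the division. The severity and annotations are placeholders, because the quoted context does not show them.

```yaml
# Hypothetical rewrite of ContainerHighMemory in alerts.yml
- alert: ContainerHighMemory
  # "> 0" removes containers whose limit is 0 (unlimited) from the denominator.
  expr: |
    (
      container_memory_usage_bytes{name!=""}
        / (container_spec_memory_limit_bytes{name!=""} > 0)
    ) * 100 > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} memory usage is high"
    description: "Memory usage is above 80% of the limit (current: {{ $value | printf \"%.1f\" }}%)"
```

Containers without a limit can still be covered by a separate rule on absolute bytes if needed.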

### Comment 3
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:268-275` </location>
<code_context>
+          description: "Rule group {{ $labels.rule_group }} is taking longer than its interval to evaluate"
+
+      # Prometheus target scrape failed
+      - alert: PrometheusTargetScrapeFailed
+        expr: up == 0
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "Prometheus target {{ $labels.job }} scrape failed"
+          description: "Target {{ $labels.instance }} has been down for more than 5 minutes"
+
+      # Prometheus storage high
</code_context>

<issue_to_address>
**suggestion:** The PrometheusTargetScrapeFailed rule on `up == 0` may create noisy, overlapping alerts with more specific `*Down`/blackbox alerts.

This expression will fire for every failed scrape, including components already covered by `PrometheusDown`, `GrafanaDown`, `AlertmanagerDown` and blackbox checks. To avoid noisy, duplicate alerts, consider excluding those jobs (e.g. `up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0`) or limiting this rule to specific target classes.

```suggestion
      - alert: PrometheusTargetScrapeFailed
        expr: up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus target {{ $labels.job }} scrape failed"
          description: "Target {{ $labels.instance }} has been down for more than 5 minutes (excluding core and blackbox-monitored jobs)"
```
</issue_to_address>

### Comment 4
<location> `2-infra-monitoring/docker-compose.yml:102-61` </location>
<code_context>
+  # =========================================
+  # ALERTMANAGER - Alert handling
+  # =========================================
+  alertmanager:
+    image: prom/alertmanager:v0.26.0
+    container_name: alertmanager
+    restart: unless-stopped
+    ports:
+      - "9093:9093"
+    volumes:
+      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
+      - alertmanager_data:/alertmanager
+    command:
+      - "--config.file=/etc/alertmanager/alertmanager.yml"
+      - "--storage.path=/alertmanager"
+      - "--web.external-url=http://localhost:9093"
+    networks:
+      - monitoring
+    healthcheck:
+      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9093/-/healthy"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  # =========================================
</code_context>

<issue_to_address>
**issue (bug_risk):** Alertmanager config depends on `SMTP_PASSWORD`, but the container is not wired to receive this environment variable.

`smtp_auth_password` in `alertmanager/alertmanager.yml` uses `${SMTP_PASSWORD}`, but this variable isn't defined in the container. Please wire it through via an `environment:` section (for example, `environment: [SMTP_PASSWORD=${SMTP_PASSWORD}]` or equivalent) so Alertmanager can resolve the password at runtime.
</issue_to_address>
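Here is a minimal sketch of the wiring this comment asks for. Verify one caveat against the Alertmanager documentation for v0.26.0: Alertmanager is not known to expand `${...}` references inside alertmanager.yml itself. Passing the variable into the container may therefore be necessary but not sufficient. The file may also need to be rendered before start-up, or switched to a file-based secret option if the version in use provides one.

```yaml
# Hypothetical excerpt of docker-compose.yml
services:
  alertmanager:
    image: prom/alertmanager:v0.26.0
    environment:
      # Compose substitutes the value from the shell or a .env file next to
      # docker-compose.yml; the ":?" form makes `docker compose up` fail if unset.
      - SMTP_PASSWORD=${SMTP_PASSWORD:?SMTP_PASSWORD must be set}
```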

### Comment 5
<location> `2-infra-monitoring/blackbox/blackbox.yml:54-21` </location>
<code_context>
+      body: '{}'
+
+  # HTTP Basic Auth
+  http_basic_auth:
+    prober: http
+    timeout: 5s
+    http:
+      method: GET
+      basic_auth:
</code_context>

<issue_to_address>
**issue (bug_risk):** The http_basic_auth module uses `${HTTP_BASIC_USER}`/`${HTTP_BASIC_PASS}` without ensuring they are provided to the blackbox-exporter container.

These values must be provided as environment variables to the `blackbox-exporter` container (for example via an `environment:` block in `docker-compose.yml` or a wrapper script). Otherwise, the exporter will use the literal strings and basic auth will fail. Please either wire the env vars through or choose a different way to parameterize these credentials.
</issue_to_address>
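Blackbox Exporter, like other Prometheus-ecosystem components, is generally not expected to expand `${VAR}` inside its own YAML config. Passing environment variables alone would likely still leave the literal strings in place. One alternative that avoids environment variables is the `password_file` field of the shared HTTP client `basic_auth` block, which the `http` prober accepts. Check this against the blackbox_exporter version in use. The user name and path below are placeholders.

```yaml
# Hypothetical excerpt of blackbox/blackbox.yml
modules:
  http_basic_auth:
    prober: http
    timeout: 5s
    http:
      method: GET
      basic_auth:
        username: monitoring                                          # placeholder
        password_file: /etc/blackbox_exporter/secrets/http_basic_pass # placeholder path
```

The secret file would then be mounted read-only into the `blackbox-exporter` container at that path.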

### Comment 6
<location> `2-infra-monitoring/docker-compose.yml:52-55` </location>
<code_context>
+      - "--web.console.libraries=/etc/prometheus/console_libraries"
+      - "--web.console.templates=/etc/prometheus/consoles"
+      - "--web.enable-lifecycle"
+      - "--web.enable-admin-api"
+    networks:
+      - monitoring
</code_context>

<issue_to_address>
**🚨 suggestion (security):** Enabling Prometheus admin API while publishing the port can introduce unnecessary security risk.

`--web.enable-admin-api` exposes destructive endpoints (e.g., delete series, snapshots). With `ports: "9090:9090"` this may be reachable outside the internal network. If there isn’t a strict access control layer in front of Prometheus, consider disabling this flag or binding Prometheus only to a private interface.

```suggestion
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    networks:
```
</issue_to_address>
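For the second option in this comment, here is a sketch that publishes the Prometheus port only on the host's loopback interface. The service name `prometheus` is assumed from the stack description.

```yaml
# Hypothetical excerpt of docker-compose.yml
services:
  prometheus:
    ports:
      # Reachable from the Docker host only; Grafana and other containers on the
      # monitoring network still use prometheus:9090 internally.
      - "127.0.0.1:9090:9090"
```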

### Comment 7
<location> `2-infra-monitoring/README.md:36` </location>
<code_context>
+|------------------------|------|----------------------|
+| Prometheus | 9090 | Metrika gyűjtés és tárolás / Metric collection and storage |
+| Grafana | 3000 | Vizualizáció és dashboardok / Visualization and dashboards |
+| Alertmanager | 9093 | Riasztás kezelés / Alert management |
+| Node Exporter | 9100 | Host metrikák / Host metrics |
+| Blackbox Exporter | 9115 | Endpoint monitoring (HTTP, ICMP, TCP) |
</code_context>

<issue_to_address>
**issue (typo):** Use the correct Hungarian compound noun "Riasztáskezelés" for consistency.

Earlier in the document you already use the correct compound form ("riasztáskezelést"), so please update this entry to "Riasztáskezelés / Alert management" for consistency.

```suggestion
| Alertmanager | 9093 | Riasztáskezelés / Alert management |
```
</issue_to_address>

### Comment 8
<location> `2-infra-monitoring/README.md:61` </location>
<code_context>
+# Naplók megtekintése / View logs
+docker-compose logs -f
+
+# Szolgáltatás állapot / Service status
+docker-compose ps
+```
</code_context>

<issue_to_address>
**issue (typo):** Adjust the Hungarian phrase to the grammatically correct "Szolgáltatás állapota".

Also consider keeping the English label alongside the Hungarian, e.g. "Szolgáltatás állapota / Service status".

```suggestion
# Szolgáltatás állapota / Service status
```
</issue_to_address>

### Comment 9
<location> `2-infra-monitoring/README.md:117` </location>
<code_context>
+- HTTP endpoint státusz és válaszidő
+- ICMP ping latency
+- TCP port elérhetőség
+- SSL tanúsítvány lejárat
+
+## Riasztások / Alerts
</code_context>

<issue_to_address>
**issue (typo):** Use the possessive form in Hungarian: "SSL tanúsítvány lejárata" ("expiry of the SSL certificate").

```suggestion
- SSL tanúsítvány lejárata
```
</issue_to_address>


      for: 5m
      labels:
        severity: warning
      annotations:

issue (bug_risk): The ContainerDown alert uses absent() in a way that likely produces a single, unlabeled alert and makes $labels.name unusable.

absent(container_last_seen{name!=""}) will fire a single alert with no name label, but the annotations expect {{ $labels.name }}. Please switch to a per-container expression (e.g. max by (name) (time() - container_last_seen{name!=""}) > <threshold>) so the alert preserves container labels and the template fields resolve correctly.
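
A minimal sketch of the per-container form suggested above, written as a rule fragment for alerts/alerts.yml. The 60-second threshold, the for: duration, the severity and the annotation wording are assumptions, not taken from the PR. As far as we know, the expression can only fire while cAdvisor still exports container_last_seen for the stopped container; once that series disappears, the alert resolves.

```yaml
# Sketch: one alert per container that cAdvisor has not seen for more than 60s.
# "max by (name)" keeps the name label, so {{ $labels.name }} resolves.
# Threshold, "for" duration and wording are assumptions.
- alert: ContainerDown
  expr: max by (name) (time() - container_last_seen{name!=""}) > 60
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} is down"
    description: "Container {{ $labels.name }} has not been seen for {{ $value | humanizeDuration }}"
```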

        description: "Container CPU usage is above 80% (current: {{ $value | printf \"%.1f\" }}%)"

    # Magas container memória használat
    - alert: ContainerHighMemory

issue (bug_risk): The ContainerHighMemory alert can divide by zero or behave unexpectedly for containers without a memory limit.

This expression assumes container_spec_memory_limit_bytes is always set and > 0. For containers without a memory limit (0 or missing), this can yield divide-by-zero or meaningless percentages. Consider excluding unlimited containers (e.g. > 0 on the denominator) or handling them with a separate alert.
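
A minimal sketch of the "> 0 on the denominator" fix. It assumes the rule divides container_memory_working_set_bytes by container_spec_memory_limit_bytes and uses an 80% threshold like the CPU rule; the numerator actually used in the PR may differ. The > 0 comparison (without bool) filters out series whose limit is 0, so unlimited containers drop out of the division instead of producing +Inf or NaN.

```yaml
# Sketch: memory usage as a percentage of the limit, skipping containers
# without a limit. Metric choice for the numerator and the 80% threshold
# are assumptions.
- alert: ContainerHighMemory
  expr: |
    (
      container_memory_working_set_bytes{name!=""}
        / (container_spec_memory_limit_bytes{name!=""} > 0)
    ) * 100 > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} memory usage is high"
    description: "Container memory usage is above 80% of its limit (current: {{ $value | printf \"%.1f\" }}%)"
```

Unlimited containers are then simply not covered by this rule; if they matter, a separate alert on absolute usage would be needed.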

Comment on lines +268 to +275

    - alert: PrometheusTargetScrapeFailed
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Prometheus target {{ $labels.job }} scrape failed"
        description: "Target {{ $labels.instance }} has been down for more than 5 minutes"

suggestion: The PrometheusTargetScrapeFailed rule on up == 0 may create noisy, overlapping alerts with more specific *Down/blackbox alerts.

This expression will fire for every failed scrape, including components already covered by PrometheusDown, GrafanaDown, AlertmanagerDown and blackbox checks. To avoid noisy, duplicate alerts, consider excluding those jobs (e.g. up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0) or limiting this rule to specific target classes.

Suggested change (current rule first, proposed replacement second):

    - alert: PrometheusTargetScrapeFailed
      expr: up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Prometheus target {{ $labels.job }} scrape failed"
        description: "Target {{ $labels.instance }} has been down for more than 5 minutes"

    - alert: PrometheusTargetScrapeFailed
      expr: up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Prometheus target {{ $labels.job }} scrape failed"
        description: "Target {{ $labels.instance }} has been down for more than 5 minutes (excluding core and blackbox-monitored jobs)"
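
The comment also mentions limiting the rule to specific target classes. A sketch of that allowlist variant, with hypothetical job names (node-exporter, cadvisor) that would need to match the job_name values in prometheus/prometheus.yml:

```yaml
# Sketch: alert only on scrape failures of selected exporter jobs.
# The job names are hypothetical; align them with prometheus.yml.
- alert: PrometheusTargetScrapeFailed
  expr: up{job=~"node-exporter|cadvisor"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Prometheus target {{ $labels.job }} scrape failed"
    description: "Target {{ $labels.instance }} has been down for more than 5 minutes"
```

The trade-off: a denylist like the suggested change automatically covers jobs added later, while an allowlist stays silent about new jobs until they are added to the regex.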

      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3

issue (bug_risk): Alertmanager config depends on SMTP_PASSWORD, but the container is not wired to receive this environment variable.

smtp_auth_password in alertmanager/alertmanager.yml uses ${SMTP_PASSWORD}, but this variable isn't defined in the container. Please wire it through via an environment: section (for example, environment: [SMTP_PASSWORD=${SMTP_PASSWORD}] or equivalent) so Alertmanager can resolve the password at runtime.
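
Passing the variable through may not be sufficient on its own: as far as we know, Alertmanager does not expand ${...} references inside alertmanager.yml, so the literal string could still be sent as the password. A sketch of a file-based alternative, assuming an Alertmanager release whose global config supports smtp_auth_password_file (check the configuration reference for the version in use), a hypothetical secret name smtp_password, and a hypothetical ./secrets/smtp_password file next to docker-compose.yml:

```yaml
# docker-compose.yml (excerpt): provide the password as a Compose file-based secret.
# Secret name and host path are hypothetical.
services:
  alertmanager:
    secrets:
      - smtp_password   # mounted inside the container at /run/secrets/smtp_password

secrets:
  smtp_password:
    file: ./secrets/smtp_password
---
# alertmanager/alertmanager.yml (excerpt): read the password from the mounted file
# instead of the literal string ${SMTP_PASSWORD}.
global:
  smtp_auth_password_file: /run/secrets/smtp_password
```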

    http_2xx:
      prober: http
      timeout: 5s
      http:

issue (bug_risk): The http_basic_auth module uses ${HTTP_BASIC_USER}/${HTTP_BASIC_PASS} without ensuring they are provided to the blackbox-exporter container.

These values must be provided as environment variables to the blackbox-exporter container (for example via an environment: block in docker-compose.yml or a wrapper script). Otherwise, the exporter will use the literal strings and basic auth will fail. Please either wire the env vars through or choose a different way to parameterize these credentials.
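
One "different way to parameterize these credentials" is the password_file option of basic_auth, which the HTTP prober's client settings accept, so no environment variable is involved. A sketch, assuming a hypothetical username monitor, a hypothetical password file path, and the service name blackbox-exporter used in the comment:

```yaml
# blackbox/blackbox.yml (excerpt): read the basic auth password from a file.
# Username and file path are hypothetical.
modules:
  http_basic_auth:
    prober: http
    timeout: 5s
    http:
      basic_auth:
        username: monitor
        password_file: /etc/blackbox_exporter/secrets/http_basic_pass
---
# docker-compose.yml (excerpt): mount the password file read-only.
services:
  blackbox-exporter:
    volumes:
      - ./blackbox/secrets/http_basic_pass:/etc/blackbox_exporter/secrets/http_basic_pass:ro
```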

Comment on lines +52 to +55
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
      - "--web.enable-admin-api"
    networks:

🚨 suggestion (security): Enabling the Prometheus admin API while publishing the port can introduce unnecessary security risk.

--web.enable-admin-api exposes destructive endpoints (e.g., delete series, snapshots). With ports: "9090:9090" this may be reachable outside the internal network. If there isn't a strict access control layer in front of Prometheus, consider disabling this flag or binding Prometheus only to a private interface.

Suggested change (current lines first, proposed replacement second):

      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
      - "--web.enable-admin-api"
    networks:

      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    networks:
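
If the admin API flag is kept, the "bind only to a private interface" option can be sketched as a host-loopback port mapping. This assumes Grafana and the other services reach Prometheus over the Compose network by service name (for example http://prometheus:9090), which a host-side binding does not affect:

```yaml
# docker-compose.yml (excerpt): publish Prometheus on the host loopback only.
services:
  prometheus:
    ports:
      # HOST_IP:HOST_PORT:CONTAINER_PORT; other machines can no longer reach 9090
      # directly and would need an SSH tunnel or an authenticating reverse proxy.
      - "127.0.0.1:9090:9090"
```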


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants