WIP: Phase 3 - Infra Monitoring Stack #3
- Docker Compose with Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, cAdvisor
- Prometheus configuration with scrape configs
- Alert rules for host, container, service, and network monitoring
- Alertmanager configuration with routing and receivers
- Blackbox Exporter modules for HTTP, TCP, ICMP, DNS probes
- Grafana datasources provisioning

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
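For orientation, a minimal sketch of the service layout this description implies. Image tags and port choices are assumptions for illustration (only the Alertmanager tag appears in the compose excerpt quoted later in the review), not the PR's actual file:

```yaml
# Hypothetical skeleton of the stack described above.
services:
  prometheus:
    image: prom/prometheus:v2.48.0        # assumed tag
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:10.2.0         # assumed tag
    ports: ["3000:3000"]
  alertmanager:
    image: prom/alertmanager:v0.26.0      # tag from the compose excerpt below
    ports: ["9093:9093"]
  node-exporter:
    image: prom/node-exporter:v1.7.0      # assumed tag
  blackbox-exporter:
    image: prom/blackbox-exporter:v0.24.0 # assumed tag
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0 # assumed tag
```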
Caution: Review failed. The pull request is closed.

Walkthrough

Introduces a complete Docker Compose-based infrastructure monitoring stack comprising Prometheus for metrics collection, Grafana with three monitoring dashboards, Alertmanager for alert routing, and exporters (Node, Blackbox, cAdvisor) for multi-protocol endpoint and system monitoring. Includes configuration files for all components and comprehensive README documentation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Files selected for processing (11)
Reviewer's Guide

Introduces a complete Docker Compose-based infrastructure monitoring stack (Prometheus, Grafana, Alertmanager, Node Exporter, Blackbox Exporter, cAdvisor) with wiring, alerting, Alertmanager routing, Blackbox modules, Grafana provisioning, and documentation, plus a comprehensive set of Prometheus alert rules and Grafana dashboards.

Sequence diagram for alert lifecycle from metrics to notifications:

```mermaid
sequenceDiagram
    participant NodeExporter
    participant cAdvisor
    participant BlackboxExporter
    participant Prometheus
    participant Alertmanager
    participant EmailReceiver as Email_oncall
    participant WebhookReceiver as Webhook_services
    NodeExporter->>Prometheus: expose_node_metrics
    cAdvisor->>Prometheus: expose_container_metrics
    BlackboxExporter->>Prometheus: expose_probe_metrics
    loop scrape_interval_15s
        Prometheus->>NodeExporter: HTTP_GET_/metrics
        Prometheus->>cAdvisor: HTTP_GET_/metrics
        Prometheus->>BlackboxExporter: HTTP_GET_/probe
        NodeExporter-->>Prometheus: metrics_payload
        cAdvisor-->>Prometheus: metrics_payload
        BlackboxExporter-->>Prometheus: probe_results
    end
    loop evaluation_interval_15s
        Prometheus->>Prometheus: evaluate_alert_rules
    end
    alt alert_fires
        Prometheus->>Alertmanager: push_alerts_via_alerting_config
        Alertmanager->>Alertmanager: route_by_severity_and_labels
        Alertmanager->>Alertmanager: apply_inhibit_rules
        alt severity_critical
            Alertmanager-->>EmailReceiver: send_critical_email
            Alertmanager-->>WebhookReceiver: POST_/webhook/critical
        else severity_warning
            Alertmanager-->>EmailReceiver: send_warning_email
            Alertmanager-->>WebhookReceiver: POST_/webhook/warning
        else severity_info
            Alertmanager-->>WebhookReceiver: POST_/webhook/info
        end
    end
    alt alert_resolved
        Prometheus->>Alertmanager: send_resolved_notification
        Alertmanager-->>EmailReceiver: send_resolved_email_if_configured
        Alertmanager-->>WebhookReceiver: POST_resolved_payload
    end
```
Sequence diagram for Blackbox HTTP and ICMP probing via Prometheus:

```mermaid
sequenceDiagram
    participant Prometheus
    participant BlackboxExporter
    participant HTTPService as HTTP_targets
    participant DNSHosts as ICMP_targets
    rect rgb(235,235,235)
        Note over Prometheus,BlackboxExporter: HTTP probes via job blackbox-http
        loop scrape_interval_15s
            Prometheus->>BlackboxExporter: GET_/probe?module=http_2xx&target=http_endpoint
            BlackboxExporter->>HTTPService: HTTP_request
            HTTPService-->>BlackboxExporter: HTTP_response
            BlackboxExporter-->>Prometheus: probe_success_and_latency
        end
    end
    rect rgb(235,235,235)
        Note over Prometheus,BlackboxExporter: ICMP probes via job blackbox-icmp
        loop scrape_interval_15s
            Prometheus->>BlackboxExporter: GET_/probe?module=icmp&target=ip_address
            BlackboxExporter->>DNSHosts: ICMP_echo_request
            DNSHosts-->>BlackboxExporter: ICMP_echo_reply
            BlackboxExporter-->>Prometheus: probe_success_and_rtt
        end
    end
```
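The diagrams assume the standard Blackbox Exporter scrape pattern, in which Prometheus rewrites each configured target into a `/probe` request against the exporter. A sketch of what such a job typically looks like; the target URL is illustrative, and the exporter address assumes the compose service name and default port:

```yaml
# Fragment that would sit under scrape_configs: in prometheus.yml.
scrape_configs:
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]            # module name as used in the diagram
    static_configs:
      - targets:
          - https://example.com     # illustrative target
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target     # becomes ?target=<url>
      - source_labels: [__param_target]
        target_label: instance           # keep the probed URL as the instance label
      - target_label: __address__
        replacement: blackbox-exporter:9115  # actually scrape the exporter
```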
- Add dashboard provisioning configuration (dashboards.yml)
- Add Server Health dashboard with CPU, memory, disk, network panels
- Add Docker Overview dashboard with container metrics
- Add Network Overview dashboard with HTTP, ICMP, TCP probes
- Add comprehensive bilingual README documentation

Dashboards include:

- Gauge/stat panels for quick status overview
- Time series graphs for trend analysis
- Bar gauges for comparison views
- Table summaries with multiple metrics
- Template variables for filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
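Since the commit adds `dashboards.yml`, here is a sketch of the usual shape of a Grafana file-provider provisioning file; the provider name, folder, and container path are assumptions, not taken from the PR:

```yaml
# Hypothetical dashboards.yml using Grafana's file provider.
apiVersion: 1
providers:
  - name: default                        # assumed provider name
    folder: Monitoring                   # assumed Grafana folder
    type: file
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards  # assumed mount path for dashboard JSON
```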
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- The `ContainerDown` alert uses `absent(container_last_seen{name!=""})`, which will fire a single alert without a `name` label (breaking `{{ $labels.name }}` in the annotations) rather than per-container; consider a per-container expression like checking `max_over_time(container_last_seen[...])` or a similar metric that preserves container labels.
- All the healthchecks rely on `wget`, but the Prometheus/Grafana/Alertmanager/Exporter images may not have it installed; switching to `curl` (which some images include) or using a simpler TCP-based check (e.g. `CMD-SHELL nc -z localhost 9090`) would make the healthchecks more robust.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `ContainerDown` alert uses `absent(container_last_seen{name!=""})`, which will fire a single alert without a `name` label (breaking `{{ $labels.name }}` in the annotations) rather than per-container; consider a per-container expression like checking `max_over_time(container_last_seen[...])` or a similar metric that preserves container labels.
- All the healthchecks rely on `wget`, but the Prometheus/Grafana/Alertmanager/Exporter images may not have it installed; switching to `curl` (which some images include) or using a simpler TCP-based check (e.g. `CMD-SHELL nc -z localhost 9090`) would make the healthchecks more robust.
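A sketch of the TCP-style healthcheck the second point suggests, using Prometheus's port as the example; whether `nc` is actually present in a given image is itself an assumption to verify:

```yaml
# Hypothetical wget-free healthcheck; adjust the port per service.
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 9090 || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
```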
## Individual Comments
### Comment 1
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:103-23` </location>
<code_context>
+ - name: container_alerts
+ rules:
+ # Container leállt
+ - alert: ContainerDown
+ expr: absent(container_last_seen{name!=""})
+ for: 1m
+ labels:
+ severity: warning
+ annotations:
+ summary: "Container {{ $labels.name }} is down"
+ description: "Container has not been seen for more than 1 minute"
</code_context>
<issue_to_address>
**issue (bug_risk):** The ContainerDown alert uses `absent()` in a way that likely produces a single, unlabeled alert and makes `$labels.name` unusable.
`absent(container_last_seen{name!=""})` will fire a single alert with no `name` label, but the annotations expect `{{ $labels.name }}`. Please switch to a per-container expression (e.g. `max by (name) (time() - container_last_seen{name!=""}) > <threshold>`) so the alert preserves container labels and the template fields resolve correctly.
</issue_to_address>
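A sketch of the per-container rewrite the comment proposes; the 60-second threshold is an assumed value to tune against the scrape interval:

```yaml
# Hypothetical per-container variant that preserves the name label.
- alert: ContainerDown
  expr: max by (name) (time() - container_last_seen{name!=""}) > 60
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} is down"
    description: "Container {{ $labels.name }} has not been seen for more than 1 minute"
```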
### Comment 2
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:123` </location>
<code_context>
+ description: "Container CPU usage is above 80% (current: {{ $value | printf \"%.1f\" }}%)"
+
+ # Magas container memória használat
+ - alert: ContainerHighMemory
+ expr: (container_memory_usage_bytes{name!=""} / container_spec_memory_limit_bytes{name!=""} * 100) > 80
+ for: 5m
</code_context>
<issue_to_address>
**issue (bug_risk):** The ContainerHighMemory alert can divide by zero or behave unexpectedly for containers without a memory limit.
This expression assumes `container_spec_memory_limit_bytes` is always set and > 0. For containers without a memory limit (0 or missing), this can yield divide-by-zero or meaningless percentages. Consider excluding unlimited containers (e.g. `> 0` on the denominator) or handling them with a separate alert.
</issue_to_address>
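One way to apply the suggested guard: filter the denominator so containers without a limit drop out of the expression entirely. A sketch, not the PR's actual fix:

```yaml
# Hypothetical guard: the (... > 0) filter removes unlimited containers
# before the division, avoiding divide-by-zero and meaningless percentages.
- alert: ContainerHighMemory
  expr: >
    (container_memory_usage_bytes{name!=""}
      / (container_spec_memory_limit_bytes{name!=""} > 0) * 100) > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} memory usage is high"
    description: "Memory usage is above 80% of the configured limit"
```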
### Comment 3
<location> `2-infra-monitoring/prometheus/alerts/alerts.yml:268-275` </location>
<code_context>
+ description: "Rule group {{ $labels.rule_group }} is taking longer than its interval to evaluate"
+
+ # Prometheus target scrape sikertelen
+ - alert: PrometheusTargetScrapeFailed
+ expr: up == 0
+ for: 5m
+ labels:
+ severity: warning
+ annotations:
+ summary: "Prometheus target {{ $labels.job }} scrape failed"
+ description: "Target {{ $labels.instance }} has been down for more than 5 minutes"
+
+ # Prometheus storage magas
</code_context>
<issue_to_address>
**suggestion:** The PrometheusTargetScrapeFailed rule on `up == 0` may create noisy, overlapping alerts with more specific `*Down`/blackbox alerts.
This expression will fire for every failed scrape, including components already covered by `PrometheusDown`, `GrafanaDown`, `AlertmanagerDown` and blackbox checks. To avoid noisy, duplicate alerts, consider excluding those jobs (e.g. `up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0`) or limiting this rule to specific target classes.
```suggestion
- alert: PrometheusTargetScrapeFailed
expr: up{job!~"blackbox-.*|prometheus|grafana|alertmanager"} == 0
for: 5m
labels:
severity: warning
annotations:
summary: "Prometheus target {{ $labels.job }} scrape failed"
description: "Target {{ $labels.instance }} has been down for more than 5 minutes (excluding core and blackbox-monitored jobs)"
```
</issue_to_address>
### Comment 4
<location> `2-infra-monitoring/docker-compose.yml:102-61` </location>
<code_context>
+ # =========================================
+ # ALERTMANAGER - Riasztás kezelés
+ # =========================================
+ alertmanager:
+ image: prom/alertmanager:v0.26.0
+ container_name: alertmanager
+ restart: unless-stopped
+ ports:
+ - "9093:9093"
+ volumes:
+ - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
+ - alertmanager_data:/alertmanager
+ command:
+ - "--config.file=/etc/alertmanager/alertmanager.yml"
+ - "--storage.path=/alertmanager"
+ - "--web.external-url=http://localhost:9093"
+ networks:
+ - monitoring
+ healthcheck:
+ test: ["CMD", "wget", "-q", "--spider", "http://localhost:9093/-/healthy"]
+ interval: 30s
+ timeout: 10s
+ retries: 3
+
+ # =========================================
</code_context>
<issue_to_address>
**issue (bug_risk):** Alertmanager config depends on `SMTP_PASSWORD`, but the container is not wired to receive this environment variable.
`smtp_auth_password` in `alertmanager/alertmanager.yml` uses `${SMTP_PASSWORD}`, but this variable isn't defined in the container. Please wire it through via an `environment:` section (for example, `environment: [SMTP_PASSWORD=${SMTP_PASSWORD}]` or equivalent) so Alertmanager can resolve the password at runtime.
</issue_to_address>
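The wiring the comment asks for, sketched as a compose fragment. Note this only makes the variable visible inside the container; whether the config file is then expanded should be verified separately, since stock Alertmanager does not substitute environment variables in its config without an entrypoint or templating step:

```yaml
# Hypothetical fragment for the alertmanager service.
alertmanager:
  environment:
    - SMTP_PASSWORD=${SMTP_PASSWORD}  # forwarded from the host / .env file
```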
### Comment 5
<location> `2-infra-monitoring/blackbox/blackbox.yml:54-21` </location>
<code_context>
+ body: '{}'
+
+ # HTTP Basic Auth
+ http_basic_auth:
+ prober: http
+ timeout: 5s
+ http:
+ method: GET
+ basic_auth:
</code_context>
<issue_to_address>
**issue (bug_risk):** The http_basic_auth module uses `${HTTP_BASIC_USER}`/`${HTTP_BASIC_PASS}` without ensuring they are provided to the blackbox-exporter container.
These values must be provided as environment variables to the `blackbox-exporter` container (for example via an `environment:` block in `docker-compose.yml` or a wrapper script). Otherwise, the exporter will use the literal strings and basic auth will fail. Please either wire the env vars through or choose a different way to parameterize these credentials.
</issue_to_address>
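A sketch of the corresponding `environment:` block for the exporter, with the same caveat as above: making the variables available in the container is necessary, but the config-expansion step should also be confirmed:

```yaml
# Hypothetical fragment for the blackbox-exporter service.
blackbox-exporter:
  environment:
    - HTTP_BASIC_USER=${HTTP_BASIC_USER}
    - HTTP_BASIC_PASS=${HTTP_BASIC_PASS}
```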
### Comment 6
<location> `2-infra-monitoring/docker-compose.yml:52-55` </location>
<code_context>
+ - "--web.console.libraries=/etc/prometheus/console_libraries"
+ - "--web.console.templates=/etc/prometheus/consoles"
+ - "--web.enable-lifecycle"
+ - "--web.enable-admin-api"
+ networks:
+ - monitoring
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Enabling the Prometheus admin API while publishing the port can introduce unnecessary security risk.
`--web.enable-admin-api` exposes destructive endpoints (e.g., delete series, snapshots). With `ports: "9090:9090"` this may be reachable outside the internal network. If there isn't a strict access control layer in front of Prometheus, consider disabling this flag or binding Prometheus only to a private interface.
```suggestion
- "--web.console.templates=/etc/prometheus/consoles"
- "--web.enable-lifecycle"
networks:
```
</issue_to_address>
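If the admin API must stay enabled, the alternative the comment mentions can be sketched by binding the published port to loopback only; this is an assumption about the desired exposure, not the PR's configuration:

```yaml
# Hypothetical port mapping reachable only from the Docker host.
ports:
  - "127.0.0.1:9090:9090"
```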
### Comment 7
<location> `2-infra-monitoring/README.md:36` </location>
<code_context>
+|------------------------|------|----------------------|
+| Prometheus | 9090 | Metrika gyűjtés és tárolás / Metric collection and storage |
+| Grafana | 3000 | Vizualizáció és dashboardok / Visualization and dashboards |
+| Alertmanager | 9093 | Riasztás kezelés / Alert management |
+| Node Exporter | 9100 | Host metrikák / Host metrics |
+| Blackbox Exporter | 9115 | Endpoint monitoring (HTTP, ICMP, TCP) |
</code_context>
<issue_to_address>
**issue (typo):** Use the correct Hungarian compound noun "Riasztáskezelés" for consistency.
Earlier in the document you already use the correct compound form ("riasztáskezelést"), so please update this entry to "Riasztáskezelés / Alert management" for consistency.
```suggestion
| Alertmanager | 9093 | Riasztáskezelés / Alert management |
```
</issue_to_address>
### Comment 8
<location> `2-infra-monitoring/README.md:61` </location>
<code_context>
+# Naplók megtekintése / View logs
+docker-compose logs -f
+
+# Szolgáltatás állapot / Service status
+docker-compose ps
+```
</code_context>
<issue_to_address>
**issue (typo):** Adjust the Hungarian phrase to the grammatically correct "Szolgáltatás állapota".
Also consider keeping the English label alongside the Hungarian, e.g. "Szolgáltatás állapota / Service status".
```suggestion
# Szolgáltatás állapota / Service status
```
</issue_to_address>
### Comment 9
<location> `2-infra-monitoring/README.md:117` </location>
<code_context>
+- HTTP endpoint státusz és válaszidő
+- ICMP ping latency
+- TCP port elérhetőség
+- SSL tanúsítvány lejárat
+
+## Riasztások / Alerts
</code_context>
<issue_to_address>
**issue (typo):** Use the possessive form in Hungarian: "SSL tanúsítvány lejárata".
```suggestion
- SSL tanúsítvány lejárata
```
</issue_to_address>
🇬🇧 English Version

✅ Completed Tasks

- Docker Compose stack (docker-compose.yml)
- Prometheus configuration (prometheus.yml)
- Alert rules (alerts/alerts.yml)
- Alertmanager configuration (alertmanager.yml)
- Blackbox Exporter modules (blackbox.yml)

⏳ In Progress
❌ Not Started Yet

Issues Encountered

Key Files Changed

🧪 Testing Status
- Prometheus configuration validated (promtool check config)
- Alertmanager configuration validated (amtool check-config)
🤖 Generated with Claude Code
Summary by Sourcery
Add a complete Docker Compose-based infrastructure monitoring stack with Prometheus, Alertmanager, Grafana, exporters, and provisioning.