Real-Time System Monitoring

A full-stack data engineering and AI-driven observability project that monitors live system metrics (CPU, memory, etc.), streams them through Apache Kafka, processes them in Apache Flink (PyFlink) for anomaly detection, and stores results in TimescaleDB (PostgreSQL) for real-time visualization in Grafana dashboards.

Overview

Modern distributed systems generate massive amounts of telemetry data every second.
This project builds a real-time monitoring pipeline that can:

Stream system metrics in real time.
Detect anomalies using adaptive statistical models (EWMA + 3σ).
Store raw and processed data efficiently.
Visualize insights dynamically in Grafana.
Extend to AI-based forecasting (ARIMA / LSTM).

It demonstrates hands-on skills in streaming data pipelines, distributed systems, AI for observability, and real-time analytics—key areas for cloud and SRE roles at Amazon, Google, or Microsoft.

Tech Stack

Layer	Technology	Purpose
Ingestion	Apache Kafka	Message broker for streaming metrics
Coordination	Zookeeper	Kafka cluster coordination
Schema Management	Confluent Schema Registry (Avro)	Enforce message consistency
Processing	Apache PyFlink	Real-time anomaly detection
Storage	TimescaleDB (PostgreSQL)	Time-series database for metrics
Visualization	Grafana	Dashboard for real-time monitoring
Alerting (future)	Slack / PagerDuty	Incident alerts for detected anomalies
Containerization	Docker & Docker Compose	Local multi-service orchestration

System Architecture

┌─────────────┐ ┌────────────┐ ┌───────────────┐ ┌───────────────┐ ┌──────────────┐
│ Metrics     │ →→→ │ Kafka      │ →→→ │ PyFlink Stream │ →→→ │ TimescaleDB   │ →→→ │ Grafana   │
│ Producer(s) │     │ (Broker)   │     │ Processing     │     │ (Postgres)    │     │ Dashboard │
└─────────────┘ └────────────┘ └───────────────┘ └───────────────┘ └──────────────┘

Flow summary:

Python producers generate and publish live CPU/memory metrics to Kafka topics.
PyFlink consumes the stream, applies an EWMA (Exponentially Weighted Moving Average) anomaly detector using the 3σ rule.
Anomalies and metrics are inserted into TimescaleDB.
Grafana visualizes real-time metrics and anomaly scores.

AI & Anomaly Detection

EWMA (Exponentially Weighted Moving Average) Model

ewma_new  = ALPHA * value + (1 - ALPHA) * ewma
ewmsq_new = ALPHA * (value**2) + (1 - ALPHA) * ewmsq
variance  = max(ewmsq_new - ewma_new**2, 0)
std_dev   = sqrt(variance)
score     = abs(value - ewma_new) / std_dev   # 3σ rule

Each host maintains its own EWMA state. A score ≥ 3 is treated as an anomaly. Future extensions: integrate ARIMA / LSTM for predictive forecasting.

Repository Structure

Real-Time-System-Monitoring-with-AI-Prediction/ │ ├── docker-compose.yml # Multi-container setup (Kafka, Zookeeper, Postgres, Schema Registry, Grafana) ├── metrics_producer.py # Streams random CPU & memory metrics to Kafka ├── db_consumer.py # Consumes from Kafka and inserts into TimescaleDB ├── flink_job.py (or anomaly_flink.py) │ # PyFlink anomaly detection pipeline ├── sql/ │ └── init.sql # Creates metrics_raw and metrics_anomalies tables ├── requirements.txt # Python dependencies └── README.md # Project documentation

Quick Start

docker compose up -d
python metrics_producer.py
python db_consumer.py
python flink_job.py

Setup & Installation

1️⃣ Prerequisites Install the following tools:

Docker & Docker Compose
Python 3.10+
pip install -r requirements.txt

2️⃣ Launch Docker Services

docker compose up -d

This spins up:

Zookeeper (port 2181)
Kafka (port 9092)
Schema Registry (8081)
Postgres/TimescaleDB (5432)
Grafana (3000)
Kafdrop UI (19000)

Check container status:

docker compose ps

3️⃣ Verify Kafka Topics

docker exec -it rtm-ai-kafka-1 \
  kafka-topics --bootstrap-server kafka:29092 --list

If not present, create:

docker exec -it rtm-ai-kafka-1 \
  kafka-topics --bootstrap-server kafka:29092 --create --topic metrics --partitions 1 --replication-factor 1

Running the Pipeline

Step 1. Start the Producer

python metrics_producer.py

Sends random CPU/memory values (per host) to Kafka every few seconds.

Step 2. Start the Consumer

python db_consumer.py

Reads messages from Kafka → Inserts into metrics_raw table.

Step 3. Run PyFlink Job

python flink_job.py

Applies EWMA anomaly detection and writes results into metrics_anomalies.

Step 4. View Data in Postgres

docker exec -it rtm-ai-postgres-1 psql -U postgres -d metrics
\dt
SELECT * FROM metrics_raw LIMIT 5;
SELECT * FROM metrics_anomalies LIMIT 5;

Step 5. Visualize in Grafana

Navigate to: http://localhost:3000

Add PostgreSQL data source (host: postgres:5432)

Create dashboards using SQL panels:

CPU Panel
Memory Panel
Anomaly Score Panel

Example Grafana Queries

CPU Panel

SELECT to_timestamp(ts) AS "time", host, cpu
FROM metrics_anomalies
WHERE $__timeFilter(to_timestamp(ts))
ORDER BY ts;

Memory Panel

SELECT to_timestamp(ts) AS "time", host, memory
FROM metrics_anomalies
WHERE $__timeFilter(to_timestamp(ts))
ORDER BY ts;

Anomaly Score Panel

SELECT to_timestamp(ts) AS "time", host, score
FROM metrics_anomalies
WHERE score >= 3
  AND $__timeFilter(to_timestamp(ts))
ORDER BY ts;

Future Enhancements

✅ AI Forecasting (ARIMA / LSTM via River / PyTorch)
✅ Real-time Alerting via Slack / PagerDuty
✅ Auto-scaling metrics ingestion using Kubernetes
✅ Integration with Prometheus / OpenTelemetry
✅ Web dashboard for anomaly reports

Author

Naitik Shah
Master of Engineering in Computer Science
Oregon State University
🔗 LinkedIn
🔗 GitHub

License

This project is licensed under the MIT License.

⭐ Acknowledgments

Apache Flink & Kafka documentation
TimescaleDB community
Grafana Labs tutorials
Confluent Schema Registry samples
“Building reliable systems means understanding the data flowing through them. This project bridges the gap between monitoring, prediction, and intelligent automation.”

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
anomaly-detector		anomaly-detector
consumer		consumer
grafana		grafana
images		images
producer		producer
.gitignore		.gitignore
README.md		README.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Real-Time System Monitoring

Overview

Tech Stack

System Architecture

AI & Anomaly Detection

Repository Structure

Quick Start

Setup & Installation

Running the Pipeline

Future Enhancements

Author

License

⭐ Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Real-Time System Monitoring

Overview

Tech Stack

System Architecture

AI & Anomaly Detection

Repository Structure

Quick Start

Setup & Installation

Running the Pipeline

Future Enhancements

Author

License

⭐ Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages