IoTDataPipeline

Introduction

This project demonstrates how Kafka, a tool favored by many solution architects, can be combined with data analytics platforms to move from batch processing to real-time or near-real-time data processing.

This repository accompanies a blog series. It is aimed at teams struggling with batch processing infrastructure who want to move toward real-time analytics. For more information, refer to our blog series.

Overview

IoTDataPipeline is a compact project that showcases real-time data ingestion, processing, and visualization using a stack of technologies that integrate well with each other. It is a good starting point for understanding how real-time data pipelines work in the context of IoT (Internet of Things).

Components

The project spins up several services using Docker Compose:

Zookeeper & Kafka Broker: Form the backbone of the messaging system, handling data ingestion and streaming.
Control Center: A web-based user interface for managing and monitoring Kafka.
Spark: The analytics engine that processes data streams in real time.
Cassandra: A NoSQL database used to store processed results for future analysis.
Express App: A Node.js application that visualizes the real-time analytics results using Server-Sent Events (SSE).
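
The Express app is a Node.js service, and its code is not reproduced here. Purely to illustrate what Server-Sent Events look like on the wire, here is a small Python sketch that encodes one update as an SSE frame. The function name `format_sse`, the event name, and the payload fields are hypothetical and not taken from the repository.

```python
import json


def format_sse(payload, event=None):
    """Encode a JSON-serializable payload as one Server-Sent Events frame."""
    lines = []
    if event is not None:
        # Optional event name; clients can listen for it specifically.
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(payload)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Hypothetical payload: an average temperature for one device.
    frame = format_sse({"device_id": "device-1", "avg_temperature": 24.5},
                       event="average")
    print(frame, end="")
```

A browser `EventSource` treats everything up to the blank line as one event, which is why the frame always ends with two newlines.
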

Workflow

Data Production: A Python script (kafka_producer.py) simulates IoT device data, producing temperature readings that are sent to a Kafka topic.
Data Processing: Spark (spark_stream.py) consumes the temperature readings from Kafka, computes the average temperature per device, and then does two things with the results:
- Streams the results to another Kafka topic for real-time visualization.
- Stores the results in Cassandra for historical analysis.
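
To make the aggregation step concrete, the following is a plain-Python sketch of the same idea: simulated readings are encoded as JSON messages, and the average temperature is computed per device. It uses neither Kafka nor Spark and is not the code in kafka_producer.py or spark_stream.py. The message fields (`device_id`, `temperature`), the value range, and the function names are assumptions made for illustration.

```python
import json
import random
from collections import defaultdict


def simulate_reading(device_id, rng):
    """Build one JSON-encoded temperature reading for a simulated device."""
    return json.dumps({
        "device_id": device_id,
        # Assumed range in degrees Celsius; the real script may differ.
        "temperature": round(rng.uniform(15.0, 35.0), 2),
    })


def average_per_device(messages):
    """Return {device_id: mean temperature} for a batch of JSON messages."""
    totals = defaultdict(lambda: [0.0, 0])  # device_id -> [sum, count]
    for raw in messages:
        reading = json.loads(raw)
        entry = totals[reading["device_id"]]
        entry[0] += reading["temperature"]
        entry[1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs match
    batch = [simulate_reading(f"device-{i % 3}", rng) for i in range(12)]
    for device, avg in sorted(average_per_device(batch).items()):
        print(f"{device}: {avg:.2f}")
```

In the actual pipeline the messages travel through a Kafka topic and Spark performs the grouping on a stream rather than on a fixed batch, but the per-device grouping logic is the same in spirit.
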

Install Python Dependencies:

Execute the following command to install the necessary Python packages:
```
pip install -r requirements.txt
```

Run Spark Streaming Application:

Execute the Spark streaming application to start listening to Kafka topics and processing data:
```
python spark_stream.py
```

Run Kafka Producer:

To simulate IoT device data, run the Kafka producer script with the following command:
```
python kafka_producer.py -producers=5 -duration=1
```
Note: `-producers` specifies the number of IoT devices to simulate, and `-duration` defines the time in minutes for which the script will run before stopping.
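
One way flags in this style could be parsed is with `argparse`, which accepts single-dash long options and the `-name=value` form. The sketch below is not the script's actual code; the defaults and the helper `deadline_from` are assumptions made for illustration.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse the two flags used in the command above."""
    parser = argparse.ArgumentParser(description="Simulate IoT temperature sensors")
    # Single-dash long options, so `-producers=5` is accepted as written.
    parser.add_argument("-producers", type=int, default=1,
                        help="number of IoT devices to simulate")
    parser.add_argument("-duration", type=float, default=1.0,
                        help="run time in minutes")
    return parser.parse_args(argv)


def deadline_from(duration_minutes, now=None):
    """Convert a duration in minutes into an absolute stop time in seconds."""
    start = time.monotonic() if now is None else now
    return start + duration_minutes * 60


if __name__ == "__main__":
    # Explicit argument list, mirroring the command shown above.
    args = parse_args(["-producers=5", "-duration=1"])
    stop_at = deadline_from(args.duration)
    print(f"Simulating {args.producers} device(s) for {args.duration} minute(s)")
```

A producer loop would then keep sending readings while `time.monotonic() < stop_at`, which keeps the run time independent of wall-clock adjustments.
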

Start the Backend Server and Visualize Data:

Navigate to the backend folder and start the Node.js application:
```
node app.js
```

Then, open your web browser and go to http://localhost:3000 to view the processed data in real time.
