DataEngineering_Module_Week_10


📘 Instructions: Real-Time Data Streaming with Kafka (Python + Docker)

1. Requirements


2. Setting Up the Docker Environment

  1. Create a folder named kafka_demo on your desktop.

  2. Copy the docker-compose.yml file (given by your instructor) into this folder.

  3. Open the command line and go to this folder:

    cd Desktop\kafka_demo
    
  4. Start Kafka and its user interface:

    docker compose up -d
    

✅ Kafka Broker: localhost:9092

✅ Kafka UI: http://localhost:8081 → Interface for Kafka settings

✅ Kafka Public Broker: localhost:9093
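To confirm that the containers are actually listening, a quick reachability check can help. The following is a minimal sketch that only tests whether the three ports listed above accept TCP connections; it does not verify that Kafka itself is healthy. The names `SERVICES`, `is_port_open`, and `report` are made up for this example.

```python
import socket

# Host/port pairs taken from the list above; the labels are for display only.
SERVICES = {
    "Kafka Broker": ("localhost", 9092),
    "Kafka UI": ("localhost", 8081),
    "Kafka Public Broker": ("localhost", 9093),
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report() -> None:
    """Print one status line per service."""
    for name, (host, port) in SERVICES.items():
        status = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {status}")
```

Calling `report()` a few seconds after `docker compose up -d` should show whether each port is reachable.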

To stop the services:

docker compose down

3. WeatherAPI Settings

The API endpoint to use:

http://api.weatherapi.com/v1/current.json?key={API_KEY}&q=Amsterdam&aqi=no
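As an illustration, the endpoint above can be called with the Python standard library alone. This is a sketch, not the required solution: the environment variable name `WEATHERAPI_KEY` and the function names are assumptions made for this example.

```python
import json
import os
import urllib.parse
import urllib.request

BASE_URL = "http://api.weatherapi.com/v1/current.json"


def load_api_key() -> str:
    # WEATHERAPI_KEY is a hypothetical variable name; store your key however you prefer.
    return os.environ["WEATHERAPI_KEY"]


def build_url(api_key: str, city: str = "Amsterdam") -> str:
    """Build the request URL with the same parameters as the endpoint above."""
    query = urllib.parse.urlencode({"key": api_key, "q": city, "aqi": "no"})
    return f"{BASE_URL}?{query}"


def fetch_current(api_key: str, city: str = "Amsterdam", timeout: float = 10.0) -> dict:
    """Call the endpoint and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_url(api_key, city), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Reading the key from the environment keeps it out of the source file, which matters if the code is pushed to a public repository.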

4. Public Access with Ngrok

(We use Ngrok to expose the local Kafka broker publicly so that Fabric can read the data.)

  • Create an Ngrok account → https://dashboard.ngrok.com

  • Install Ngrok on your computer

  • Run this command in the terminal:

    ngrok tcp 9093
    

⚠️ Note: To use TCP tunnels, you must enter your credit card info in Settings > Account on your Ngrok dashboard. You won't be charged; it's only for identity verification.

Ngrok will give you an address like this:

tcp://<ngrok-host>:<ngrok-port>
Example: tcp://6.tcp.eu.ngrok.io:17090

You can connect to your local Kafka broker from outside your machine (for example, from Fabric Spark) using this address. You also need to add this address to the KAFKA_ADVERTISED_LISTENERS setting in your docker-compose.yml file and restart the services so the change takes effect.
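Kafka clients expect the broker address as `host:port`, without the `tcp://` prefix that Ngrok prints. The small helper below is a sketch of that conversion; the function name `ngrok_to_bootstrap` is made up for this example. The resulting `host:port` value is what a client would use as its bootstrap server. How it is written inside KAFKA_ADVERTISED_LISTENERS depends on the listener names in your docker-compose.yml, which is not shown here.

```python
from urllib.parse import urlparse


def ngrok_to_bootstrap(address: str) -> str:
    """Convert 'tcp://<ngrok-host>:<ngrok-port>' into '<ngrok-host>:<ngrok-port>'."""
    parsed = urlparse(address)
    if parsed.scheme != "tcp" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected tcp://<host>:<port>, got {address!r}")
    return f"{parsed.hostname}:{parsed.port}"


# Using the example address from above:
print(ngrok_to_bootstrap("tcp://6.tcp.eu.ngrok.io:17090"))
```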


5. Tasks

🧩 Producer

  • Create a Python file.

  • Get the following data from WeatherAPI:

    • City
    • Temperature
    • Humidity
    • Wind speed
    • Local time
    • Last updated
  • Send this data to Kafka every 60 seconds.
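One possible shape for the producer is sketched below. It rests on several assumptions: the `kafka-python` package is used as the client (the instructions do not name a library), the topic name `weather` is made up, and the JSON field names (`location.name`, `current.temp_c`, `current.humidity`, `current.wind_kph`, `location.localtime`, `current.last_updated`) should be checked against an actual WeatherAPI response.

```python
import json
import time
import urllib.request

TOPIC = "weather"             # hypothetical topic name
BOOTSTRAP = "localhost:9092"  # Kafka Broker address from section 2


def extract_record(payload: dict) -> dict:
    """Keep only the six fields listed in the task (field names are assumptions)."""
    location, current = payload["location"], payload["current"]
    return {
        "city": location["name"],
        "temp_c": current["temp_c"],
        "humidity": current["humidity"],
        "wind_kph": current["wind_kph"],
        "localtime": location["localtime"],
        "last_updated": current["last_updated"],
    }


def serialize(record: dict) -> bytes:
    """Encode a record as UTF-8 JSON bytes for Kafka."""
    return json.dumps(record).encode("utf-8")


def run(fetch, send, interval_s: float = 60, max_messages=None, sleep=time.sleep) -> int:
    """Fetch, extract, and send a record, then wait interval_s; repeat."""
    sent = 0
    while max_messages is None or sent < max_messages:
        send(extract_record(fetch()))
        sent += 1
        sleep(interval_s)
    return sent


def main(url: str) -> None:
    """Wire the loop to WeatherAPI and Kafka; url is the endpoint from section 3."""
    from kafka import KafkaProducer  # kafka-python, assumed to be installed

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP, value_serializer=serialize)

    def fetch() -> dict:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def send(record: dict) -> None:
        producer.send(TOPIC, record)
        producer.flush()  # push the message out now instead of waiting for a batch

    run(fetch, send)
```

Passing `fetch` and `send` into `run` as arguments keeps the loop testable without a broker or an API key. Call `main(url)` with your full endpoint URL to start producing.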


🖥️ Consumer 1 – Python

  • Create another Python file.
  • It should read real-time data from Kafka and show the current wind speed on the console.
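A console consumer along these lines might look like the sketch below. It assumes the `kafka-python` package, a topic named `weather`, and JSON messages containing `city`, `wind_kph`, and `last_updated` keys; all of these are assumptions for the example and must match whatever your producer actually sends.

```python
import json

TOPIC = "weather"             # hypothetical topic name; must match the producer
BOOTSTRAP = "localhost:9092"  # Kafka Broker address from section 2


def deserialize(raw: bytes) -> dict:
    """Decode UTF-8 JSON bytes from Kafka into a dict."""
    return json.loads(raw.decode("utf-8"))


def format_wind(record: dict) -> str:
    """Build the console line showing the current wind speed."""
    return (
        f"{record['city']}: wind speed {record['wind_kph']} km/h "
        f"(updated {record['last_updated']})"
    )


def main() -> None:
    """Print one line per message until interrupted with Ctrl+C."""
    from kafka import KafkaConsumer  # kafka-python, assumed to be installed

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        value_deserializer=deserialize,
        auto_offset_reset="latest",  # only read messages that arrive from now on
    )
    for message in consumer:
        print(format_wind(message.value))
```

Call `main()` while the producer is running; a new line should appear roughly every 60 seconds.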

⚡ Consumer 2 – Spark

  • In the Fabric portal, create a new Notebook.
  • Start a Spark session.
  • Use Structured Streaming to read the Kafka data exposed through your Ngrok address.
  • Add a timestamp to each record automatically.
  • Calculate the average temperature in a 5-minute window.
  • Move the window every 1 minute (so the average updates each minute).
  • Save the results to your Lakehouse in Delta format under the table name avg_temperature.
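The steps above could be sketched in PySpark roughly as follows. Treat this as one possible approach under stated assumptions: the topic name `weather`, the JSON keys `city` and `temp_c`, the column name `ingest_time`, and the checkpoint path are all made up for this example, and the Spark Kafka connector is assumed to be available in your Fabric environment. The sketch uses append output mode with a watermark, so each window is written once it is finalized.

```python
TOPIC = "weather"                         # hypothetical topic name; must match the producer
SCHEMA = "city STRING, temp_c DOUBLE"     # only the fields needed here (assumed JSON keys)


def kafka_options(bootstrap_servers: str, topic: str = TOPIC) -> dict:
    """Options for the Kafka source; bootstrap_servers is your <ngrok-host>:<ngrok-port>."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_avg_temperature(spark, bootstrap_servers: str,
                          checkpoint: str = "Files/checkpoints/avg_temperature"):
    """Start the streaming query and return it; the checkpoint path is an assumption."""
    from pyspark.sql import functions as F

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_options(bootstrap_servers))
        .load()
    )
    parsed = (
        raw.select(F.from_json(F.col("value").cast("string"), SCHEMA).alias("r"))
        .select("r.*")
        # Timestamp added automatically when the record is processed.
        .withColumn("ingest_time", F.current_timestamp())
    )
    averaged = (
        parsed.withWatermark("ingest_time", "1 minute")
        # 5-minute window that slides every 1 minute.
        .groupBy(F.window("ingest_time", "5 minutes", "1 minute"))
        .agg(F.avg("temp_c").alias("avg_temp_c"))
        .select(
            F.col("window.start").alias("window_start"),
            F.col("window.end").alias("window_end"),
            "avg_temp_c",
        )
    )
    return (
        averaged.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint)
        .toTable("avg_temperature")
    )
```

In a notebook where `spark` is already defined, `query = start_avg_temperature(spark, "<ngrok-host>:<ngrok-port>")` starts the stream, and `query.stop()` ends it.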

Good Luck!
