# NUBot: Retrieval-Augmented Generation (RAG) Chatbot
NUBot is an intelligent chatbot designed to assist students and visitors with queries related to Northeastern University, such as courses, faculty, co-op opportunities, and more. It utilizes a Retrieval-Augmented Generation (RAG) approach to provide instant, accurate responses.
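The RAG idea can be sketched in a few lines: retrieve the documents most relevant to a query, then build a prompt that grounds the generator in that context. A toy sketch, assuming a keyword-overlap score and an illustrative corpus (not NUBot's actual retriever or data):

```python
# Toy RAG sketch: keyword-overlap retrieval plus prompt assembly.
# The corpus and scoring are illustrative, not NUBot's real pipeline.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in the document (toy relevance)."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by keyword overlap with the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved context and the question into one grounded prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Northeastern's co-op program places students in six-month work terms.",
    "CS 5200 covers database management systems.",
    "The Boston campus library is open daily.",
]
query = "How does the co-op program work?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

A production retriever would use embeddings and a vector index instead of keyword overlap, but the retrieve-then-generate shape is the same.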
## Prerequisites
Before setting up the project, install the Python debugger extension in VS Code:
- [Python Debugger Extension](https://marketplace.visualstudio.com/items?itemName=ms-python.debugpy)
## Features
- Instant responses to academic-related queries.
- Scalable and efficient system for handling high query volumes.
- Continuous updates via cloud deployment.
---
## Setup
### Installing Dependencies
1. Add dependencies in the `pyproject.toml` under the `dependencies` array.
2. Run the following command to install them:
```bash
pip install .
```

### Running the Backend

To run the backend service, choose one of the following methods:
#### Option 1: Run as a Python module

- Go to the root directory of NUBot.
- Run the following command:

```bash
python -m src.backend.api
```
#### Option 2: Flask (macOS/Linux)

- Open the terminal in the NUBot directory.
- Set the environment variable and start the Flask server:

```bash
export FLASK_APP=src.backend.api
flask run
```
#### Option 3: Flask (Windows)

- Open the terminal in the NUBot directory.
- Set the environment variable and start the Flask server:

```bash
set FLASK_APP=src.backend.api
flask run
```
#### Option 4: VS Code debugger

- Open Run and Debug in VS Code.
- Click the Run button to start the backend.

The backend will now be running at http://localhost:5000.
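The backend is a Flask app exposed by `src.backend.api`. A minimal sketch of an app following that pattern — the `/ask` route and its response shape here are hypothetical examples, not NUBot's actual API:

```python
# Minimal Flask backend sketch in the style of src.backend.api.
# The /ask route and response fields are hypothetical examples.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Read the question from the JSON body; the real service would run
    # the RAG pipeline here instead of returning a placeholder.
    question = (request.get_json(silent=True) or {}).get("question", "")
    return jsonify({"question": question, "answer": "placeholder"})

if __name__ == "__main__":
    # `flask run` uses FLASK_APP; running the file directly also works.
    app.run(port=5000)
```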
## Running with Docker and Airflow

- Install and open Docker.
- Run the following command to build the project:

```bash
docker compose build
```

This copies/mounts the entire repository into Docker to resolve import errors.
- Initialize Airflow:

```bash
docker compose up airflow-init
```

The cursor will stop at `airflow-init exited`; press Enter or any key to continue.

- Start Airflow:

```bash
docker compose up
```

Wait until the curl request appears in the logs.
- Open a browser and navigate to http://localhost:8080.
- Locate the DAG `web_scraping`, run it, and wait until its status shows Success (dark green).
- This DAG scrapes a webpage and stores the data in JSON format.
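The core scrape-and-store step of such a DAG can be sketched with the standard library: extract text from a page and persist it as JSON. The HTML sample, output path, and record structure below are illustrative, not the DAG's actual code:

```python
# Sketch of the scrape-and-store step the web_scraping DAG performs.
# HTML sample, output path, and record layout are illustrative only.
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text fragments from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def scrape_to_json(html: str, path: str) -> dict:
    """Extract text from HTML and write it to a JSON file."""
    parser = TextExtractor()
    parser.feed(html)
    record = {"text": parser.chunks}
    with open(path, "w") as f:
        json.dump(record, f)
    return record

sample = "<html><body><h1>Co-op</h1><p>Six-month work terms.</p></body></html>"
result = scrape_to_json(sample, "scraped_sample.json")
print(result)
```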
To stop Airflow, open a new terminal and run:

```bash
docker compose down
```

To start Airflow in detached mode instead:

```bash
docker compose up -d
```
## Data Versioning with DVC

- Run:

```bash
docker compose up
```

- If DVC is not initialized, run:

```bash
dvc init
```

- Add tracking for the JSON files:

```bash
dvc add scraped_data/
```

- Track the changes with Git:

```bash
git add .gitignore scraped_data.dvc
```

Once completed, follow the standard Git workflow.
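Under the hood, `dvc add` content-addresses the data: it hashes each file, records the hash in the small `.dvc` file that Git tracks, and gitignores the data itself. A rough sketch of that hashing step (DVC's real cache layout and `.dvc` format are more involved):

```python
# Rough sketch of DVC-style content addressing: hash a file's bytes so
# changed data gets a new identity. Illustrative only; DVC's actual
# cache layout and .dvc file format are more involved.
import hashlib

def file_md5(path: str) -> str:
    """Return the MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a small file written locally.
with open("demo.json", "w") as f:
    f.write('{"page": "co-op"}')
digest = file_md5("demo.json")
print(digest)  # identical content always yields the same digest
```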
On successful execution, the DAG status in the Airflow UI will show as Success.
## Prefect

To install Prefect with all dependencies, run:

```bash
pip install -U "prefect[all]"
```

Start the Prefect UI server on port 4200:

```bash
prefect server start
```

Once started, access the UI at http://localhost:4200.

Run the DAG script in `src/prefectWorflows` using one of the following commands:

```bash
python scraper_flow.py
# OR
python -m src.prefectWorflows.scraper_flow
```

After execution, refresh the Prefect UI at http://localhost:4200 to see the running DAG.
- Unlike Airflow, tools such as Prefect and Dagster do not automatically detect workflows.
- Workflows need to be triggered manually.
- For multiple workflows, combine all flows in a single file and register them to Prefect Cloud.
To deploy and run workflows anywhere, first log in to Prefect Cloud:

```bash
prefect cloud login
```

Then register the flows and deploy them accordingly.
By following these steps, you can efficiently run and manage your Prefect workflows both locally and in the cloud.

