# NUBot: Retrieval-Augmented Generation (RAG) Chatbot
NUBot is an intelligent chatbot designed to assist students and visitors with queries related to Northeastern University, such as courses, faculty, co-op opportunities, and more. It utilizes a Retrieval-Augmented Generation (RAG) approach to provide instant, accurate responses.
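The RAG idea can be sketched in a few lines: retrieve the documents most relevant to a query, then build a prompt that grounds the generator in that context. A toy sketch, assuming a keyword-overlap score and an illustrative corpus (not NUBot's actual retriever or data):

```python
# Toy RAG sketch: keyword-overlap retrieval plus prompt assembly.
# The corpus and scoring are illustrative, not NUBot's real pipeline.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in the document (toy relevance)."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by keyword overlap with the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved context and the question into one grounded prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Northeastern's co-op program places students in six-month work terms.",
    "CS 5200 covers database management systems.",
    "The Boston campus library is open daily.",
]
query = "How does the co-op program work?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

A production retriever would use embeddings and a vector index instead of keyword overlap, but the retrieve-then-generate shape is the same.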
## Prerequisites
Before setting up the project, install the Python debugger extension in VS Code:
- [Python Debugger Extension](https://marketplace.visualstudio.com/items?itemName=ms-python.debugpy)
## Features
- Instant responses to academic-related queries.
- Scalable and efficient system for handling high query volumes.
- Continuous updates via cloud deployment.
---
## Setup
### Installing Dependencies
1. Add dependencies in the `pyproject.toml` under the `dependencies` array.
2. Run the following command to install them:
```bash
pip install .
```

### Running the Backend

To run the backend service, choose one of the following methods:
#### Option 1: Run as a Python module

- Go to the root directory of NUBot.
- Run the following command:

```bash
python -m src.backend.api
```
#### Option 2: Flask (macOS/Linux)

- Open the terminal in the NUBot directory.
- Set the environment variable and start the Flask server:

```bash
export FLASK_APP=src.backend.api
flask run
```
#### Option 3: Flask (Windows)

- Open the terminal in the NUBot directory.
- Set the environment variable and start the Flask server:

```bash
set FLASK_APP=src.backend.api
flask run
```
#### Option 4: VS Code debugger

- Open Run and Debug in VS Code.
- Click the Run button to start the backend.

The backend will now be running at http://localhost:5000.
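The backend is a Flask app exposed by `src.backend.api`. A minimal sketch of an app following that pattern — the `/ask` route and its response shape here are hypothetical examples, not NUBot's actual API:

```python
# Minimal Flask backend sketch in the style of src.backend.api.
# The /ask route and response fields are hypothetical examples.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    # Read the question from the JSON body; the real service would run
    # the RAG pipeline here instead of returning a placeholder.
    question = (request.get_json(silent=True) or {}).get("question", "")
    return jsonify({"question": question, "answer": "placeholder"})

if __name__ == "__main__":
    # `flask run` uses FLASK_APP; running the file directly also works.
    app.run(port=5000)
```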
## Running with Docker and Airflow

- Install and open Docker.
- Run the following command to build the project:

```bash
docker compose build
```

This copies/mounts the entire repository into Docker to resolve import errors.
- Initialize Airflow:

```bash
docker compose up airflow-init
```

The cursor will stop at `airflow-init exited`; press Enter or any key to continue.

- Start Airflow:

```bash
docker compose up
```

Wait until the curl request appears in the logs.
- Open a browser and navigate to http://localhost:8080.
- Locate the DAG `web_scraping`, run it, and wait until its status shows Success (dark green).
- This DAG scrapes a webpage and stores the data in JSON format.
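The core scrape-and-store step of such a DAG can be sketched with the standard library: extract text from a page and persist it as JSON. The HTML sample, output path, and record structure below are illustrative, not the DAG's actual code:

```python
# Sketch of the scrape-and-store step the web_scraping DAG performs.
# HTML sample, output path, and record layout are illustrative only.
import json
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text fragments from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def scrape_to_json(html: str, path: str) -> dict:
    """Extract text from HTML and write it to a JSON file."""
    parser = TextExtractor()
    parser.feed(html)
    record = {"text": parser.chunks}
    with open(path, "w") as f:
        json.dump(record, f)
    return record

sample = "<html><body><h1>Co-op</h1><p>Six-month work terms.</p></body></html>"
result = scrape_to_json(sample, "scraped_sample.json")
print(result)
```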
To stop Airflow, open a new terminal and run:

```bash
docker compose down
```

To start Airflow in detached mode instead:

```bash
docker compose up -d
```
## Data Versioning with DVC

- Run:

```bash
docker compose up
```

- If DVC is not initialized, run:

```bash
dvc init
```

- Add tracking for the JSON files:

```bash
dvc add scraped_data/
```

- Track the changes with Git:

```bash
git add .gitignore scraped_data.dvc
```

Once completed, follow the standard Git workflow.
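Under the hood, `dvc add` content-addresses the data: it hashes each file, records the hash in the small `.dvc` file that Git tracks, and gitignores the data itself. A rough sketch of that hashing step (DVC's real cache layout and `.dvc` format are more involved):

```python
# Rough sketch of DVC-style content addressing: hash a file's bytes so
# changed data gets a new identity. Illustrative only; DVC's actual
# cache layout and .dvc file format are more involved.
import hashlib

def file_md5(path: str) -> str:
    """Return the MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a small file written locally.
with open("demo.json", "w") as f:
    f.write('{"page": "co-op"}')
digest = file_md5("demo.json")
print(digest)  # identical content always yields the same digest
```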
On successful execution, the DAG status in the Airflow UI will show as Success.
## Prefect

To install Prefect with all dependencies, run:

```bash
pip install -U "prefect[all]"
```

Start the Prefect UI server on port 4200:

```bash
prefect server start
```

Once started, access the UI at http://localhost:4200.

Run the DAG script in `src/prefectWorflows` using one of the following commands:

```bash
python scraper_flow.py
# OR
python -m src.prefectWorflows.scraper_flow
```

After execution, refresh the Prefect UI at http://localhost:4200 to see the running DAG.
- Unlike Airflow, tools such as Prefect and Dagster do not automatically detect workflows.
- Workflows need to be triggered manually.
- For multiple workflows, combine all flows in a single file and register them to Prefect Cloud.
To deploy and run workflows anywhere, first log in to Prefect Cloud:

```bash
prefect cloud login
```

Then register the flows and deploy them accordingly.
By following these steps, you can efficiently run and manage your Prefect workflows both locally and in the cloud.

