This project is designed to parse, process, and insert Yelp-like business data into a PostgreSQL database. It includes scripts for parsing JSON data, creating and managing database tables, and inserting data efficiently. The project is structured for educational purposes, focusing on database schema design, data ETL (Extract, Transform, Load), and SQL scripting.
.
├── SQL/ # SQL scripts for schema, triggers, functions, and updates
├── parser/ # Python scripts for parsing and inserting data
├── zipData.sql # Large SQL data file
├── milestone1BusinessTable.sql # Example business table SQL
├── parseAndInsert.py # Standalone script for parsing and inserting business data
├── update.sql # SQL update script
├── DDL.sql # Main database schema (DDL) script
├── CptS451_Online_parseJSON.py # Script for parsing various Yelp data files
├── ER to remake.pdf # Entity-Relationship diagrams
├── 451 ER.pdf # Additional ER diagrams
├── ER remake sample.pdf # Sample ER diagram
├── .gitignore # Git ignore file
- Python 3.x
- PostgreSQL
- Python packages: psycopg2
- Yelp-like JSON data files (e.g., yelp_business.JSON, yelp_user.JSON, etc.)
- Clone the repository and navigate to the project directory.
- Install Python dependencies:
pip install psycopg2
- Set up the PostgreSQL database:
  - Create a new database (e.g., CptS451_TermProject or yelpdb).
  - Update database credentials in the Python scripts as needed.
  - Run the DDL script to create tables:
    psql -U <username> -d <database> -f DDL.sql
  - (Optional) Use scripts in SQL/ for additional schema, triggers, or functions.
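Because credentials have to be updated in several scripts, one option is to read them from the environment in a single helper. The following is a minimal sketch, not the project's actual code: the function names and default values are hypothetical, and the `PG*` variable names follow the libpq convention.

```python
import os


def connection_params(env=None):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    The defaults below are placeholders (assumptions), not values from this project.
    """
    env = os.environ if env is None else env
    return {
        "dbname": env.get("PGDATABASE", "yelpdb"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
    }


def connect(env=None):
    """Open a database connection using the parameters above.

    psycopg2 is imported here so connection_params() can be used without it.
    """
    import psycopg2

    return psycopg2.connect(**connection_params(env))
```

With this in place, each script could call `connect()` instead of hard-coding a connection string, and switching databases would only require changing environment variables such as `PGDATABASE`.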
- Standalone Script:
  - parseAndInsert.py parses business.JSON and inserts data into the businessTable.
  - Update the file paths and database credentials as needed in the script.
  - Run:
    python parseAndInsert.py
- Comprehensive Parser:
  - CptS451_Online_parseJSON.py parses multiple Yelp data files (business, user, review, checkin) and outputs .txt files for each.
  - Update file paths as needed.
  - Run:
    python CptS451_Online_parseJSON.py
- Modular Parser (Recommended):
  - The parser/ directory contains modular scripts for each data type and a main.py to orchestrate parsing and database insertion.
  - Update database credentials in parser/yelp_data.py and file paths as needed.
  - Run:
    cd parser
    python main.py
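The parse-then-insert flow that these scripts perform can be sketched roughly as follows. This is an illustration under assumptions, not the code in parseAndInsert.py or parser/: it assumes the input holds one JSON object per line, and the column list and function names are hypothetical and would need to match the table defined in DDL.sql.

```python
import json

# Hypothetical column list; adjust it to the actual table definition in DDL.sql.
BUSINESS_COLUMNS = ("business_id", "name", "city", "state", "stars", "review_count")


def parse_business_line(line):
    """Turn one JSON line into a tuple ordered like BUSINESS_COLUMNS."""
    record = json.loads(line)
    # Missing keys become None, which psycopg2 sends as SQL NULL.
    return tuple(record.get(column) for column in BUSINESS_COLUMNS)


def read_businesses(path):
    """Yield one row tuple per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_business_line(line)


def insert_businesses(conn, rows, table="businessTable"):
    """Insert rows through a parameterized statement so the driver escapes values.

    The table name is formatted into the SQL text, so it must be a trusted
    constant and never user input.
    """
    placeholders = ", ".join(["%s"] * len(BUSINESS_COLUMNS))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(BUSINESS_COLUMNS), placeholders
    )
    with conn.cursor() as cur:
        cur.executemany(sql, list(rows))
    conn.commit()
```

Given an open psycopg2 connection `conn`, a call such as `insert_businesses(conn, read_businesses("yelp_business.JSON"))` would load the file in one transaction. For very large files, batching the rows instead of materializing them all at once may be preferable.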
- Use the SQL scripts in the SQL/ directory to manage schema, triggers, and updates as needed.
- Example:
  psql -U <username> -d <database> -f SQL/AbracaData_RELATIONS_v3.sql
- See the provided PDF files for ER diagrams and schema references.
- Ensure your PostgreSQL server is running and accessible.
- Update all file paths and credentials in scripts before running.
- The project is intended for educational use and may require adaptation for production environments.
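The checks in the notes above (input files in place, server accessible) can be automated before a long import starts. The sketch below is one possible approach using only the standard library; the function names are hypothetical, and the host and port defaults assume a local PostgreSQL server on its standard port.

```python
import os
import socket


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [path for path in paths if not os.path.isfile(path)]


def server_reachable(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to the given host and port succeeds.

    This only shows that something accepts connections on the port; it does
    not verify credentials or that the target database exists.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A script could call `missing_files(["yelp_business.JSON", "yelp_user.JSON"])` and `server_reachable()` at startup and exit with a clear message if either check fails.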
This project is for academic and educational purposes.