Skip to content

Anshul3901/DatabaseManagementProject

Repository files navigation

Database Management Project

Overview

This project is designed to parse, process, and insert Yelp-like business data into a PostgreSQL database. It includes scripts for parsing JSON data, creating and managing database tables, and inserting data efficiently. The project is structured for educational purposes, focusing on database schema design, data ETL (Extract, Transform, Load), and SQL scripting.

Directory Structure

.
├── SQL/                  # SQL scripts for schema, triggers, functions, and updates
├── parser/               # Python scripts for parsing and inserting data
├── zipData.sql           # Large SQL data file
├── milestone1BusinessTable.sql # Example business table SQL
├── parseAndInsert.py     # Standalone script for parsing and inserting business data
├── update.sql            # SQL update script
├── DDL.sql               # Main database schema (DDL) script
├── CptS451_Online_parseJSON.py # Script for parsing various Yelp data files
├── ER to remake.pdf      # Entity-Relationship diagrams
├── 451 ER.pdf            # Additional ER diagrams
├── ER remake sample.pdf  # Sample ER diagram
├── .gitignore            # Git ignore file

Requirements

  • Python 3.x
  • PostgreSQL
  • Python packages: psycopg2
  • Yelp-like JSON data files (e.g., yelp_business.JSON, yelp_user.JSON, etc.)

Setup

  1. Clone the repository and navigate to the project directory.
  2. Install Python dependencies:
    pip install psycopg2
  3. Set up the PostgreSQL database:
    • Create a new database (e.g., CptS451_TermProject or yelpdb).
    • Update database credentials in the Python scripts as needed.
    • Run the DDL script to create tables:
      psql -U <username> -d <database> -f DDL.sql
    • (Optional) Use scripts in SQL/ for additional schema, triggers, or functions.

Usage

1. Parsing and Inserting Data

  • Standalone Script:

    • parseAndInsert.py parses business.JSON and inserts data into the businessTable.
    • Update the file paths and database credentials as needed in the script.
    • Run:
      python parseAndInsert.py
  • Comprehensive Parser:

    • CptS451_Online_parseJSON.py parses multiple Yelp data files (business, user, review, checkin) and outputs .txt files for each.
    • Update file paths as needed.
    • Run:
      python CptS451_Online_parseJSON.py
  • Modular Parser (Recommended):

    • The parser/ directory contains modular scripts for each data type and a main.py to orchestrate parsing and database insertion.
    • Update database credentials in parser/yelp_data.py and file paths as needed.
    • Run:
      cd parser
      python main.py

2. Database Management

  • Use the SQL scripts in the SQL/ directory to manage schema, triggers, and updates as needed.
  • Example:
    psql -U <username> -d <database> -f SQL/AbracaData_RELATIONS_v3.sql

Entity-Relationship Diagrams

  • See the provided PDF files for ER diagrams and schema references.

Notes

  • Ensure your PostgreSQL server is running and accessible.
  • Update all file paths and credentials in scripts before running.
  • The project is intended for educational use and may require adaptation for production environments.

License

This project is for academic and educational purposes.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •