This project is part of the Decentralized Data Technologies course offered by the Department of Computer Engineering & Informatics at the University of Patras. The objective of the project is to design and implement a decentralized query processing and optimization system built on top of Apache Spark. The system will efficiently distribute query plans across multiple nodes, enabling parallel execution and improving performance in a decentralized manner. Additionally, the system will include query optimization capabilities by pushing down filters and projections to data sources, thereby minimizing data transfer and enhancing overall query efficiency.
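The filter/projection pushdown idea can be sketched with a toy logical plan. Note that the node classes (`Scan`, `Filter`, `Project`) and the `push_down` helper below are illustrative names invented for this sketch, not the project's actual classes or Spark's Catalyst API; the point is only to show how operators get folded into the data-source scan so less data crosses the network.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Toy logical-plan nodes -- hypothetical, for illustration only.
@dataclass
class Scan:
    table: str
    columns: Optional[list] = None    # projection pushed into the source
    predicate: Optional[tuple] = None # (column, value) filter pushed into the source

@dataclass
class Filter:
    column: str
    value: Any
    child: Any

@dataclass
class Project:
    columns: list
    child: Any

def push_down(plan):
    """Recursively fold Filter/Project nodes into the underlying Scan when possible."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if isinstance(child, Scan) and child.predicate is None:
            child.predicate = (plan.column, plan.value)  # source evaluates the filter
            return child
        return Filter(plan.column, plan.value, child)
    if isinstance(plan, Project):
        child = push_down(plan.child)
        if isinstance(child, Scan) and child.columns is None:
            child.columns = plan.columns  # source reads only the needed columns
            return child
        return Project(plan.columns, child)
    return plan

plan = Project(["name"], Filter("age", 30, Scan("users")))
optimized = push_down(plan)
print(optimized)  # → Scan(table='users', columns=['name'], predicate=('age', 30))
```

After optimization the entire plan collapses into a single `Scan` carrying both the projection and the predicate, which is exactly the effect pushdown has on data transfer: the source ships only the filtered rows and requested columns.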
To set up the environment for this project, follow these steps:
- Ensure you have `conda` installed. If not, you can download and install it from here.
- Navigate to the project directory.
- Create the environment using the `environment.yml` file:

  ```
  conda env create -f environment.yml
  ```

- Activate the environment:

  ```
  conda activate spark-env
  ```

- Start Jupyter Notebook:

  ```
  jupyter notebook
  ```
You should now be able to run the notebooks and scripts in this project.
Due to the academic nature of this project, contributions are not accepted.
This project is licensed under the MIT License - see the LICENSE file for details.