This project implements a sharded database system that ensures data consistency across replicated shards using a Write-Ahead Logging (WAL) mechanism. This approach helps maintain a consistent state in the event of unexpected shutdowns by logging changes before applying them to the database.
- Clone the repository:
    git clone https://github.com/your-repository-url
- Navigate to the project directory:
    cd your-project-directory
- Build the Docker image:
    docker build -t your-docker-image .
- Run the Docker container:
    docker run -p 5000:5000 your-docker-image
To start the system, run the Docker container. It initializes the database and starts the Flask application, after which the system is ready to handle API requests that manage data entries.
- Sharded Database: Distributes data across several shards, enhancing performance and scalability.
- Write-Ahead Logging: Ensures data integrity by logging changes before they are committed to the database.
- Replication: Maintains copies of data across different servers to ensure high availability and fault tolerance.
- Recovery: Supports system recovery by restoring the database to a consistent state using the WAL files.
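The log-then-apply ordering and the replay-based recovery described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the JSON-lines log format, the `WriteAheadLog` class, the `op` field, and the in-memory dictionary standing in for a shard are all assumptions made for the example.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Append-only log of JSON records, one per line (hypothetical format)."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        # Persist the intended change before the data store is touched.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Yield every logged record in the order it was written.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)


def apply(store, record):
    """Apply one logged operation to a dict keyed by Stud_id."""
    op, key = record["op"], record["Stud_id"]
    if op == "add":
        store[key] = dict(record["data"])
    elif op == "update":
        store.setdefault(key, {}).update(record["data"])
    elif op == "delete":
        store.pop(key, None)


def logged_write(wal, store, record):
    wal.append(record)    # 1. log first
    apply(store, record)  # 2. then apply


def recover(wal):
    """Rebuild the store from scratch by replaying the log."""
    store = {}
    for record in wal.replay():
        apply(store, record)
    return store


def demo():
    """Perform a few writes, then rebuild the state from the log alone."""
    with tempfile.TemporaryDirectory() as tmp:
        wal = WriteAheadLog(os.path.join(tmp, "sh1.wal"))
        store = {}
        logged_write(wal, store, {"op": "add", "Stud_id": 123,
                                  "data": {"Stud_name": "John Doe", "Stud_marks": 88}})
        logged_write(wal, store, {"op": "update", "Stud_id": 123,
                                  "data": {"Stud_marks": 90}})
        logged_write(wal, store, {"op": "add", "Stud_id": 124,
                                  "data": {"Stud_name": "Jane Roe", "Stud_marks": 75}})
        logged_write(wal, store, {"op": "delete", "Stud_id": 124})
        return store, recover(wal)
```

Because every record here carries absolute values, replaying the same log twice produces the same state, which is what makes replay safe after a crash in this sketch.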
Configuration details such as database connection settings are managed in the Docker configuration files and environment variables.
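As an illustration of environment-driven configuration, the sketch below reads database connection settings with fallbacks. The variable names (`MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DATABASE`) and the default values are assumptions chosen for the example; the names the project actually uses are defined in its Docker configuration files.

```python
import os
from typing import NamedTuple


class DbConfig(NamedTuple):
    host: str
    port: int
    user: str
    password: str
    database: str


def load_db_config(env=None):
    """Read connection settings from the environment (hypothetical names)."""
    env = os.environ if env is None else env
    return DbConfig(
        host=env.get("MYSQL_HOST", "localhost"),
        port=int(env.get("MYSQL_PORT", "3306")),  # 3306 is MySQL's default port
        user=env.get("MYSQL_USER", "root"),
        password=env.get("MYSQL_PASSWORD", ""),
        database=env.get("MYSQL_DATABASE", "shards"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_db_config({"MYSQL_HOST": "db"})`.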
Refer to the inline comments within the code for detailed explanations of the functionality and architecture. The project also includes system diagrams and a detailed explanation of the WAL mechanism.
Here are some examples of API calls that can be made to the system:
- Add Entry:
    curl -X POST localhost:5000/add -d '{"shard":"sh1", "data":{"Stud_id":123, "Stud_name":"John Doe", "Stud_marks":88}}'
- Update Entry:
    curl -X PUT localhost:5000/update -d '{"shard":"sh2", "Stud_id":123, "data":{"Stud_marks":90}}'
- Delete Entry:
    curl -X DELETE localhost:5000/delete -d '{"shard":"sh3", "Stud_id":123}'
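The same three calls can be issued from Python using only the standard library. This is a sketch under two assumptions: the service is reachable at `http://localhost:5000` as in the `docker run` command, and it accepts JSON bodies sent with a `Content-Type: application/json` header. The helper names `build_request` and `send` are introduced here for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # port taken from the docker run command


def build_request(method, path, payload):
    """Build (but do not send) a JSON request for one of the endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method=method,
    )


def send(request, timeout=5):
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


add_req = build_request("POST", "/add", {
    "shard": "sh1",
    "data": {"Stud_id": 123, "Stud_name": "John Doe", "Stud_marks": 88},
})
update_req = build_request("PUT", "/update", {
    "shard": "sh2", "Stud_id": 123, "data": {"Stud_marks": 90},
})
delete_req = build_request("DELETE", "/delete", {"shard": "sh3", "Stud_id": 123})
```

With the container running, `send(add_req)` performs the request; building and sending are kept separate so the payloads can be inspected without a live server.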
This project demonstrates a scalable, sharded database system intended for teaching distributed systems concepts. It includes a server implementation that handles sharded data, a custom load balancer for request distribution, and an analysis of system performance under various configurations.
- OS: Ubuntu 20.04 LTS or above
- Docker: Version 20.10.23 or above
- Python: 3.6 or newer
- MySQL: Version 8.0
- Install Docker
  - Visit Docker's official installation guide and follow the instructions to install Docker on Ubuntu.
- Build Docker Containers
    make run           # without docker-compose
    make run_compose   # with docker-compose
This scalable sharded database system is designed around the principles of distributed databases, specifically focusing on sharding and load balancing.
- Database Sharding: Data is horizontally partitioned across multiple shards. Each shard holds a subset of the data, allowing for distributed queries and operations.
- Load Balancer: Implements consistent hashing to distribute read and write requests across shards efficiently, ensuring even load distribution and facilitating scalability.
- Server Containers: Each container simulates a database server managing a set of shards. Servers are orchestrated using Docker, enabling easy scaling and replication.
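To make the consistent hashing idea concrete, here is a generic hash ring with virtual nodes. It is a sketch of the general technique rather than the load balancer's actual code: the use of SHA-256, the number of virtual nodes, and the server names are assumptions for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name

    def add_server(self, name):
        # Each server is placed at many positions to even out the load.
        for i in range(self.virtual_nodes):
            point = _hash("{}#{}".format(name, i))
            if point in self._owner:
                continue  # position already taken (extremely unlikely)
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_server(self, name):
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: s for p, s in self._owner.items() if s != name}

    def lookup(self, key):
        """Return the server owning the first position clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing()
for server in ("Server0", "Server1", "Server2"):  # hypothetical names
    ring.add_server(server)
```

The property that matters for scaling is that removing a server only reassigns the keys that server owned; keys on the remaining servers stay where they were.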
Servers are designed to manage sharded data with the following endpoints:
- /config: Initializes shard configurations.
- /heartbeat: Provides server status.
- /copy, /read, /write, /update, /del: Handle data operations within shards.
The load balancer has been enhanced to support dynamic shard and server management, featuring endpoints for initializing configurations, adding/removing servers, and distributing read/write requests.
Performance analysis involved measuring read and write speeds under various configurations:
- Default Configuration: Demonstrated baseline performance.
- Increased Shard Replicas: Showed improved read speeds due to parallelism.
- Increased Servers and Shards: Highlighted scalability and its impact on performance.
- Endpoint Correctness: Validated through simulated server failures and automatic recovery.
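The failure-and-recovery check can be pictured as two steps: detect servers whose heartbeat fails, then hand their shards to replacements. The sketch below is an assumption about how such a loop might look, not the project's code. `is_alive` and `spawn` are hypothetical callbacks (in a real deployment the first would probe `/heartbeat` and the second would start a container), and the server and shard names are invented.

```python
import itertools


def find_failed(servers, is_alive):
    """Return the servers whose heartbeat check fails."""
    return [name for name in servers if not is_alive(name)]


def replace_failed(shard_map, failed, spawn):
    """Move each failed server's shards to a freshly spawned replacement.

    shard_map maps server name -> list of shard ids. spawn() is a
    hypothetical callback that starts a new server and returns its name.
    """
    updated = dict(shard_map)
    for name in failed:
        shards = updated.pop(name)
        updated[spawn()] = shards
    return updated


shard_map = {
    "Server0": ["sh1", "sh2"],
    "Server1": ["sh2", "sh3"],
    "Server2": ["sh1", "sh3"],
}
down = {"Server1"}          # simulate one crashed server
counter = itertools.count(3)

failed = find_failed(shard_map, lambda name: name not in down)
recovered = replace_failed(
    shard_map, failed, lambda: "Server{}".format(next(counter))
)
```

In this sketch the replacement only inherits the shard assignment; repopulating its data from a surviving replica would be a separate step.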

Test-1 results for 10000 reads and 10000 writes

Test-2 results for 10000 reads and 10000 writes (Write Speed-down = 2.14267270488, Read Speed-up = 1.08771929825)

Test-3 results for 10000 reads and 10000 writes (Write Speed-down = 2.5979025146, Read Speed-up = 1.0350877193)
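The speed-up and speed-down figures above are ratios between two runs. How they were computed is not stated here; one common convention, assumed in this sketch, compares elapsed times against a baseline run. The function names and the `time_calls` helper are illustrative, and the numbers in the usage note are placeholders, not measurements from these tests.

```python
import time


def speed_up(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("durations must be positive")
    return baseline_seconds / new_seconds


def speed_down(baseline_seconds, new_seconds):
    """How many times slower the new run is than the baseline."""
    if baseline_seconds <= 0 or new_seconds <= 0:
        raise ValueError("durations must be positive")
    return new_seconds / baseline_seconds


def time_calls(func, count):
    """Return the wall-clock seconds taken to call func() count times."""
    start = time.perf_counter()
    for _ in range(count):
        func()
    return time.perf_counter() - start
```

For example, a run that takes 5 seconds against a 10 second baseline gives `speed_up(10.0, 5.0) == 2.0`, and a run that takes 25 seconds against the same baseline gives `speed_down(10.0, 25.0) == 2.5`.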

Working of Server endpoints: /config, /heartbeat, /copy, /read, /write, /update, /del

Working of Load-balancer endpoints: /init, /status, /add, /rm, /read, /write, /update, /del

The project leverages Docker to run MySQL alongside the Flask application, facilitating an isolated and replicable environment. The Dockerfile sets up MySQL and installs necessary Python dependencies, while deploy.sh initializes the Flask application.
- Docker Documentation: https://docs.docker.com/
- MySQL Official Guide: https://dev.mysql.com/doc/
- Python Virtual Environments: https://docs.python.org/3/tutorial/venv.html
- Aakash Gupta
- Rajanyo Paul
- Avik Pramanick
- Soham Banerjee

