GitHub - serhiiur/Scrapy-Splash-With-Nginx-Load-Balancer: Scrapy based spider using multiple Splash instances served by Nginx

Introduction

This project provides an example of how to run a Scrapy-based spider against multiple Splash instances, using Nginx as a load balancer. The spider and its services are managed with Docker and Docker Compose.

Prerequisites

The spider scrapes data from the well-known Quotes to Scrape website, extracting some basic information about quotes.

How it works

The docker-compose.yml file defines three Splash instances. To scale the number of instances up or down, modify the docker-compose.yml and nginx.conf files accordingly, so that both files list the same instances.

Note: a more flexible way to scale the number of Splash instances is Docker Swarm's deploy feature, since it internally provides a load-balancing mechanism between the replicas. For simplicity, however, this implementation scales manually and uses Nginx as the load balancer for the specified Splash instances.
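
For the manual scaling described above, the Nginx upstream block needs one server entry per Splash instance. The following is a minimal sketch of generating such a block, not the repository's actual nginx.conf: the function name `render_upstream`, the upstream name `splash`, and the service names `splash1`, `splash2`, ... are hypothetical, and port 8050 is assumed because it is Splash's default HTTP port.

```python
def render_upstream(instance_count: int, port: int = 8050,
                    upstream_name: str = "splash") -> str:
    """Render an nginx upstream block with one server line per Splash instance."""
    if instance_count < 1:
        raise ValueError("at least one Splash instance is required")
    # Hypothetical naming scheme (splash1, splash2, ...); these names must
    # match the service names declared in your docker-compose.yml.
    servers = [
        f"    server splash{i}:{port};"
        for i in range(1, instance_count + 1)
    ]
    return "\n".join([f"upstream {upstream_name} {{", *servers, "}"])


if __name__ == "__main__":
    # Print an upstream block for three instances, as in this project.
    print(render_upstream(3))
```

The generated block would still have to be referenced from a `proxy_pass` directive, and the same number of Splash services would have to exist in docker-compose.yml.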

System Requirements

Docker
Docker Compose plugin
Python 3.9+

Usage

Once the environment is set up, you can run the spider and its services using Docker Compose:

  docker compose up --abort-on-container-exit

As a result, the spider will start scraping data from the target website, distributing requests across the available Splash instances using Nginx as a load balancer.

Note: the --abort-on-container-exit flag stops all services once a container exits, which here happens when the spider finishes its job.
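
To illustrate how requests end up spread across the Splash instances, the sketch below simulates round-robin assignment, which is the method Nginx uses for an upstream group when no other method is configured. It is an illustration only: the instance names are hypothetical, and whether this project's nginx.conf relies on the default method is an assumption.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(urls, instances):
    """Pair each URL with an instance, cycling through the instances in order."""
    if not instances:
        raise ValueError("at least one instance is required")
    # zip stops when the URLs run out, so cycle() never loops forever here.
    return list(zip(urls, cycle(instances)))


if __name__ == "__main__":
    # Hypothetical instance names; use the ones from your docker-compose.yml.
    instances = ["splash1", "splash2", "splash3"]
    urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 7)]
    assignments = assign_round_robin(urls, instances)
    for url, instance in assignments:
        print(f"{instance} <- {url}")
    # Count how many requests each instance received.
    print(Counter(instance for _, instance in assignments))
```

With an even split like this, adding an instance lowers the share of requests each one renders, which is the reason for running several Splash instances behind a single endpoint.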

License

This project is licensed under the MIT License - see the LICENSE file for details.

Folders and files

assets/images
.gitignore
Dockerfile
LICENSE
README.md
docker-compose.yml
nginx.conf
requirements.txt
spider.py
