# github-crawler

A friendly GitHub crawler built with Scrapy.

## Setup

1. Install the requirements:

   ```shell
   pip install -r requirement.txt
   ```

2. Update the source URL as needed in `github/github/spiders/github-user.py`:

   ```python
   def start_requests(self):
       urls = [
           "your search url here"
       ]
   ```
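If you want to target a specific search, a small helper can build the URL to paste into `start_requests()`. This is an illustrative sketch, not part of the repo: the `build_search_url` helper and its query parameters are assumptions based on GitHub's public search URL format.

```python
# Illustrative helper (not part of this repo): build a GitHub
# user-search URL to drop into start_requests().
from urllib.parse import urlencode

def build_search_url(query, page=1):
    """Return a GitHub user-search URL for the given query string."""
    params = {"q": query, "type": "users", "p": page}
    return "https://github.com/search?" + urlencode(params)

print(build_search_url("location:berlin language:python"))
```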

### For CSV (default)

Set the following variables in `settings.py`:

```python
ITEM_PIPELINES = {
    'GithubCsvPipeline': 300,
}
```
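Once a crawl finishes, the CSV output can be inspected with the standard library. A minimal sketch, assuming the pipeline writes an ordinary comma-separated file with a header row; the `load_rows` helper and any particular output filename are assumptions, not part of the repo:

```python
# Illustrative: load the crawler's CSV output for a quick look.
# The path you pass in depends on where the pipeline writes its file.
import csv

def load_rows(path):
    """Return the CSV rows as a list of dicts keyed by the header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```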

### For Elasticsearch

Set the following variables in `settings.py`:

```python
ELASTICSEARCH_HOST = ''
ELASTICSEARCH_PORT = 9200
ITEM_PIPELINES = {
    'GithubElasticsearchPipeline': 300,
}
```

Note: this option requires the index to already exist on the Elasticsearch server.
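Since the pipeline expects the index to already exist, you can create it beforehand with a plain HTTP PUT to the server. A minimal standard-library sketch; the `index_url`/`create_index` helpers, the index name, and the empty mapping are assumptions, not part of the repo:

```python
# Illustrative: create the target index with a bare HTTP PUT.
# Host and port should match ELASTICSEARCH_HOST / ELASTICSEARCH_PORT.
import json
from urllib import request

def index_url(host, port, index):
    """Build the index-creation endpoint URL."""
    return f"http://{host}:{port}/{index}"

def create_index(host, port, index, mapping=None):
    """PUT the index to the server and return its JSON response."""
    req = request.Request(
        index_url(host, port, index),
        data=json.dumps(mapping or {}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Call `create_index("localhost", 9200, "your-index-name")` once before starting the crawl.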

### For Google Sheets

1. Set the following variables in `settings.py`:

   ```python
   GOOGLE_SHEET = ""
   ITEM_PIPELINES = {
       'github.pipeline.GithubExcelPipeline': 300,
   }
   ```

2. Store the Google API credentials in `utility/gsheets_credentials.json`.

Note: this option requires an existing Google Sheet with link sharing set to "Editable by anyone who has the link".
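Before crawling, it can help to confirm the credentials file is in place and parses as JSON. A small sketch; the `check_credentials` helper is illustrative and not part of the repo:

```python
# Illustrative: sanity-check the Google API credentials file that the
# pipeline expects at utility/gsheets_credentials.json.
import json
import os

def check_credentials(path="utility/gsheets_credentials.json"):
    """Return True if the file exists and contains valid JSON."""
    if not os.path.exists(path):
        return False
    try:
        with open(path) as fh:
            json.load(fh)
        return True
    except ValueError:
        return False
```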

## Run instructions

```shell
cd github
scrapy crawl github-user-search
```