Hypertext Corpus Initiative

Welcome to the Hypertext Corpus Initiative (HCI) project.

This project consists of the following components:

  • HCI core
  • HCI crawler

HCI core

TBD

HCI crawler

The HCI crawler is implemented as a Scrapy project. For more information, see: http://jiminy.medialab.sciences-po.fr/hci/index.php/Scrapy_implementation_proposal

The code is in the hcicrawler/ directory.

Requirements

The following packages are required:

  • Scrapy >= 0.14
  • pymongo >= 2.0
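The minimum versions listed above can be checked with a small script before running the crawler. The script below is a hedged sketch and is not part of this repository. The helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical. It assumes Python 3.8+ for `importlib.metadata`, which postdates the Scrapy 0.14 era. It compares only the numeric parts of each version string.

```python
"""Hypothetical helper: check installed packages against the README minimums.

A minimal sketch, not part of the HCI code base. Only the leading numeric
components of a version are compared, so "1.0rc1" is treated as (1, 0).
"""
import re
from importlib import metadata

# Minimum versions as listed in the README (distribution name -> version).
REQUIREMENTS = {"Scrapy": "0.14", "pymongo": "2.0"}


def version_tuple(version):
    """Turn "2.4.1" into (2, 4, 1), stopping at the first non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. "0rc1": keep the 0, drop "rc1" and anything after
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if `installed` >= `required`, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(requirements=REQUIREMENTS):
    """Map each distribution name to (installed version or None, ok flag)."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else ("MISSING" if installed is None else "TOO OLD")
        print(f"{name}: {installed or '-'} ({status})")
```

The script compares numeric tuples instead of raw strings on purpose. As strings, "0.9" sorts after "0.14", even though 0.14 is the newer release.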