https://github.com/google/corpuscrawler