Watch the download in real time:

```bash
tail -f download_log.txt
```

Press Ctrl+C to stop watching.
Check progress:

```bash
# Downloaded PDFs
find village_maps -name "*.pdf" | wc -l

# PDF links collected
cat all_pdf_links.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(len(h) for dist in d.values() for tal in dist.values() for h in tal.values()))"

# Progress from file
cat download_progress.json | python3 -m json.tool | grep -E "(downloaded|failed)" | head -2
```

Check whether the downloader is still running:

```bash
ps aux | grep download_all_pdfs.py
```

To stop it:

```bash
pkill -f download_all_pdfs.py
```

If it stopped, just run it again; it will skip already downloaded PDFs:
```bash
python3 download_all_pdfs.py
```

Files created:

- `all_pdf_links.json` - All PDF URLs organized by District > Taluk > Hobli > Village
- `village_maps/` - Folder with all downloaded PDFs, organized by location
- `download_progress.json` - Tracks which PDFs are downloaded/failed
- `download_log.txt` - Full log of the download process
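The link-counting one-liner above is dense; here is an equivalent, readable version, a sketch that assumes `all_pdf_links.json` nests District > Taluk > Hobli > list of village links as described:

```python
# Expanded version of the counting one-liner. Assumes the file nests
# District > Taluk > Hobli, with each hobli mapping to a list of PDF links.
def count_links(data: dict) -> int:
    total = 0
    for district in data.values():        # district -> taluks
        for taluk in district.values():   # taluk -> hoblis
            for links in taluk.values():  # hobli -> list of PDF links
                total += len(links)
    return total

# Tiny example with the same shape as the real file:
sample = {"DistrictA": {"Taluk1": {"Hobli1": ["a.pdf", "b.pdf"],
                                   "Hobli2": ["c.pdf"]}}}
print(count_links(sample))  # 3
```

Point it at the real file with `json.load(open("all_pdf_links.json"))` to cross-check the one-liner.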
- Total villages: 18,323
- Estimated time: ~15-25 hours
- Progress saved: Every 10 PDFs
- Can resume: Yes, just restart the script
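The resume and save-every-10 behavior can be sketched like this (hypothetical helper names; the actual logic in `download_all_pdfs.py` may differ):

```python
import json
import os

PROGRESS_FILE = "download_progress.json"

def load_progress():
    # Resume: reuse the progress file a previous run left behind, if any.
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)
    return {"downloaded": [], "failed": []}

def save_progress(progress):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f, indent=2)

def run(urls, fetch):
    progress = load_progress()
    done = set(progress["downloaded"])
    for i, url in enumerate(urls, start=1):
        if url in done:  # skip PDFs already downloaded on a previous run
            continue
        try:
            fetch(url)
            progress["downloaded"].append(url)
        except Exception:
            progress["failed"].append(url)
        if i % 10 == 0:  # persist every 10 PDFs so a crash loses little work
            save_progress(progress)
    save_progress(progress)
    return progress
```

Because progress is persisted and consulted on startup, restarting after a crash or a `pkill` re-downloads nothing that already succeeded.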
The script prints a live status line like:

```
[████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25.0% | ✅ 100 | ❌ 5 | ⏱️ 12.5/min | ⏳ ETA: 2:30:00 | 🕐 Elapsed: 0:48:00
```
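A status line in that format can be rendered from raw counters with a small formatter; this is a sketch, not the script's actual code, and the rate/ETA fields are simply derived from elapsed time:

```python
from datetime import timedelta

def status_line(done, failed, total, elapsed_s, width=50):
    # Render a progress line like: [███░░░] 25.0% | ✅ 100 | ❌ 5 | ...
    processed = done + failed
    filled = int(width * processed / total)
    bar = "█" * filled + "░" * (width - filled)
    rate = processed / (elapsed_s / 60) if elapsed_s else 0.0  # PDFs per minute
    eta = timedelta(seconds=int((total - processed) / rate * 60)) if rate else "?"
    return (f"[{bar}] {processed / total:.1%} | ✅ {done} | ❌ {failed} | "
            f"⏱️ {rate:.1f}/min | ⏳ ETA: {eta} | "
            f"🕐 Elapsed: {timedelta(seconds=elapsed_s)}")
```

Printing it with a carriage return (`print(status_line(...), end="\r")`) keeps the bar on one terminal line.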