- Command-line image scraper based on an Excel spreadsheet template (aka image log)
- Pulls URLs and figure numbers from Excel spreadsheets, creates "/images" folder in location of image log, downloads image file
- Automatically renames image files to figure numbers
- Keeps original image extension (i.e., no mass .jpg renaming)
- Currently only works with Wikimedia images
- Known bugs: issues with https access, needs handling for Wiki images without Original File link
-
Install the Log2Layout gem.
gem install log2layout
-
Run the gem with the name of your Excel spreadsheet file as the argument.
log2layout c:/Users/You/Desktop/image_log.xlsx
-
Magic happens (sometimes the magic can take a little while--these are high res images, after all), and suddenly an "images" folder appears beside your spreadsheet. Ta-da!
- GUI interface (wxRuby)
- General page scanning: will scan page for a unique
tag in the page's HTML and download the available image resources
- Duplicate handling--if it pulls 2+ images, it will: 1) prefix all images with a DUP_ label, 2) highlight the Excel row in red, 3) create and automatically open a text file log (saved to the /images folder) that provides the row number of the Excel spreadsheet that got multiple images, and a list of the image file names to be compared
- Inserts a thumbnail of the image into the spreadsheet (creates column)
- Generates caption text boxes in InDesign with Figure Title paragraph style applied, saved to a /resources folder