More information about the Alector corpus is available at https://alectorsite.wordpress.com/corpus/.
I made this repository public because the corpus is otherwise hard to access online.
Developed on Ubuntu. You will need the following installed:
- Firefox browser
- geckodriver
- the selenium Python package
You will also need to be registered on Alector's website.
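After installing these (the geckodriver steps follow), you can check that everything is in place. This is a minimal sketch, not part of scrape_alector.py: the helper name `missing_prerequisites` is hypothetical, and it assumes the Firefox executable is called `firefox`, as on a standard Ubuntu install.

```python
import importlib.util
import shutil
from typing import List, Optional


def missing_prerequisites(search_path: Optional[str] = None) -> List[str]:
    """Return the prerequisites that could not be found.

    search_path defaults to the PATH environment variable (shutil.which's default).
    """
    missing = []
    # Firefox and geckodriver are looked up as executables on PATH.
    # "firefox" is an assumed binary name (the usual one on Ubuntu).
    for executable in ("firefox", "geckodriver"):
        if shutil.which(executable, path=search_path) is None:
            missing.append(executable)
    # The selenium package only needs to be importable by this interpreter.
    if importlib.util.find_spec("selenium") is None:
        missing.append("selenium")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```

An empty list means Firefox, geckodriver and selenium were all found by the current Python interpreter.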
- Download and extract the latest geckodriver release (https://github.com/mozilla/geckodriver/releases). Example:
wget https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-linux64.tar.gz
tar -xvzf geckodriver-v0.29.1-linux64.tar.gz
- Make the file executable:
chmod +x geckodriver
- Create a folder to hold the geckodriver executable (creating it under /lib requires root privileges, e.g. via sudo). Example:
mkdir /lib/geckodriver/
- Move the file to this newly created folder. Example:
mv geckodriver /lib/geckodriver/geckodriver
- Add the folder to PATH (this only lasts for the current shell session). Example:
PATH=$PATH:/lib/geckodriver/
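For reference, the steps above (apart from the download itself) can be sketched in Python. This is a minimal sketch under stated assumptions: `release_url` and `install_geckodriver` are hypothetical helpers, not part of this repository, and the URL pattern is assumed to hold for every version because it matches the v0.29.1 example. As with the `PATH=` line, the PATH change only applies to the current process.

```python
import os
import stat
import tarfile
from pathlib import Path


def release_url(version: str) -> str:
    """Linux 64-bit release URL, assuming every version follows the v0.29.1 naming."""
    archive = f"geckodriver-v{version}-linux64.tar.gz"
    return f"https://github.com/mozilla/geckodriver/releases/download/v{version}/{archive}"


def install_geckodriver(archive: Path, target_dir: Path) -> Path:
    """Extract geckodriver from an already downloaded archive into target_dir,
    make it executable, and append target_dir to PATH for this process only."""
    # mkdir: create the destination folder (and its parents) if needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    binary = target_dir / "geckodriver"

    # tar -xvzf + mv: copy the "geckodriver" member straight to its destination.
    with tarfile.open(archive, "r:gz") as tar:
        source = tar.extractfile("geckodriver")  # raises KeyError if the member is absent
        if source is None:
            raise ValueError("'geckodriver' in the archive is not a regular file")
        binary.write_bytes(source.read())

    # chmod +x: add the execute bits to the existing permissions.
    mode = binary.stat().st_mode
    binary.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    # PATH=$PATH:...: affects only this process and the processes it starts.
    current = os.environ.get("PATH", "")
    if str(target_dir) not in current.split(os.pathsep):
        os.environ["PATH"] = f"{current}{os.pathsep}{target_dir}" if current else str(target_dir)
    return binary
```

For example, after fetching `release_url("0.29.1")` with `urllib.request.urlretrieve`, running `install_geckodriver(Path("geckodriver-v0.29.1-linux64.tar.gz"), Path("/lib/geckodriver"))` as root mirrors the shell commands above.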
Then run python scrape_alector.py, enter your credentials when prompted, and voilà!
Núria Gala, Anaïs Tack, Ludivine Javourey-Drevet, Thomas François, Johannes C. Ziegler. Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers. Proceedings of the 12th Language Resources and Evaluation Conference.