Hi @bowbowbow, thanks a lot for putting this together. I was wondering whether it would be easy to extend main.py to support Arabic.
In my initial trials, I tried the following:
- Created a data_list_arabic.csv file containing the train/dev/test splits. The first few lines of the file look like this:
type,path
train,nw/adj/ALH20001201.1900.0126
train,nw/adj/ALH20001201.1300.0071
dev,nw/adj/ALH20001128.1300.0081
test,nw/adj/ALH20001125.0700.0024
test,nw/adj/ALH20001124.1900.0127
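For context, here is a minimal sketch of how I read such a split file back into per-split path lists (the column names match the example above; the helper name is my own):

```python
import csv
import io

def load_splits(csv_text):
    """Group document paths from a data_list CSV by split type (train/dev/test)."""
    splits = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        splits.setdefault(row['type'], []).append(row['path'])
    return splits

# Example using the first few lines shown above.
sample = """type,path
train,nw/adj/ALH20001201.1900.0126
train,nw/adj/ALH20001201.1300.0071
dev,nw/adj/ALH20001128.1300.0081
test,nw/adj/ALH20001125.0700.0024
test,nw/adj/ALH20001124.1900.0127
"""
splits = load_splits(sample)
print(sorted(splits))        # → ['dev', 'test', 'train']
print(len(splits['train']))  # → 2
```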
- Built the Arabic properties based on https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-arabic.properties:
arabic_properties = {
    'annotators': 'tokenize,ssplit,pos,lemma,parse',
    'tokenize.language': 'ar',
    'segment.model': 'edu/stanford/nlp/models/segmenter/arabic/arabic-segmenter-atb+bn+arztrain.ser.gz',
    'ssplit.boundaryTokenRegex': '[.]|[!?]+|[!\u061F]+',
    'pos.model': 'edu/stanford/nlp/models/pos-tagger/arabic/arabic.tagger',
    'parse.model': 'edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz',
}
- Created the nlp_res_raw object as:
nlp_res_raw = nlp.annotate(item['sentence'], properties=arabic_properties)
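To make it easier to reproduce, here is the full shape of what I'm running as a self-contained sketch (the build_arabic_properties helper is just my own wrapper around the dict above; the commented-out annotate call assumes the client from main.py and a running CoreNLP server with the Arabic models on its classpath):

```python
import json

def build_arabic_properties():
    """Arabic pipeline properties, mirroring StanfordCoreNLP-arabic.properties."""
    return {
        'annotators': 'tokenize,ssplit,pos,lemma,parse',
        'tokenize.language': 'ar',
        'segment.model': 'edu/stanford/nlp/models/segmenter/arabic/arabic-segmenter-atb+bn+arztrain.ser.gz',
        'ssplit.boundaryTokenRegex': '[.]|[!?]+|[!\u061F]+',
        'pos.model': 'edu/stanford/nlp/models/pos-tagger/arabic/arabic.tagger',
        'parse.model': 'edu/stanford/nlp/models/lexparser/arabicFactored.ser.gz',
    }

arabic_properties = build_arabic_properties()

# With the server up, the annotate call from main.py would become:
# nlp_res_raw = nlp.annotate(item['sentence'], properties=arabic_properties)
print(json.dumps(arabic_properties, indent=2))
```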
- Downloaded the Arabic models:
cd stanford-corenlp-full-2018-10-05
wget http://nlp.stanford.edu/software/stanford-arabic-corenlp-2018-02-27-models.jar
Now when I run the script, I keep getting the following error:
Failed to load segmenter edu/stanford/nlp/models/segmenter/arabic/arabic-segmenter-atb+bn+arztrain.ser.gz
I must be making a mistake somewhere, either by not downloading the correct package or by not pointing an environment variable to the correct location. Any help with adding support for Arabic would be greatly appreciated.