Description
Hello,
I was trying to train the syntactic HMM on my data. My training data contains 10050
parallel sentences with parsed target-side trees.
Output of wc on my training data (lines, words, bytes, file):
-------------------------------
10050 284765 1599230 corpus.en
10050 804959 4284275 corpus.entrees
10050 228873 5058993 corpus.ta
30150 1318597 10942498 total
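All three files report the same number of lines, so any sentence pairs the aligner skips are presumably being dropped for something in their content rather than a line-count mismatch. The minimal sketch below scans the three files in parallel and flags pairs that look suspicious. The checks are guesses, not confirmed aligner behavior: empty lines, stray carriage returns (Java's BufferedReader.readLine treats a lone '\r' as a line break), unbalanced brackets in the tree, tree leaves that differ from the corpus.en tokens, and, optionally, pairs longer than a length threshold. The script name, the prefix argument, and the threshold are hypothetical; the threshold is not a documented aligner setting.

```python
"""Sanity-check a parallel corpus with English-side parse trees.

Hypothetical helper: file names follow the corpus.en / corpus.ta /
corpus.entrees layout above; the length threshold is an assumption,
not a documented Berkeley Aligner option.
"""
import sys
from itertools import zip_longest


def tree_leaves(tree):
    """Leaf tokens of a bracketed (Penn Treebank style) tree, or None if unbalanced."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    depth, leaves, prev = 0, [], None
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return None  # closing bracket with no matching opener
        elif prev != "(":
            leaves.append(tok)  # a word; tokens right after "(" are labels
        prev = tok
    return leaves if depth == 0 else None


def check_line(en, fr, tree, max_len=None):
    """Return a list of problems found in one (en, ta, tree) triple."""
    problems = []
    for name, text in (("en", en), ("ta", fr), ("entrees", tree)):
        if not text.strip():
            problems.append(f"empty {name} line")
        if "\r" in text:
            problems.append(f"stray carriage return in {name} line")
    leaves = tree_leaves(tree)
    if leaves is None:
        problems.append("unbalanced brackets in tree")
    elif [w.lower() for w in leaves] != [w.lower() for w in en.split()]:
        # compared case-insensitively because the corpus may be lowercased
        problems.append("tree leaves differ from English tokens")
    if max_len is not None and max(len(en.split()), len(fr.split())) > max_len:
        problems.append(f"longer than {max_len} tokens")
    return problems


def find_suspicious(en_lines, fr_lines, tree_lines, max_len=None):
    """Yield (1-based line number, problems) for every questionable triple."""
    triples = zip_longest(en_lines, fr_lines, tree_lines)
    for num, (en, fr, tree) in enumerate(triples, start=1):
        if None in (en, fr, tree):
            yield num, ["line missing from at least one file"]
            continue
        problems = check_line(en, fr, tree, max_len)
        if problems:
            yield num, problems


def read_lines(path):
    # newline="\n": split only on LF, so a lone CR stays visible in the text
    with open(path, encoding="utf-8", newline="\n") as fh:
        return [line.rstrip("\n").removesuffix("\r") for line in fh]


if __name__ == "__main__":
    prefix = sys.argv[1] if len(sys.argv) > 1 else "corpus"
    max_len = int(sys.argv[2]) if len(sys.argv) > 2 else None
    files = [read_lines(prefix + ext) for ext in (".en", ".ta", ".entrees")]
    bad = list(find_suspicious(*files, max_len=max_len))
    for num, problems in bad:
        print(num, "; ".join(problems))
    print(f"{len(bad)} suspicious sentence pairs")
```

Run it as, for example, python check_corpus.py corpus 100 (requires Python 3.9+ for str.removesuffix). If the number of flagged pairs matches the number the aligner drops, the flagged line numbers are a good place to start looking; if not, the cause is likely something these checks do not cover.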
When I run the alignment, the log file indicates that only 9811 sentences are read
instead of 10050. Here is what I see in the log file. After training finishes, I
also get alignments for only 9811 sentences.
PS: I don't have any test data; my test data directories are empty. I have also
attached my config file.
main() {
Execution directory: en-ta/alignment_models/berkeley/lc_tok_10000_S
Preparing Training Data
Unknown number of training, 0 test
Training models: 2 stages {
Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
Initializing forward model [7.9s, cum. 7.9s]
Initializing reverse model [5.2s, cum. 13s]
Joint Train: 9811 sentences, jointly {
Iteration 1/5 {
Sentence 1/9811
Sentence 2/9811
Sentence 3/9811
Sentence 169/9811
Sentence 3304/9811
Sentence 7650/9811
Log-likelihood 1 = -1337616.882
Log-likelihood 2 = -1336443.902
... 9805 lines omitted ...
} [20s, cum. 20s]
Please let me know if I am missing something.
Original issue reported on code.google.com by loganath...@gmail.com on 2 Aug 2013 at 10:10