Different Alignment Result on the same input and setting #2

@GoogleCodeExporter

What steps will reproduce the problem?

1. I ran the aligner twice on the same data but got slightly
different results. I am not sure whether this is a bug in the program.
Here is my command:

java -server -Xmx1000m -ea -jar
/home/paisarn/mt-util/berkeleyaligner-1.1/berkeleyaligner.jar -execDir
/home/paisarn/aligner/out-enfr20-2-berkeley1.1 -englishSuffix en
-foreignSuffix fr -exec.create true -Main.saveParams true
-Main.alignTraining true -Main.testSources -Main.iters 5 5
-EMWordAligner.numThreads 4 -Main.trainSources
/home/paisarn/aligner/enfr20k/

When I ran this command the second time, I changed only the -execDir
parameter. I then compared training.en-fr.align from the two output
folders, and they are slightly different. Could you please explain
whether this is normal or whether something is wrong?

2. The output in training.en-fr.align generated by Aligner version
1.1 seems to have the source and target word indices swapped.

For example, in training.en-fr.align:

Generated by Aligner 1.0: 6-8 3-2 4-3 2-1 7-10 5-5
Generated by Aligner 1.1: 7-5 1-2 2-3 0-1 9-6 4-4

Expected from Aligner 1.1:  5-7 2-1 3-2 1-0 6-9 4-4

As I understand it, the output of Aligner 1.1 should be compatible
with GIZA++, so each word index should simply be one less than in
Aligner 1.0, with the order within each pair unchanged.
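
To make the expected conversion concrete, here is a minimal Python sketch of the check I have in mind. The helper names (parse_links, to_zero_based, swap_sides) are my own and are not part of the aligner. The sketch assumes each link is written as two integers joined by a hyphen, and it uses only the example line quoted above.

```python
def parse_links(line):
    """Parse 'i-j i-j ...' into a list of (i, j) integer pairs."""
    return [tuple(int(x) for x in tok.split("-")) for tok in line.split()]


def format_links(links):
    """Render (i, j) pairs back into the 'i-j i-j ...' text form."""
    return " ".join(f"{i}-{j}" for i, j in links)


def to_zero_based(links):
    """Shift 1-based indices (as in the Aligner 1.0 output) down by one."""
    return [(i - 1, j - 1) for i, j in links]


def swap_sides(links):
    """Exchange the two indices of every link."""
    return [(j, i) for i, j in links]


# Example lines copied from the report above.
v10 = parse_links("6-8 3-2 4-3 2-1 7-10 5-5")
v11 = parse_links("7-5 1-2 2-3 0-1 9-6 4-4")

expected = to_zero_based(v10)
print(format_links(expected))        # 5-7 2-1 3-2 1-0 6-9 4-4
print(swap_sides(v11) == expected)   # True
```

For this one line, the Aligner 1.1 output matches the expected output only after the two sides of every link are swapped, which is what makes me suspect the index order changed between versions.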

What version of the product are you using? On what operating system?
I am currently using Aligner 1.1 on Fedora Core 9 Linux (64-bit).


Original issue reported on code.google.com by paisar...@gmail.com on 1 Jun 2009 at 1:41
