
Why is the M2 score in models/README.md different from that in your paper? #2

Open
h-asano opened this issue Dec 21, 2018 · 2 comments

Comments

@h-asano

h-asano commented Dec 21, 2018

The M2 score on CoNLL-2014 reported in models/README.md is 57.53.
In your paper, the M2 score is 55.8.

@snukky
Contributor

snukky commented Dec 21, 2018

The published system is not exactly the same as the one we trained for the paper because we have lost the original models and config files. I reconstructed the system with a newer version of Marian, and there are several reasons why the M2 scores differ:

  • This is a different training run.
  • The implementation of transformer models and the default training parameters in Marian may have changed slightly over the last year.
  • I replaced averaging the four best model checkpoints with Marian's built-in exponential smoothing of parameters, which is similar but probably slightly more effective, and it was also a nice simplification (see the sketch after this list).
  • I used my recent experience with training transformer models to choose training parameters that we didn't mention in the paper.

So these are changes that someone could make while reconstructing our systems from scratch using the same data. The training data, subword segmentation codes, and vocabularies are exactly the same.
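To make the checkpoint-averaging vs. exponential-smoothing point above more concrete, here is a minimal sketch in plain NumPy. It is not Marian's implementation; the parameter shapes and decay value are illustrative assumptions only, not the settings used for the published model.

```python
# Minimal sketch contrasting the two strategies mentioned above.
# NOT Marian's implementation; shapes and decay are illustrative assumptions.
import numpy as np

def average_checkpoints(checkpoints):
    """Average each parameter over the N saved checkpoints (dicts of arrays)."""
    keys = checkpoints[0].keys()
    return {k: np.mean([ckpt[k] for ckpt in checkpoints], axis=0) for k in keys}

def ema_update(smoothed, current, decay=0.9999):
    """One step of exponential smoothing, applied after every parameter update."""
    return {k: decay * smoothed[k] + (1.0 - decay) * current[k] for k in smoothed}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pretend these are four "best" checkpoints of a tiny model.
    checkpoints = [{"W": rng.normal(size=(2, 2))} for _ in range(4)]
    avg = average_checkpoints(checkpoints)

    # Exponential smoothing is maintained online during training instead.
    smoothed = {"W": checkpoints[0]["W"].copy()}
    for ckpt in checkpoints[1:]:
        smoothed = ema_update(smoothed, ckpt, decay=0.9)
    print("averaged:\n", avg["W"])
    print("smoothed:\n", smoothed["W"])
```

The practical difference is that checkpoint averaging is a post-hoc step over a few saved models, while exponential smoothing keeps one running average of the parameters throughout training, so no separate averaging step is needed before decoding.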

@h-asano
Author

h-asano commented Dec 21, 2018

Thank you very much!

snukky pinned this issue Mar 28, 2019