Clarification about alignment for transformers #368
Unanswered · kirianguiller asked this question in Q&A · 0 replies
Hi everyone, thanks for all of your amazing work on Marian :).
I'm opening this discussion (and hope it's in the right place) because I'm a little confused about the alignment functionality of the Marian transformer.
So far, my understanding is that if you want a transformer model that produces alignments, you need to precompute alignments on your training corpus and feed them to training (with the --guided-alignment parameter).
However, because the transformer is fed sub-tokens (produced by sentencepiece or another subword tokenizer), I am assuming that the precomputed alignments NEED to be based on those sub-tokens.
Therefore, the pipeline for training would also NEED to be the following (sketched below):
1. Train a sentencepiece model and sub-tokenize the source and target sides of the corpus.
2. Run a word aligner (e.g. fast_align) on the sub-tokenized corpus to obtain sub-token-level alignments.
3. Train Marian on the sub-tokenized corpus, passing the alignment file via --guided-alignment.
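For concreteness, here is a minimal sketch of steps 1–2, assuming already-trained sentencepiece models; the file and model names are hypothetical, not anything Marian prescribes:

```python
# Sub-tokenize both sides and build fast_align's "source ||| target" input.
# Model and file names are assumptions for illustration only.
import sentencepiece as spm

sp_src = spm.SentencePieceProcessor(model_file="src.spm.model")
sp_tgt = spm.SentencePieceProcessor(model_file="tgt.spm.model")

with open("train.src") as fs, open("train.tgt") as ft, \
     open("train.sp.src", "w") as out_src, open("train.sp.tgt", "w") as out_tgt, \
     open("corpus.sp", "w") as out_both:
    for src, tgt in zip(fs, ft):
        src_sp = " ".join(sp_src.encode(src.strip(), out_type=str))
        tgt_sp = " ".join(sp_tgt.encode(tgt.strip(), out_type=str))
        out_src.write(src_sp + "\n")
        out_tgt.write(tgt_sp + "\n")
        out_both.write(f"{src_sp} ||| {tgt_sp}\n")  # fast_align input format
```

From there, something like `fast_align -d -o -v -i corpus.sp > train.sp.align` (optionally symmetrized with atools) produces the alignment file, and training would use train.sp.src / train.sp.tgt as --train-sets with --guided-alignment train.sp.align.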
Am I correct? Or is there a less cumbersome pipeline? Or a pipeline that could use only word-level alignments?
For a word-level alignment, I guess it is simply impossible to feed directly, for the following reason: the indices in the --guided-alignment file have to refer to the tokens the model actually sees, i.e. the sub-tokens, so word-level indices would not line up with the model's input positions.
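That said, one way to keep a word-level aligner while still feeding Marian sub-token alignments would be to project each word-level link onto the sub-token indices it covers. A minimal sketch, assuming sentencepiece; the helper names here are hypothetical, not a Marian or sentencepiece API:

```python
# Project word-level alignment links onto sub-token indices. Assumes each
# word sub-tokenizes the same in isolation as in context, which usually
# holds for sentencepiece since pieces do not cross word boundaries.
import sentencepiece as spm

def word_to_subtoken_spans(sp, words):
    """For each word, return the (start, end) range of its sub-token indices."""
    spans, pos = [], 0
    for w in words:
        n = len(sp.encode(w, out_type=str))
        spans.append((pos, pos + n))
        pos += n
    return spans

def project_alignment(sp_src, sp_tgt, src_words, tgt_words, word_pairs):
    """Expand each word-level link (i, j) into all sub-token pairs it covers."""
    src_spans = word_to_subtoken_spans(sp_src, src_words)
    tgt_spans = word_to_subtoken_spans(sp_tgt, tgt_words)
    links = []
    for i, j in word_pairs:
        for s in range(*src_spans[i]):
            for t in range(*tgt_spans[j]):
                links.append(f"{s}-{t}")
    return " ".join(links)  # e.g. "0-0 1-1 1-2", one line per sentence pair
```

The resulting "i-j" pairs per line are the same Pharaoh-style format that fast_align emits and that --guided-alignment reads, so the projected file can be written out line by line in place of a sub-token-level aligner run.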
Thanks in advance for your help!