- The decoder should be abstracted out so that our copy head can be added on top of any encoder-decoder pair.
- This will make CopyNet agnostic of the RNN-based architecture and allow it to be used with transformers too.
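
One way the abstraction above could look is sketched below in plain Python, with no framework dependency. Everything named here is hypothetical and not taken from the existing code: the `DecoderStep` interface, the `CopyHead` class, and the two toy decoders are stand-ins for illustration. The copy scores use a plain dot product rather than the learned projection in the CopyNet paper, which is a simplification. The point is only that the copy logic needs one hidden vector per step and does not care how the decoder produced it.

```python
import math
from typing import List, Protocol, Sequence

Vector = List[float]


class DecoderStep(Protocol):
    """Hypothetical interface: any decoder (RNN, transformer, ...) that
    returns one hidden vector for the current decoding step."""

    def __call__(self, prev_token: int, encoder_states: Sequence[Vector]) -> Vector: ...


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a flat list of scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class CopyHead:
    """Copy head that sits on top of any callable matching DecoderStep."""

    def __init__(self, decoder_step: DecoderStep, output_embeddings: Sequence[Vector]):
        self.decoder_step = decoder_step
        self.output_embeddings = output_embeddings  # shape: vocab_size x hidden

    def step(self, prev_token: int, encoder_states: Sequence[Vector],
             source_ids: Sequence[int]) -> List[float]:
        hidden = self.decoder_step(prev_token, encoder_states)
        # Generate-mode scores: one per vocabulary entry.
        gen_scores = [dot(hidden, emb) for emb in self.output_embeddings]
        # Copy-mode scores: one per source position (simplified dot product).
        copy_scores = [dot(hidden, state) for state in encoder_states]
        # Joint softmax over both modes, so they compete for probability mass.
        probs = softmax(gen_scores + copy_scores)
        vocab_size = len(gen_scores)
        # Source ids >= vocab_size denote out-of-vocabulary source tokens.
        extended_size = max([vocab_size] + [i + 1 for i in source_ids])
        dist = [0.0] * extended_size
        for token_id in range(vocab_size):
            dist[token_id] = probs[token_id]
        # Add copy mass onto the token found at each source position.
        for position, token_id in enumerate(source_ids):
            dist[token_id] += probs[vocab_size + position]
        return dist


def mean_pool_decoder(prev_token: int, encoder_states: Sequence[Vector]) -> Vector:
    # Toy stand-in for one decoder family: averages the encoder states.
    count = len(encoder_states)
    return [sum(column) / count for column in zip(*encoder_states)]


def last_state_decoder(prev_token: int, encoder_states: Sequence[Vector]) -> Vector:
    # Toy stand-in for another decoder family: reuses the last encoder state.
    return list(encoder_states[-1])


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # vocabulary of 3 tokens
encoder_states = [[1.0, 2.0], [0.0, 1.0]]          # 2 source positions
source_ids = [1, 4]                                # id 4 is out of vocabulary

for decoder in (mean_pool_decoder, last_state_decoder):
    dist = CopyHead(decoder, embeddings).step(0, encoder_states, source_ids)
    print(decoder.__name__, [round(p, 3) for p in dist])
```

The same `CopyHead` instance logic runs unchanged with either decoder, which is the property the refactor is after. In a real implementation the callable would wrap an RNN cell or a transformer decoder layer and the vectors would be tensors.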