Description
I regularly follow the developments on this project, and I must say that I am very interested in and pleased with the direction curated-transformers is taking. The code is understandable and high-quality, and it's a pleasure to work with — congratulations!
This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. We currently have reference implementations for decoder-style and encoder-style models, but no encoder-decoder-style architecture, perhaps built around a reusable cross-attention block.
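To sketch what I mean by a reusable cross-attention block: the decoder supplies the queries while the encoder output supplies the keys and values, followed by the usual residual connection and layer norm. This is only an illustrative sketch using plain PyTorch (`nn.MultiheadAttention`), not curated-transformers' actual building blocks or API; the class and parameter names here are hypothetical.

```python
import torch
from torch import Tensor, nn


class CrossAttentionBlock(nn.Module):
    """Illustrative cross-attention block for an encoder-decoder Transformer.

    Decoder hidden states attend to encoder hidden states, as in T5.
    (Hypothetical sketch; not the curated-transformers API.)
    """

    def __init__(self, hidden_width: int, n_heads: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(
            hidden_width, n_heads, batch_first=True
        )
        self.layer_norm = nn.LayerNorm(hidden_width)

    def forward(self, decoder_hidden: Tensor, encoder_hidden: Tensor) -> Tensor:
        # Queries come from the decoder; keys and values from the encoder.
        attn_out, _ = self.attention(
            decoder_hidden, encoder_hidden, encoder_hidden
        )
        # Residual connection + layer norm, as in the standard Transformer.
        return self.layer_norm(decoder_hidden + attn_out)


block = CrossAttentionBlock(hidden_width=64, n_heads=4)
decoder_states = torch.rand(2, 5, 64)  # (batch, decoder_len, width)
encoder_states = torch.rand(2, 9, 64)  # (batch, encoder_len, width)
out = block(decoder_states, encoder_states)
print(out.shape)  # torch.Size([2, 5, 64])
```

Such a block could be shared between T5 and any future encoder-decoder architectures, just as the self-attention and feed-forward blocks are reused today.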