Description
I regularly follow the developments on this project, and I must say that I am very interested in and pleased with the direction curated-transformers is taking. The code is understandable and high-quality, and it's a pleasure to work with — congratulations!
This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. We currently have reference implementations for decoder-style and encoder-style models, but no encoder-decoder-style architecture, perhaps built around a reusable cross-attention block.
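To sketch what I mean by a reusable cross-attention block: the decoder supplies the queries while the encoder output supplies the keys and values, followed by the usual residual connection and layer norm. This is only an illustrative sketch using plain PyTorch (`nn.MultiheadAttention`), not curated-transformers' actual building blocks or API; the class and parameter names here are hypothetical.

```python
import torch
from torch import Tensor, nn


class CrossAttentionBlock(nn.Module):
    """Illustrative cross-attention block for an encoder-decoder Transformer.

    Decoder hidden states attend to encoder hidden states, as in T5.
    (Hypothetical sketch; not the curated-transformers API.)
    """

    def __init__(self, hidden_width: int, n_heads: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(
            hidden_width, n_heads, batch_first=True
        )
        self.layer_norm = nn.LayerNorm(hidden_width)

    def forward(self, decoder_hidden: Tensor, encoder_hidden: Tensor) -> Tensor:
        # Queries come from the decoder; keys and values from the encoder.
        attn_out, _ = self.attention(
            decoder_hidden, encoder_hidden, encoder_hidden
        )
        # Residual connection + layer norm, as in the standard Transformer.
        return self.layer_norm(decoder_hidden + attn_out)


block = CrossAttentionBlock(hidden_width=64, n_heads=4)
decoder_states = torch.rand(2, 5, 64)  # (batch, decoder_len, width)
encoder_states = torch.rand(2, 9, 64)  # (batch, encoder_len, width)
out = block(decoder_states, encoder_states)
print(out.shape)  # torch.Size([2, 5, 64])
```

Such a block could be shared between T5 and any future encoder-decoder architectures, just as the self-attention and feed-forward blocks are reused today.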