Test Implementation of Mixture of Experts Universal Transformer on conditional MNIST generation
The main goal of this short project was to familiarize myself with expert parallelism and Mixture-of-Experts (MoE) layers, but I also wanted to experiment with parameter sharing for fun.
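The combination described above can be sketched minimally: a top-1 routed MoE feed-forward layer whose parameters are reused across recurrent steps, Universal Transformer-style. This is a hypothetical illustration, not the repository's actual code; all class and parameter names (`MoEFeedForward`, `UniversalMoEBlock`, `num_experts`, `steps`) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Top-1 routed mixture-of-experts feed-forward layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        # One (d_model -> d_ff -> d_model) expert per routing slot.
        self.w_in = nn.Parameter(torch.randn(num_experts, d_model, d_ff) * 0.02)
        self.w_out = nn.Parameter(torch.randn(num_experts, d_ff, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)         # (tokens, num_experts)
        idx = probs.argmax(dim=-1)                        # top-1 expert per token
        gate = probs.gather(-1, idx.unsqueeze(-1))        # keeps gradients flowing to the router
        # Gather each token's expert weights and apply its two-layer MLP.
        h = torch.einsum("td,tdf->tf", x, self.w_in[idx]).relu()
        out = torch.einsum("tf,tfd->td", h, self.w_out[idx])
        return gate * out

class UniversalMoEBlock(nn.Module):
    """One shared MoE block applied recurrently (Universal Transformer parameter sharing)."""

    def __init__(self, d_model: int = 64, d_ff: int = 128,
                 num_experts: int = 4, steps: int = 3):
        super().__init__()
        self.moe = MoEFeedForward(d_model, d_ff, num_experts)
        self.norm = nn.LayerNorm(d_model)
        self.steps = steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same parameters are reused at every step, instead of
        # stacking `steps` independent layers.
        for _ in range(self.steps):
            x = x + self.moe(self.norm(x))
        return x

if __name__ == "__main__":
    block = UniversalMoEBlock()
    tokens = torch.randn(10, 64)
    print(block(tokens).shape)  # torch.Size([10, 64])
```

A real implementation would add attention sublayers, a load-balancing auxiliary loss, and expert-parallel dispatch across devices; this sketch only shows the routing and weight-sharing structure.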