The current code defaults to GPT-2 for the transformer, which is fairly bulky and slow, and most of our datasets need context lengths of more than 500 tokens, which adds to the slowness.
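If we did go smaller, a tiny GPT-2-style model is easy to spin up. Here's a minimal sketch assuming the HuggingFace `transformers` API; the specific sizes below are just guesses at what a toy task would need, not a proposal for exact hyperparameters:

```python
# Minimal sketch of a much smaller GPT-2-style model via HuggingFace transformers.
# The sizes here are illustrative placeholders, not tuned values.
from transformers import GPT2Config, GPT2LMHeadModel

tiny_config = GPT2Config(
    vocab_size=512,   # toy datasets only need a small symbol vocabulary
    n_positions=128,  # short contexts instead of >500 tokens
    n_embd=128,       # hidden size
    n_layer=4,        # transformer blocks
    n_head=4,         # attention heads
)
model = GPT2LMHeadModel(tiny_config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```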
I can imagine a few datasets that would demonstrate the model's ability to learn (and hopefully generalize) without requiring quite so many resources:
- A math dataset of digits and operations (first sketch below).
- A question-answering dataset about sentence structure, vowels/consonants, etc. (second sketch below).
- A path-following dataset with simple rules like "clockwise"/"counter-clockwise" (third sketch below).
...not sure what I'd do for continuous-input control tasks.
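Rough sketch of what the math one could look like; the operand range, operator set, and text format are all just placeholder choices:

```python
# Hypothetical generator for the math idea: single-digit operands with +, -, *.
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def make_math_example(rng: random.Random) -> tuple[str, str]:
    """Return a (question, answer) pair like ('3 * 7 =', '21')."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    op = rng.choice(list(OPS))
    return f"{a} {op} {b} =", str(OPS[op](a, b))

rng = random.Random(0)
examples = [make_math_example(rng) for _ in range(1000)]
```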
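The question-answering one could be as simple as asking about a single short word; the word list and question types here are placeholders:

```python
# Hypothetical generator for the question-answering idea: vowel counts and
# first/last letters of a short word.
import random

VOWELS = set("aeiou")
WORDS = ["cat", "apple", "robot", "banana", "tree"]  # placeholder vocabulary

def make_qa_example(rng: random.Random) -> tuple[str, str]:
    """Return a (question, answer) pair about a randomly chosen word."""
    word = rng.choice(WORDS)
    kind = rng.choice(["vowels", "first", "last"])
    if kind == "vowels":
        return f"How many vowels are in '{word}'?", str(sum(c in VOWELS for c in word))
    if kind == "first":
        return f"What is the first letter of '{word}'?", word[0]
    return f"What is the last letter of '{word}'?", word[-1]
```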
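For the path-following one, quarter turns on a compass would keep the rule set tiny; the headings and phrasing below are just one option:

```python
# Hypothetical generator for the path-following idea: apply a sequence of
# "clockwise"/"counter-clockwise" quarter turns and predict the final heading.
import random

HEADINGS = ["north", "east", "south", "west"]  # listed in clockwise order

def make_path_example(rng: random.Random, n_turns: int = 4) -> tuple[str, str]:
    """Return a (question, answer) pair; the answer is the final heading."""
    turns = [rng.choice(["clockwise", "counter-clockwise"]) for _ in range(n_turns)]
    heading = 0  # index into HEADINGS, start facing north
    for turn in turns:
        heading = (heading + (1 if turn == "clockwise" else -1)) % 4
    question = "start facing north, turn " + ", then ".join(turns)
    return question, HEADINGS[heading]
```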