- Retrieve tokenized results from a dataset instead of adding to the repo.
- Maybe create script to regenerate the reference baseline.
- Add more cases.
- Verify edge cases.