Earlier this year, I wrote a new implementation of the BPE algorithm in pure Python, which is faster than the version in tokenizers.
I hope this implementation can help tokenizers further improve its BPE training performance.
I have written a blog post in Chinese about this implementation, and I can translate it into English if there is interest. By the way, the code is quite short, only about 400 lines.