Chapter 3: What problem does the BPE (Byte Pair Encoding) algorithm solve? #158
Unanswered
RavenCaffeine
asked this question in
💬 Exercises & Q&A
Replies: 0 comments
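BPE addresses two problems at once: the vocabulary explosion and out-of-vocabulary (OOV) problem of word-level tokenization, and the semantic sparsity and low learning efficiency of character-level tokenization.
In essence, BPE greedily merges the most frequent adjacent symbol pairs (characters or bytes at first, then the subwords built from them) to construct a subword vocabulary in which common words are kept whole while rare words can still be split into smaller pieces. This achieves three balances: a vocabulary of controllable size (no explosion), efficient semantic representation (no sparsity), and strong generalization (new words and word variants can still be encoded).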
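The greedy merge loop described above can be sketched roughly as follows. This is a minimal illustration and not the chapter's implementation: it works on characters of whitespace-split words rather than raw bytes, uses no end-of-word marker, and the names `train_bpe`, `merge_pair`, `get_pair_counts`, and `encode` are hypothetical helpers chosen for this example.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules; num_merges bounds the vocabulary size."""
    # Assumption: words are split on whitespace and start as character tuples.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # greedy: most frequent pair first
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)


if __name__ == "__main__":
    rules = train_bpe("low low low lower lowest newest newest widest", 10)
    print(rules)
    print(encode("lowest", rules))  # a word seen in training
    print(encode("slowest", rules))  # an unseen word, still encodable
```

The number of merges directly caps how many new subwords enter the vocabulary, which is where the controllable size comes from. A word never seen in training is still encodable because, in the worst case, it falls back to individual characters, so nothing is out of vocabulary as long as its characters are known.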