here
void SentencePiece::set_vocabulary(const std::vector<std::string>& vocabulary,
                                   const Tokenizer::Options* options)
{
  if (options && (options->joiner_annotate || options->spacer_new))
    throw std::invalid_argument("SentencePiece vocabulary restriction requires the tokenization "
                                "to use \"spacer_annotate\" (same as spm_encode)");

  auto status = _processor->SetVocabulary(vocabulary);
either a quick conversion such as
  auto status = _processor->SetVocabulary(ToPieceArray(vocabulary));
is needed, or possibly the switch to string_view should be propagated up through the abstractions.
ToPieceArray was added in the same release that made this switch to views.
google/sentencepiece@631420b#diff-77e6a3b3bfda73d84fe1fef8205f2a2ec1d46b8f232100041f7135505f8adcefR52
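
For anyone stuck on a sentencepiece version without ToPieceArray, the conversion can be sketched roughly as below. This is only an illustration: `to_string_views` is a hypothetical name, and it uses `std::string_view` as a stand-in, which may not be the view type the sentencepiece API actually expects in a given build.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not the sentencepiece implementation): builds
// non-owning views over the strings in `pieces`. The views stay valid only
// while `pieces` is alive and unmodified.
inline std::vector<std::string_view> to_string_views(
    const std::vector<std::string>& pieces) {
  std::vector<std::string_view> views;
  views.reserve(pieces.size());
  for (const std::string& piece : pieces) {
    views.push_back(piece);  // implicit std::string -> std::string_view
  }
  return views;
}
```

Since the views do not own their data, the original vocabulary vector has to outlive the call that consumes them, which is the case when the result is passed straight into SetVocabulary as a temporary.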