Bug description
If your text includes OOVs, like digits, typos or unknown words (where unknown words are those not in the CMU dict used to build the g2p English mapping), they are simply stripped out of the text before training or synthesis if you are converting to phones.
E.g., `testing 123 testings test` gets g2p'd to `tɛstɪŋ tɛst`, which is not great for training, and potentially catastrophic for synthesis.
When a given utterance consists exclusively of OOVs, you get a stack dump as described in #741.
This problem was noticed by @marctessier a few weeks ago.
Possible suggestions by @roedoejet
- fall back to `und`, like readalongs does?
- fall back to a neural g2p model?
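A minimal sketch of the failure mode and the suggested fix, using a toy dictionary as a stand-in for the real CMU-derived mapping (function and variable names here are hypothetical, not the project's actual API):

```python
# Toy stand-in for the CMU-derived English g2p mapping.
CMU_LIKE_DICT = {"testing": "tɛstɪŋ", "test": "tɛst"}

def g2p_strip_oovs(text):
    # Current behaviour: tokens not in the dict are silently dropped,
    # so "123" and "testings" vanish from the output.
    return " ".join(CMU_LIKE_DICT[w] for w in text.split() if w in CMU_LIKE_DICT)

def g2p_with_fallback(text, fallback=lambda w: w):
    # Suggested behaviour: route OOVs through a fallback (e.g. an `und`
    # mapping or a neural g2p model) instead of deleting them.
    return " ".join(CMU_LIKE_DICT.get(w) or fallback(w) for w in text.split())

print(g2p_strip_oovs("testing 123 testings test"))    # → tɛstɪŋ tɛst
print(g2p_with_fallback("testing 123 testings test"))  # → tɛstɪŋ 123 testings tɛst
```

With a fallback in place, an all-OOV utterance would still produce a non-empty phone sequence, avoiding the empty-input crash in #741.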