Devise and test a heuristic to infer the IPA language code for a given lang #489
Merged
joanise commented Apr 24, 2026
assert (
    error_count == 0
), f'g2p mapping errors found, look for "{error_prefix}" above for detail.'
Member
Author
test_io() above is unchanged, except that it was outdented from a TestSuite class method into a pytest-style test function.
Determining the correct -ipa language code for a given input language code is unfortunately not as straightforward as we'd like. This commit adds a proposed heuristic function and a test for it, to make sure the heuristic remains 100% correct in the future. It also includes a number of apparently unrelated changes to satisfy mypy.
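A minimal sketch of such a heuristic, following the convention stated in the PR description: try lang_id + "-ipa" first, and fall back to lang_id[:3] + "-ipa" if that code is not found. The real get_ipa_lang_code lives in test_langs.py and its signature is not shown on this page; the known_codes parameter here is an assumption, standing in for the set of language codes g2p actually defines.

```python
from collections.abc import Container


def get_ipa_lang_code(lang_id: str, known_codes: Container[str]) -> str:
    """Infer the IPA language code for an input language code.

    Sketch only: `known_codes` is an assumed parameter standing in for the
    language codes that g2p defines.
    """
    # First choice: the full input code with "-ipa" appended,
    # e.g. "fra" -> "fra-ipa".
    candidate = lang_id + "-ipa"
    if candidate in known_codes:
        return candidate
    # Fallback: the first three characters plus "-ipa",
    # e.g. "sal-apa" -> "sal-ipa". The fallback is not checked against
    # known_codes, so an unknown result is left for the caller to detect.
    return lang_id[:3] + "-ipa"


known = {"fra", "fra-ipa", "sal-apa", "sal-ipa"}
print(get_ipa_lang_code("fra", known))      # fra-ipa
print(get_ipa_lang_code("sal-apa", known))  # sal-ipa
```

Trying the full code first means that any input language with its own dedicated IPA code still resolves to it; the three-character prefix only applies when that lookup fails, which is exactly the sal-apa case.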
c188a89 to 9d2a7c1
Contributor
Codecov Report: ✅ All modified and coverable lines are covered by tests.
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request on Apr 24, 2026:
using the technique documented and tested in NRC-ILT/g2p#489
Fixes #789
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request on Apr 24, 2026:
using the technique documented and tested in NRC-ILT/g2p#489
Fixes #789
Warrants a minor version bump.
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request on Apr 27, 2026:
using the technique documented and tested in NRC-ILT/g2p#489
Fixes #789
Member
Author
@roedoejet PR updated. Note that critical review of this PR is important, because if we don't like
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request on Apr 27, 2026:
using the technique documented and tested in NRC-ILT/g2p#489
Fixes #789
PR Goal?
This work was triggered by EveryVoice currently not supporting sal-apa as an input language, because we get an InvalidLanguageCode exception for sal-apa-ipa. The correct IPA code is actually sal-ipa.
We don't currently have a rule that deterministically says how to derive the IPA language code from an input language code, but our convention happens to be either lang_id + "-ipa" or, if that is not found, lang_id[:3] + "-ipa". The test case added in this PR formalizes that convention from now on by asserting it in unit tests.
Fixes?
While not actually fixing EveryVoiceTTS/EveryVoice#789, this PR makes sure the solution I'm going to propose for that bug will keep working in the future.
Feedback sought?
Careful analysis of get_ipa_lang_code in test_langs.py: do you have a better solution?
I would have preferred to add a proper function to g2p, but I want my solution to work with both past and future versions of g2p, so that's not really possible. Instead, I'm using unit tests to formalize what EveryVoice is going to assume is true of the g2p library.
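One way such a unit test can pin the convention down is to run the heuristic over every input-to-IPA mapping the library ships and collect any disagreements. The sketch below is hypothetical: find_heuristic_mismatches is an illustrative helper, not part of g2p, and SAMPLE_MAPPINGS is a made-up list of (in_lang, out_lang) pairs standing in for g2p's real mappings. The heuristic is passed in as a callable, so any candidate can be checked.

```python
from collections.abc import Callable, Iterable


def find_heuristic_mismatches(
    mappings: Iterable[tuple[str, str]],
    heuristic: Callable[[str, set[str]], str],
) -> list[tuple[str, str, str]]:
    """Return (in_lang, expected, inferred) for each input-to-IPA mapping
    whose IPA target the heuristic fails to reproduce.

    `mappings` is a hypothetical list of (in_lang, out_lang) pairs standing
    in for the mappings g2p actually ships.
    """
    pairs = list(mappings)
    known = {code for pair in pairs for code in pair}
    mismatches = []
    for in_lang, out_lang in pairs:
        # Only check mappings from a non-IPA input language to an IPA code;
        # skip e.g. "fra-ipa" -> "eng-ipa" or "eng-ipa" -> "eng-arpabet".
        if not out_lang.endswith("-ipa") or in_lang.endswith("-ipa"):
            continue
        inferred = heuristic(in_lang, known)
        if inferred != out_lang:
            mismatches.append((in_lang, out_lang, inferred))
    return mismatches


# Made-up sample data, not g2p's real mapping list.
SAMPLE_MAPPINGS = [
    ("fra", "fra-ipa"),
    ("sal-apa", "sal-ipa"),
    ("fra-ipa", "eng-ipa"),
]


def naive_heuristic(lang_id: str, known_codes: set[str]) -> str:
    # Always append "-ipa": this is the behaviour that breaks for sal-apa.
    return lang_id + "-ipa"


print(find_heuristic_mismatches(SAMPLE_MAPPINGS, naive_heuristic))
# [('sal-apa', 'sal-ipa', 'sal-apa-ipa')]
```

In a pytest test, the real heuristic run against the real mapping list would be expected to produce an empty list, so any future g2p change that breaks the convention fails the build rather than surfacing later as an InvalidLanguageCode error.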
Priority?
normal
Tests added?
yup
How to test?
Confidence?
medium
Version change?
no