
Translate script doesn't support USFM file names using the "<nn>-<book>" naming format #456

@mmartin9684-sil

Paratext-compatible projects downloaded from Door43 (e.g., https://git.door43.org/unfoldingWord/el-x-koine_ugnt) name their USFM files using the "<nn>-<book>" format. For example:

  • 01-GEN.usfm
  • 02-EXO.usfm
  • 03-LEV.usfm
  • ...

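When the translate script is run with one of these projects as the source project, it fails because it does not handle this naming format. In the log below, the script looks for 33MIC.usfm (no hyphen), whereas a project using this format would name the file 33-MIC.usfm: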

2024-07-13 07:58:52,216 - silnlp.nmt.translate - ERROR - Was not able to translate MIC.
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp.git/silnlp/nmt/translate.py", line 122, in translate_books
    translator.translate_book(
  File "/root/.clearml/venvs-builds/3.8/task_repository/silnlp.git/silnlp/common/translator.py", line 317, in translate_book
    raise RuntimeError(f"Can't find file {book_path} for book {book}")
RuntimeError: Can't find file /tmp/tmpu35zij1b/hbo_uhb_2024_07_10/33MIC.usfm for book MIC
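
One possible way to tolerate both naming formats is to try each candidate file name in turn before giving up. The following is only a minimal sketch under stated assumptions: `find_book_file` and its parameters are hypothetical names introduced for illustration and are not part of silnlp, and it assumes the caller already knows the two-digit book number and three-letter book ID.

```python
from pathlib import Path
from typing import Optional


def find_book_file(
    project_dir: Path, book_num: int, book_id: str, ext: str = ".usfm"
) -> Optional[Path]:
    """Hypothetical helper: return the first existing file for a book.

    Tries the "<nn><book>" form first (e.g. 33MIC.usfm), then the
    "<nn>-<book>" form used by Door43 projects (e.g. 33-MIC.usfm).
    """
    candidates = [
        f"{book_num:02d}{book_id}{ext}",   # "<nn><book>" form
        f"{book_num:02d}-{book_id}{ext}",  # "<nn>-<book>" form
    ]
    for name in candidates:
        path = project_dir / name
        if path.is_file():
            return path
    # Let the caller decide how to report a missing book.
    return None
```

With a lookup like this, the caller would raise the "Can't find file" error only when neither form exists. Matching is exact, so a case-sensitive file system would still miss names such as 33-mic.usfm.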


Labels: enhancement (New feature or request); pipeline 6: infer (Issue related to using a trained model to translate.)
