
Common Key Terms Implementation for silnlp and Serval #850

@Enkidu93

(From Michael)
Serval and silnlp rely on different implementations of the key terms logic. Differences have been identified in a number of areas, including:

  • Pairing KT glosses (from Paratext datasets) and project-specific KT renderings
  • Adding key term pairs to the training data (silnlp adds each KT twice; Serval adds each KT once).
  • Extracting key term renderings from the project data.
  • Supported gloss languages (Serval supports English, Spanish, French, Indonesian and Portuguese, while silnlp does not support Portuguese).

A single shared implementation would simplify maintenance and enhancement of this feature as well as user support.
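To make the differences above concrete, here is a minimal sketch of how a shared implementation might expose them as configuration rather than as separate code paths. Every name in it (`KeyTermsConfig`, `build_key_term_pairs`, `SERVAL_LIKE`, `SILNLP_LIKE`) is hypothetical and does not come from machine.py, Machine, silnlp, or Serval. The two-letter language codes and the dictionary-based inputs are also assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class KeyTermsConfig:
    """Hypothetical settings covering the behaviors that currently differ."""

    gloss_languages: FrozenSet[str]  # gloss languages the caller accepts
    copies_per_term: int = 1         # how many times each KT pair is added


# Assumed presets mirroring the differences listed in this issue.
SERVAL_LIKE = KeyTermsConfig(frozenset({"en", "es", "fr", "id", "pt"}), copies_per_term=1)
SILNLP_LIKE = KeyTermsConfig(frozenset({"en", "es", "fr", "id"}), copies_per_term=2)


def build_key_term_pairs(
    glosses: Dict[str, List[str]],
    renderings: Dict[str, List[str]],
    gloss_language: str,
    config: KeyTermsConfig,
) -> List[Tuple[str, str]]:
    """Pair glosses with project renderings by term ID (illustrative only)."""
    if gloss_language not in config.gloss_languages:
        return []
    pairs: List[Tuple[str, str]] = []
    # Only terms that have both a gloss and a project rendering yield pairs.
    for term_id in sorted(glosses.keys() & renderings.keys()):
        for gloss in glosses[term_id]:
            for rendering in renderings[term_id]:
                pairs.extend([(gloss, rendering)] * config.copies_per_term)
    return pairs
```

With made-up data such as `glosses = {"KT1": ["grace"]}` and `renderings = {"KT1": ["xgrace"]}`, the Serval-like preset yields each pair once and the silnlp-like preset yields it twice, so the choice becomes an explicit parameter that both callers can set.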

The most natural place for a common implementation is Machine. This will require updates to machine.py, Machine, silnlp, and Serval. Broadly, we need to:

  • Expand machine.py's key term handling (extraction and alignment) so that it is consistent with, and as flexible as, silnlp's current behavior where needed.
  • Replace silnlp's key term handling with machine.py's.
  • Port the machine.py updates to Machine (C#).
  • Update Serval and machine.py's build_job classes as needed to use those updates.

We should establish a small collection of key term scenarios and projects that we can use to test consistency throughout this process.
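One possible shape for that consistency check is sketched below. It assumes each implementation can be wrapped in a callable that takes a project identifier and returns (gloss, rendering) pairs. The `compare_implementations` function and its report format are hypothetical and not part of any of the libraries named here. Outputs are compared as multisets so that a once-versus-twice difference in how often a pair is added shows up in the report.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]
Extractor = Callable[[str], Iterable[Pair]]  # hypothetical wrapper signature


def compare_implementations(
    project_ids: Iterable[str],
    reference: Extractor,
    candidate: Extractor,
) -> Dict[str, Dict[str, List[Pair]]]:
    """Report, per project, the pairs that differ between two implementations."""
    report: Dict[str, Dict[str, List[Pair]]] = {}
    for project_id in project_ids:
        ref_counts = Counter(reference(project_id))
        cand_counts = Counter(candidate(project_id))
        # Counter subtraction keeps only positive counts, so duplicates matter.
        missing = sorted((ref_counts - cand_counts).elements())
        extra = sorted((cand_counts - ref_counts).elements())
        if missing or extra:
            report[project_id] = {"missing": missing, "extra": extra}
    return report
```

An empty report means the two implementations agree on every test project. Any entry lists the pairs the candidate is missing or adds relative to the reference.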

Status

🔖 Ready
