Consider adding more control to the way that translation is performed on a per-marker basis. #560

@davidbaines

Description

It would be very helpful to have finer control over the way that SILNLP produces translations. It would be ideal to be able to specify what should happen with the data in each marker or group of markers. The translate_config.yml file might be a good place to configure these settings.

The following actions could be applied to Paragraph style markers:
Delete: Ignore the marker and its data and omit both from the output.
Translate: Copy the marker to the output along with the translation of its content.
Copy: Copy the marker and its text to the output verbatim, without attempting to translate. This is useful for references, for example.

Additional actions are possible for Character style markers:
Translate without marker: Extract the text from the marker and translate it, but do not add the marker to the output.
Translate and move marker: Extract the text from the marker and translate it, then add the marker and end marker to the output.
This action would offer three placement options: add both the marker and the end marker at the beginning of the paragraph, add both at the end of the paragraph, or add the marker at the beginning and the end marker at the end.
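
The two Character style actions could be sketched roughly as follows. This is only an illustration of the requested behaviour, not SILNLP code: the function names `strip_character_marker` and `place_marker` and the placement values `"start"`, `"end"` and `"wrap"` are assumptions made up for this example, and the translation step itself is left out.

```python
import re

# Hypothetical helpers; names and placement values are illustrative only.


def strip_character_marker(text: str, marker: str) -> str:
    # Remove each opening/closing pair of the given character marker,
    # keeping the enclosed text so it can be translated with the paragraph.
    name = re.escape(marker)
    pattern = re.compile(r"\\" + name + r"\s(.*?)\\" + name + r"\*", re.DOTALL)
    return pattern.sub(lambda m: m.group(1), text)


def place_marker(translated: str, marker: str, placement: str) -> str:
    # Re-attach the marker pair to an already translated paragraph.
    open_tag, close_tag = f"\\{marker} ", f"\\{marker}*"
    if placement == "start":   # empty pair at the beginning of the paragraph
        return f"{open_tag}{close_tag} {translated}"
    if placement == "end":     # empty pair at the end of the paragraph
        return f"{translated} {open_tag}{close_tag}"
    if placement == "wrap":    # marker at the beginning, end marker at the end
        return f"{open_tag}{translated}{close_tag}"
    raise ValueError(f"unknown placement: {placement!r}")


source = r"He said, \wj Follow me.\wj* Then he left."
plain = strip_character_marker(source, "wj")   # text that would be sent for translation
wrapped = place_marker(plain, "wj", "wrap")    # stand-in for the translated paragraph
```

"Translate without marker" corresponds to calling only `strip_character_marker`; "Translate and move marker" adds the `place_marker` step after translation.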

Every marker has a \StyleType which is one of: Paragraph, Character, Milestone or Note. It might (or might not) be useful to be able to apply one action to all markers with a specific \StyleType. This would probably not be very useful for the Paragraph or Character styles, which are widely used, but it could be a useful way to decide what should happen with Notes and Milestones.

Most, but not all, markers have a \TextType which is one of: Title, ChapterNumber, VerseNumber, VerseText, Other, NoteText, Section.
It might be useful to be able to apply one action to all markers with a specific \TextType.

According to the USFM Reference, markers that are used within a broader text "environment" are named with a reserved initial letter rather than being enclosed in an opening and closing tag.
In other words, the markers beginning with \i form the introduction, those beginning with \f belong to a footnote, and so on.

Ideally, we would be able to specify what happens to these as a group without having to specify what happens to each individual marker within the group.

\i - Introductions
\f - Footnotes
\x - Cross references
\e - Explanatory (study) material
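
One way the per-marker, per-group, per-\TextType and per-\StyleType settings could fit together is sketched below. Everything in it is an assumption for illustration: the setting keys (`by_marker`, `by_group`, `by_text_type`, `by_style_type`, `group_exceptions`), the action names, the example values and the precedence order (most specific rule wins) are not an existing SILNLP or translate_config.yml schema. Note that plain first-letter matching would also catch unrelated markers such as \it, \em and \fig, so the sketch includes an explicit exception list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarkerInfo:
    marker: str                      # marker name without the backslash, e.g. "ip"
    style_type: str                  # Paragraph, Character, Milestone or Note
    text_type: Optional[str] = None  # not every marker has a \TextType


# Hypothetical settings, as they might look after loading translate_config.yml.
SETTINGS = {
    "default": "translate",
    "by_marker": {"r": "copy"},
    "by_group": {"i": "delete", "f": "delete", "x": "copy", "e": "delete"},
    # Markers that share a reserved initial letter but are not part of the group.
    "group_exceptions": {"it", "em", "fig"},
    "by_text_type": {"NoteText": "delete"},
    "by_style_type": {"Milestone": "copy"},
}


def resolve_action(info: MarkerInfo, settings: dict = SETTINGS) -> str:
    # Assumed precedence: exact marker, then group, then \TextType,
    # then \StyleType, then the default action.
    if info.marker in settings["by_marker"]:
        return settings["by_marker"][info.marker]
    group = info.marker[:1]
    if group in settings["by_group"] and info.marker not in settings["group_exceptions"]:
        return settings["by_group"][group]
    if info.text_type in settings["by_text_type"]:
        return settings["by_text_type"][info.text_type]
    if info.style_type in settings["by_style_type"]:
        return settings["by_style_type"][info.style_type]
    return settings["default"]


intro_action = resolve_action(MarkerInfo("ip", "Paragraph", "Other"))  # handled by the \i group
italic_action = resolve_action(MarkerInfo("it", "Character"))          # excluded from the \i group
```

With this layout a single `by_group` entry covers a whole environment, while `by_marker` still allows an individual marker to be overridden.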

Metadata

Labels

enhancement (New feature or request), pipeline 6: infer (Issue related to using a trained model to translate.)
