Add support for multiple postprocessing requests #759

Open: 5 commits to merge into base: master

Conversation

Collaborator

@isaac091 isaac091 commented Jun 20, 2025

This reworks the expected layout of translate configs. Instead of specifying postprocessing options inside each translate request, there is now a separate top-level 'postprocess' section: a list of postprocessing requests, each of which is applied to every translate request. A draft with no postprocessing applied is always saved. Once this is pushed, I will update the wiki with an example and explanation.

Example:

translate:
  - src_project: NASB
    books: MAT
  - src_project: NIV11R
    books: MAT
postprocess:
  - include_paragraph_markers: True  # only paragraph markers
  - include_paragraph_markers: True  # all markers
    include_style_markers: True
    include_embeds: True
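
A minimal sketch of how the two sections might combine, assuming each translate request yields one unprocessed base draft plus one draft per postprocess entry. The function name `plan_drafts` and the in-memory dict layout are hypothetical illustrations and are not taken from this PR's code.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory form of the example config, as a YAML loader would produce it.
config: Dict[str, Any] = {
    "translate": [
        {"src_project": "NASB", "books": "MAT"},
        {"src_project": "NIV11R", "books": "MAT"},
    ],
    "postprocess": [
        {"include_paragraph_markers": True},
        {
            "include_paragraph_markers": True,
            "include_style_markers": True,
            "include_embeds": True,
        },
    ],
}

DraftPlan = List[Tuple[Dict[str, Any], Optional[Dict[str, Any]]]]


def plan_drafts(cfg: Dict[str, Any]) -> DraftPlan:
    """Pair each translate request with None (the base draft) and with every postprocess request."""
    postprocess_requests = cfg.get("postprocess", [])
    plan: DraftPlan = []
    for translate_request in cfg.get("translate", []):
        # The draft with no postprocessing applied is always saved.
        plan.append((translate_request, None))
        for postprocess_request in postprocess_requests:
            plan.append((translate_request, postprocess_request))
    return plan


for translate_request, postprocess_request in plan_drafts(config):
    print(translate_request["src_project"], postprocess_request or "base draft")
```

Under these assumptions, the example config would produce six outputs: a base draft and two postprocessed drafts for each of the two source projects.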

It also changes the behavior of postprocess_draft.py slightly. Now, if none of the postprocessing options are toggled and the --experiment option is used, the script applies any postprocessing requests in the experiment's translate config. Previously, running the script with no postprocessing options created a base draft, but that is no longer necessary because base drafts are now created by default, and I can foresee folks expecting to be able to use the translate config to configure this script. I will also update the wiki to reflect these changes.

This PR is ready to be reviewed, but it should not be pushed right away. I still need feedback on what the output file names should look like. Currently, the base draft keeps the same file name that drafts have now, and any output with marker placement options gets a suffix, e.g. 41MAT_pse.SFM, that indicates whether paragraph markers, style markers, or embeds were included.

Closes #746



@isaac091 isaac091 requested a review from benjaminking June 20, 2025 05:01
Collaborator

@benjaminking benjaminking left a comment

Reviewed 4 of 4 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @isaac091)


silnlp/common/postprocess_draft.py line 186 at r1 (raw file):

        if args.experiment:
            LOGGER.info("No postprocessing options used. Applying postprocessing requests from translate config.")
            with (config.exp_dir / "translate_config.yml").open("r", encoding="utf-8") as file:

I believe it's possible for a valid experiment folder not to contain a translate_config.yml file. If that's correct, you probably should have a check that the file exists.
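
One way the suggested check could look, as a rough sketch. The helper name `find_translate_config` and the warning text are hypothetical; only the file name `translate_config.yml` and the use of an experiment directory come from the snippet under review.

```python
import logging
from pathlib import Path
from typing import Optional

LOGGER = logging.getLogger(__name__)


def find_translate_config(exp_dir: Path) -> Optional[Path]:
    """Return the path to translate_config.yml in exp_dir, or None if the file is absent."""
    config_path = exp_dir / "translate_config.yml"
    if not config_path.is_file():
        # Hypothetical message; a valid experiment folder may simply have no translate config.
        LOGGER.warning("No translate_config.yml found in %s. Nothing to postprocess.", exp_dir)
        return None
    return config_path
```

The caller could then skip opening the file and exit early when the helper returns None, instead of failing with a FileNotFoundError.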


silnlp/common/postprocess_draft.py line 189 at r1 (raw file):

                postprocess_configs = yaml.safe_load(file).get("postprocess", [])
            if len(postprocess_configs) == 0:
                LOGGER.info("No postprocessing requests found.")

Can you make this message a little more specific?


silnlp/common/postprocess_draft.py line 248 at r1 (raw file):

            else UpdateUsfmMarkerBehavior.STRIP
        )
        marker_placement_suffix = (

Reading through this, I think this is becoming complex enough to be worth factoring out a class with the postprocess_config logic. It could handle the mapping from property names to enum values, the creation of the file suffix, and the creation of the draft remarks, so that you don't have to duplicate code between here and translator.py.
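
A rough sketch of what such a class might look like, under several assumptions: `MarkerBehavior` is a local stand-in for `UpdateUsfmMarkerBehavior` so the example runs on its own, the suffix letters p, s, and e are inferred from the `41MAT_pse.SFM` example in the PR description, and the remark wording is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Any, Dict


class MarkerBehavior(Enum):
    """Stand-in for UpdateUsfmMarkerBehavior (assumption, not the real enum)."""

    PRESERVE = auto()
    STRIP = auto()


@dataclass(frozen=True)
class PostprocessConfig:
    include_paragraph_markers: bool = False
    include_style_markers: bool = False
    include_embeds: bool = False

    @classmethod
    def from_dict(cls, options: Dict[str, Any]) -> "PostprocessConfig":
        """Build a config from one entry of the 'postprocess' list; missing keys default to False."""
        return cls(
            include_paragraph_markers=bool(options.get("include_paragraph_markers", False)),
            include_style_markers=bool(options.get("include_style_markers", False)),
            include_embeds=bool(options.get("include_embeds", False)),
        )

    @staticmethod
    def _behavior(include: bool) -> MarkerBehavior:
        # Assumed mapping: an included marker type is preserved, otherwise stripped.
        return MarkerBehavior.PRESERVE if include else MarkerBehavior.STRIP

    @property
    def paragraph_behavior(self) -> MarkerBehavior:
        return self._behavior(self.include_paragraph_markers)

    @property
    def style_behavior(self) -> MarkerBehavior:
        return self._behavior(self.include_style_markers)

    @property
    def embed_behavior(self) -> MarkerBehavior:
        return self._behavior(self.include_embeds)

    @property
    def suffix(self) -> str:
        """File name suffix such as '_pse'; empty for the base draft."""
        letters = (
            ("p" if self.include_paragraph_markers else "")
            + ("s" if self.include_style_markers else "")
            + ("e" if self.include_embeds else "")
        )
        return f"_{letters}" if letters else ""

    @property
    def remark(self) -> str:
        """Hypothetical draft remark describing which marker types were kept."""
        kept = [
            name
            for name, included in (
                ("paragraph markers", self.include_paragraph_markers),
                ("style markers", self.include_style_markers),
                ("embeds", self.include_embeds),
            )
            if included
        ]
        return "Included: " + ", ".join(kept) if kept else "No postprocessing applied"
```

With something like this, both postprocess_draft.py and translator.py could build an output name as `f"41MAT{cfg.suffix}.SFM"` and take the remark text from `cfg.remark`, rather than each deriving them separately.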

Development

Successfully merging this pull request may close these issues.

Generating multiple drafts of a book with different draft formatting settings in one experiment
2 participants