-
-
Notifications
You must be signed in to change notification settings - Fork 6
Add support for multiple postprocessing requests #759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @isaac091)
silnlp/common/postprocess_draft.py
line 186 at r1 (raw file):
if args.experiment: LOGGER.info("No postprocessing options used. Applying postprocessing requests from translate config.") with (config.exp_dir / "translate_config.yml").open("r", encoding="utf-8") as file:
I believe it's possible for a valid experiment folder not to contain a translate_config.yml file. If that's correct, you probably should have a check that the file exists.
silnlp/common/postprocess_draft.py
line 189 at r1 (raw file):
postprocess_configs = yaml.safe_load(file).get("postprocess", []) if len(postprocess_configs) == 0: LOGGER.info("No postprocessing requests found.")
Can you make this message a little more specific?
silnlp/common/postprocess_draft.py
line 248 at r1 (raw file):
else UpdateUsfmMarkerBehavior.STRIP ) marker_placement_suffix = (
Reading through this, I'm thinking that this is becoming sufficiently complex to be worth factoring out a class with the postprocess_config logic. It could handle the mapping from the property names to the enum values as well, creating the file suffix, and creating the draft remarks so that you don't have to duplicate code between here and translator.py.
This is ready to be reviewed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 9 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @isaac091)
silnlp/common/postprocesser.py
line 58 at r4 (raw file):
class PostprocessHandler: def __init__(self, configs: List[Dict[str, Union[bool, str]]] = [], include_base: bool = True) -> None:
I think it might be better if this class could be decoupled from the way that postprocess configs are specified in translate_config.yml
. Specifically, I'm thinking of having the constructor take a list of PostprocessConfig
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 9 files reviewed, 4 unresolved discussions (waiting on @benjaminking)
silnlp/common/postprocesser.py
line 58 at r4 (raw file):
Previously, benjaminking (Ben King) wrote…
I think it might be better if this class could be decoupled from the way that postprocess configs are specified in
translate_config.yml
. Specifically, I'm thinking of having the constructor take a list ofPostprocessConfig
s.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r5, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @isaac091)
This reworks how translate configs are expected to look. Instead of specifying the postprocessing options inside of each translate request, there will be a separate 'postprocess' section at the top level which is a list of postprocessing requests that will be applied to all translate requests. A draft with no postprocessing applied is always saved. Once this is pushed, I will update the wiki with an example and explanation.
Example:
It also changes the behavior of
postprocess_draft.py
slightly. Now, if none of the postprocessing options are toggled and the--experiment
option is used, the script will apply any postprocessing requests in the experiment's translate config. Previously, if no postprocessing options were used, it would create a base draft, but that is no longer necessary because they are created by default, and I can foresee folks expecting to be able to use the translate config to configure this script. I will also update the wiki to reflect these changes.This PR is ready to be reviewed, but not to push right away. I still need to get feedback on what the names of the output files should look like. Currently, the base draft will have the same file name as drafts do now, and any outputs with marker placement options will have a suffix, e.g.
41MAT_pse.SFM
, that indicates ifp
aragraph markers,s
tyle markers, ore
mbeds were included.Closes #746
This change is