-
-
Notifications
You must be signed in to change notification settings - Fork 6
Improved onboard_project.py #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
-Added collect_verse_counts to onboard_project -Replaced command line args with a yaml config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like what you've done with this. As the onboarding script gets more complex, I think it will be important to make sure we're logging messages that describe the actions that the script is taking, at a high level at least. For example, if the configuration file says to collect verse count, then having a message that says "Collecting verse counts" or something similar.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ddaspit)
silnlp/common/onboard_project.py
line 92 at r1 (raw file):
copy_paratext_project_folder(Path(args.copy_from), paratext_project_dir, overwrite=args.overwrite) args = parser.parse_args()
I don't think you should need to parse the arguments again.
silnlp/common/onboard_project.py
line 98 at r1 (raw file):
paratext_project_dir: Path = create_paratext_project_folder_if_not_exists(project_name) if args.copy_from:
Maybe we should add an error if the user doesn't include either of copy_from
or config
, since the script will say it's onboarding the project, but then only create a new folder at the most.
silnlp/common/onboard_project.py
line 106 at r1 (raw file):
raise FileNotFoundError(f"Configuration file '{config_file}' does not exist.") with config_file.open("r", encoding="utf-8") as file: config = yaml.safe_load(file)
Now that we have a config file, I think we should start a page on the wiki (Home · sillsdev/silnlp Wiki) that we keep up to date as we add new configuration options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now added some messages at a high level, extract_corpora and collect_verse_counts will both print more messages as they execute.
Reviewable status: 2 of 3 files reviewed, 3 unresolved discussions (waiting on @benjaminking and @ddaspit)
silnlp/common/onboard_project.py
line 92 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
I don't think you should need to parse the arguments again.
I don't know how I did not catch these repeated lines, I've now removed all the copied lines.
silnlp/common/onboard_project.py
line 98 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Maybe we should add an error if the user doesn't include either of
copy_from
orconfig
, since the script will say it's onboarding the project, but then only create a new folder at the most.
I've now added some error messages if they are not included.
silnlp/common/onboard_project.py
line 106 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Now that we have a config file, I think we should start a page on the wiki (Home · sillsdev/silnlp Wiki) that we keep up to date as we add new configuration options.
I'll start working on that now.
-Made --extract-corpora and --collect-verse-counts args without requiring a config, using default values to run both tasks.
@benjaminking I created a Wiki page here. I also made some changes to do the following:
I think changing the top level tasks to args makes more intuitive sense, rather than always requiring a config when the default settings are okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll probably need to have Damien take over reviewing this while I'm gone. We're pretty close here though.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ddaspit)
silnlp/common/onboard_project.py
line 123 at r2 (raw file):
config = {} if args.extract_corpora:
I expect that the EITL team will want to have a standard config for onboarding projects and will just re-use it for each new project they onboard. What do you think about keeping extract_corpora
and select_verse_counts
in the config, but allowing the command line options to override what's in the config?
silnlp/common/onboard_project.py
line 139 at r2 (raw file):
if args.collect_verse_counts: if not args.extract_corpora:
Let me know if what I'm thinking isn't feasible, but could we check whether the files that we need for collect_verse_counts
are present already? The scenario that I'm imagining is that someone runs this script, but later realizes that they forgot to collect verse counts. Would it be possible for them to run it again with only collect_verse_counts
enabled (and overwrite
equal to false)?
-Added collect_verse_counts to onboard_project
-Replaced command line args with a yaml config
This change is