Clean projects #731
base: master
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)
silnlp/common/clean_projects.py
line 412 at r1 (raw file):
all_folders.append(item)
test = True
Was this supposed to be included? Or was it left over from debugging/testing?
I've removed the test code and also put "TermRenderings.xml" in lower case.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @davidbaines and @mshannon-sil)
silnlp/common/clean_projects.py
line 412 at r1 (raw file):
Previously, benjaminking (Ben King) wrote…
Was this supposed to be included? Or was it left over from debugging/testing?
I believe the "test" variable should also be removed since it's no longer used.
Thanks, Ben, good catch.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)
silnlp/common/clean_projects.py
line 344 at r3 (raw file):
# --- Configure Logging ---
#log_formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
log_formatter = logging.Formatter("2025-05-29 14:30:00,000 - %(levelname)s - %(message)s")
One last thing: the hard-coded date in the format string here will cause all of the log messages to print "2025-05-29" instead of the actual time.
Thanks, Ben. I've reinstated the correct log Formatter.
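For reference, a minimal sketch of why the literal date is a problem: logging.Formatter only substitutes the %(...)s fields, so a timestamp typed into the format string is emitted verbatim on every record. The record name, path, and message below are illustrative and not taken from clean_projects.py.

```python
import logging

# The two format strings discussed in the review.
HARD_CODED = "2025-05-29 14:30:00,000 - %(levelname)s - %(message)s"
DYNAMIC = "%(asctime)s - %(levelname)s - %(message)s"


def render(fmt: str, message: str) -> str:
    """Format a single INFO record with the given format string."""
    # "demo" and "demo.py" are placeholder values for this sketch.
    record = logging.LogRecord("demo", logging.INFO, "demo.py", 0, message, None, None)
    return logging.Formatter(fmt).format(record)


# Always starts with the literal text "2025-05-29 14:30:00,000".
hard_coded_line = render(HARD_CODED, "cleaning started")
# Starts with the time at which the record was created.
dynamic_line = render(DYNAMIC, "cleaning started")

print(hard_coded_line)
print(dynamic_line)
```

With %(asctime)s the prefix changes from record to record; with the literal string it never does.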
@mshannon-sil I think that I've made all the requested changes. Are you able to review this too while Ben is away?
Yes, I'll add my review shortly.
I added some comments reviewing the file as a whole rather than just your changes, since I found a few issues and this PR is aiming to improve the clean_projects module.
Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @benjaminking and @davidbaines)
silnlp/common/clean_projects.py
line 77 at r4 (raw file):
"frtbak.sty", "wordanalyses.xml", "bookNames.xml",
Shouldn't this be all lower case to match the other filenames here?
silnlp/common/clean_projects.py
line 87 at r4 (raw file):
".dic", ".ldml", ".lds",
This is in both extensions to keep and delete. Is that intended?
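One way to catch this kind of overlap automatically is a small consistency check over the two collections. This is only a sketch: the names EXTENSIONS_TO_KEEP and EXTENSIONS_TO_DELETE and their contents are assumptions for illustration (".dup" is a made-up extension), since the excerpt does not show the actual constants.

```python
# Hypothetical stand-ins for the module's keep/delete extension lists.
EXTENSIONS_TO_KEEP = {".ldml", ".sfm", ".dup"}
EXTENSIONS_TO_DELETE = {".bak", ".tmp", ".DUP"}


def conflicting_extensions(keep, delete):
    """Return extensions that appear in both collections, ignoring case."""
    keep_lower = {ext.lower() for ext in keep}
    delete_lower = {ext.lower() for ext in delete}
    return sorted(keep_lower & delete_lower)


conflicts = conflicting_extensions(EXTENSIONS_TO_KEEP, EXTENSIONS_TO_DELETE)
if conflicts:
    print(f"Extensions listed as both keep and delete: {conflicts}")
```

Running a check like this at import time or in a unit test would flag the duplicate before any files are touched.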
silnlp/common/clean_projects.py
line 167 at r4 (raw file):
if self.args.verbose > 0: # Condition to buffer this warning
    self._log_info(warning_msg)
self.parsing_errors.append(f"BiblicalTermsListSetting file not found: {self.project_settings.biblical_terms_file_name})")
There's an extra parenthesis at the end of the string.
silnlp/common/clean_projects.py
line 223 at r4 (raw file):
    delete_file = True
    reason = "specific name"
elif any(item_path.match(pattern) for pattern in FILES_TO_DELETE_BY_PATTERN):
I think this should also compare against the lower-case version of item_path.
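A minimal sketch of the case-insensitive comparison, assuming the patterns are simple filename globs. The pattern list here is illustrative, and PurePosixPath is used only so the example behaves the same on every platform (path matching on Windows is already case-insensitive).

```python
from pathlib import PurePosixPath

# Illustrative patterns; the real FILES_TO_DELETE_BY_PATTERN is not shown in the excerpt.
FILES_TO_DELETE_BY_PATTERN = ["*.bak", "Notes*.txt"]


def matches_any_pattern(item_name: str, patterns) -> bool:
    """Match a file name against glob patterns, lower-casing both sides first."""
    lowered = PurePosixPath(item_name.lower())
    return any(lowered.match(pattern.lower()) for pattern in patterns)


print(matches_any_pattern("Project.BAK", FILES_TO_DELETE_BY_PATTERN))
print(matches_any_pattern("keep.sfm", FILES_TO_DELETE_BY_PATTERN))
```

Lower-casing the patterns as well as the name keeps the check robust if a mixed-case pattern is ever added to the list.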
silnlp/common/clean_projects.py
line 350 at r4 (raw file):
if args.verbose == 0:
    console_handler.setLevel(logging.CRITICAL + 1)
elif args.verbose == 1:
The elif and else branches both do the same thing here.
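If the two branches really are identical, they can be collapsed into a single else. The sketch below assumes both set the console handler to logging.INFO; the excerpt does not show the branch bodies, so that level is a guess, and console_level is a hypothetical helper.

```python
import logging


def console_level(verbose: int) -> int:
    """Pick the console handler level from the verbosity count."""
    if verbose == 0:
        # One above CRITICAL, so nothing reaches the console (as in the excerpt).
        return logging.CRITICAL + 1
    # Assumed: every non-zero verbosity uses the same level.
    return logging.INFO


console_handler = logging.StreamHandler()
console_handler.setLevel(console_level(verbose=1))
```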
silnlp/common/clean_projects.py
line 375 at r4 (raw file):
# Initial scan for all items to determine directories
initial_items = list(projects_root_path.glob("*"))
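glob("*") may not include folders/files that start with a dot, e.g. .cache. (That is the default for glob.glob; pathlib's Path.glob("*") does return them, so it is worth verifying.) If you just want every item, projects_root_path.iterdir() lists them all; note that pathlib has no listdir() method.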
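A quick way to check how each listing call treats dot-prefixed entries is to run them against a throwaway directory. The directory names below are made up for the experiment.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".cache").mkdir()    # dot-prefixed folder
    (root / "ProjectA").mkdir()  # ordinary folder

    # glob module: skips dot-prefixed names by default.
    via_glob_module = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*"))
    )
    # pathlib glob: does not treat dot-prefixed names specially.
    via_path_glob = sorted(p.name for p in root.glob("*"))
    # iterdir: lists every entry in the directory.
    via_iterdir = sorted(p.name for p in root.iterdir())

print("glob.glob:   ", via_glob_module)
print("Path.glob:   ", via_path_glob)
print("Path.iterdir:", via_iterdir)
```

For a plain "everything in this directory" scan, iterdir() states the intent most directly and avoids any question about pattern semantics.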
silnlp/common/clean_projects.py
line 388 at r4 (raw file):
found_total_msg = f"Found {len(all_folders)} total directories in {args.projects_root}."
logger.info(found_total_msg)
if args.verbose > 0:
found_total_msg is being logged/printed twice here.
silnlp/common/clean_projects.py
line 423 at r4 (raw file):
found_msg = f"Found {len(project_folders)} project folders."
logger.info(found_msg)
if args.verbose > 0:
found_msg is also logged/printed twice here. There are multiple other occurrences of the same issue in this file, so it would be good to check every case of args.verbose > 0.
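One possible way to avoid the duplicates is to drop the extra verbose-only output and let the console handler's level decide what is shown, so each message is emitted through a single logger call. This is a sketch under that assumption, not the module's actual setup; make_logger and the logger names are hypothetical.

```python
import io
import logging


def make_logger(verbose: int, stream) -> logging.Logger:
    """Build a logger whose console output is gated by the verbosity count."""
    logger = logging.getLogger(f"clean_projects_demo_{verbose}")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False  # keep the root logger from echoing the message

    handler = logging.StreamHandler(stream)
    # Verbosity is applied once, here, instead of at every call site.
    handler.setLevel(logging.INFO if verbose > 0 else logging.CRITICAL + 1)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger


quiet_out, verbose_out = io.StringIO(), io.StringIO()
make_logger(0, quiet_out).info("Found 3 project folders.")
make_logger(1, verbose_out).info("Found 3 project folders.")

print(repr(quiet_out.getvalue()))
print(repr(verbose_out.getvalue()))
```

With this arrangement the call sites need no `if args.verbose > 0:` guard at all, which removes the opportunity to emit a message twice.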
Updates common.clean_projects to use multithreading and improves the logging.