Clean projects #731

Open
wants to merge 9 commits into
base: master

Conversation

davidbaines
Collaborator

@davidbaines davidbaines commented May 27, 2025

Updates common.clean_projects to use multithreading and improves the logging.


Collaborator

@benjaminking benjaminking left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)


silnlp/common/clean_projects.py line 412 at r1 (raw file):

            all_folders.append(item)

    test = True

Was this supposed to be included? Or was it left over from debugging/testing?

@davidbaines
Collaborator Author

I've removed the test code and also put "TermRenderings.xml" in lower case.

Collaborator

@benjaminking benjaminking left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @davidbaines and @mshannon-sil)


silnlp/common/clean_projects.py line 412 at r1 (raw file):

Previously, benjaminking (Ben King) wrote…

Was this supposed to be included? Or was it left over from debugging/testing?

I believe the "test" variable should also be removed since it's no longer used.

@davidbaines
Collaborator Author

Thanks, Ben, good catch.

Collaborator

@benjaminking benjaminking left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @mshannon-sil)


silnlp/common/clean_projects.py line 344 at r3 (raw file):

    # --- Configure Logging ---
    #log_formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    log_formatter = logging.Formatter("2025-05-29 14:30:00,000 - %(levelname)s - %(message)s")

One last thing: the hard-coded date in the format string here will cause every log message to print the same timestamp, "2025-05-29 14:30:00,000", instead of the time the message was logged.
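For illustration, here is a minimal sketch of the difference, using the commented-out format string quoted above. The `format_at` helper and the logger name "demo" are hypothetical and exist only to make the behaviour easy to check; they are not part of clean_projects.py.

```python
import logging

# The format string from the commented-out line: %(asctime)s is filled in
# per record, so every message carries its own timestamp.
log_formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")


def format_at(created: float, message: str) -> str:
    """Format a message as if it had been logged at the given epoch time."""
    record = logging.LogRecord("demo", logging.INFO, "demo.py", 0, message, None, None)
    record.created = created  # seconds since the epoch
    record.msecs = 0
    return log_formatter.format(record)


first = format_at(0, "scan started")
later = format_at(365 * 24 * 3600, "scan started")
```

With a literal date in the format string, `first` and `later` would be identical; with `%(asctime)s` they differ.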

@davidbaines
Collaborator Author

Thanks, Ben. I've reinstated the correct log Formatter.
I can see that other improvements would be beneficial, such as reading the .env file rather than having a hard-coded path, but I'll add those to another PR to avoid complicating the review of this one.

@davidbaines
Collaborator Author

@mshannon-sil I think that I've made all the requested changes - are you able to review this too while Ben is away?

@mshannon-sil
Collaborator

Yes, I'll add my review shortly.

Collaborator

@mshannon-sil mshannon-sil left a comment

I added some comments reviewing the file as a whole rather than just your changes, since I found a few issues and this PR is aiming to improve the clean_projects module.

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @benjaminking and @davidbaines)


silnlp/common/clean_projects.py line 77 at r4 (raw file):

    "frtbak.sty",
    "wordanalyses.xml",
    "bookNames.xml",

Shouldn't this be all lower case to match the other filenames here?
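One way to catch entries like this is to scan the list for names that are not already lower case. This is only a sketch: the list name `FILENAMES` and the helper are hypothetical, and the three entries are just the ones quoted above.

```python
# Hypothetical list name; the entries are the three lines quoted in this comment.
FILENAMES = ["frtbak.sty", "wordanalyses.xml", "bookNames.xml"]


def mixed_case_entries(names):
    """Return the names that would never match a lower-cased filename."""
    return [name for name in names if name != name.lower()]


offenders = mixed_case_entries(FILENAMES)
```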


silnlp/common/clean_projects.py line 87 at r4 (raw file):

    ".dic",
    ".ldml",
    ".lds",

This is in both extensions to keep and delete. Is that intended?
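A small sanity check can flag this kind of overlap automatically. The sketch below assumes the two collections are plain iterables of extension strings; the names and the sample values (including ".dup") are made up for illustration and are not taken from the module.

```python
# Hypothetical sample data; ".dup" stands in for an extension listed twice.
EXTENSIONS_TO_KEEP = {".keep1", ".dup", ".keep2"}
EXTENSIONS_TO_DELETE = {".del1", ".dup"}


def conflicting_extensions(keep, delete):
    """Return extensions that appear in both the keep and the delete lists."""
    return sorted(set(keep) & set(delete))


conflicts = conflicting_extensions(EXTENSIONS_TO_KEEP, EXTENSIONS_TO_DELETE)
```

Running a check like this at import time or in a unit test would make the precedence question explicit instead of depending on the order of the if/elif branches.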


silnlp/common/clean_projects.py line 167 at r4 (raw file):

                if self.args.verbose > 0:  # Condition to buffer this warning
                    self._log_info(warning_msg)
                self.parsing_errors.append(f"BiblicalTermsListSetting file not found: {self.project_settings.biblical_terms_file_name})")

There's an extra parenthesis at the end of the string.


silnlp/common/clean_projects.py line 223 at r4 (raw file):

                delete_file = True
                reason = "specific name"
            elif any(item_path.match(pattern) for pattern in FILES_TO_DELETE_BY_PATTERN):

I think this should also compare against the lowercase version of item_path.
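A possible shape for that comparison, as a sketch only: lower-case the file name before matching, and keep the patterns themselves in lower case. The pattern values below are hypothetical, and `PurePosixPath` is used so the example behaves the same on every platform (on Windows, path matching is already case-insensitive).

```python
from pathlib import PurePosixPath

# Hypothetical patterns; the real FILES_TO_DELETE_BY_PATTERN is not shown in this thread.
FILES_TO_DELETE_BY_PATTERN = ["*.bak", "note*.sfm"]


def matches_delete_pattern(item_path):
    """Match the lower-cased file name against each lower-case pattern."""
    lowered = PurePosixPath(item_path.name.lower())
    return any(lowered.match(pattern) for pattern in FILES_TO_DELETE_BY_PATTERN)


hit = matches_delete_pattern(PurePosixPath("project/Notes.BAK"))
miss = matches_delete_pattern(PurePosixPath("project/01GEN.SFM"))
```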


silnlp/common/clean_projects.py line 350 at r4 (raw file):

    if args.verbose == 0:
        console_handler.setLevel(logging.CRITICAL + 1)
    elif args.verbose == 1:

The elif and else branches both do the same thing here, so they can be merged into a single else.
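The collapsed version might look like the sketch below. It assumes both non-zero branches set the console handler to `logging.INFO`; the level actually used in the PR is not visible in the quoted lines, and `console_level` is a hypothetical helper.

```python
import logging


def console_level(verbose: int) -> int:
    """Pick the console handler level from the verbosity count."""
    if verbose == 0:
        # Above CRITICAL, so nothing reaches the console.
        return logging.CRITICAL + 1
    # Assumption: every verbosity >= 1 uses the same level.
    return logging.INFO


console_handler = logging.StreamHandler()
console_handler.setLevel(console_level(0))
```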


silnlp/common/clean_projects.py line 375 at r4 (raw file):

    # Initial scan for all items to determine directories
    initial_items = list(projects_root_path.glob("*"))

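glob.glob("*") skips folders/files that start with a dot, e.g. .cache, but pathlib's Path.glob("*") does include them, so nothing is missed here. projects_root_path.iterdir() would still be a more direct way to get a list of all items (note that Path has no listdir() method).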
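It is worth checking how each listing method treats dot-prefixed entries, because the `glob` module and `pathlib` differ. The sketch below builds a throwaway directory (the folder names are made up) and compares `glob.glob`, `Path.glob("*")` and `Path.iterdir()`.

```python
import glob
import os
import tempfile
from pathlib import Path


def list_names(root: Path) -> dict:
    """List the entry names under root using three different approaches."""
    return {
        "glob_module": sorted(
            os.path.basename(p) for p in glob.glob(os.path.join(str(root), "*"))
        ),
        "path_glob": sorted(p.name for p in root.glob("*")),
        "iterdir": sorted(p.name for p in root.iterdir()),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".cache").mkdir()    # hidden folder
    (root / "ProjectA").mkdir()  # ordinary folder
    listing = list_names(root)
```

Only the `glob` module drops `.cache`; both pathlib methods return it. `iterdir()` states the intent most directly when no pattern is needed.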


silnlp/common/clean_projects.py line 388 at r4 (raw file):

    found_total_msg = f"Found {len(all_folders)} total directories in {args.projects_root}."
    logger.info(found_total_msg)
    if args.verbose > 0:

found_total_msg is being logged/printed twice here.


silnlp/common/clean_projects.py line 423 at r4 (raw file):

    found_msg = f"Found {len(project_folders)} project folders."
    logger.info(found_msg)
    if args.verbose > 0:

found_msg is also logged/printed twice here. And there are multiple other occurrences of the same issue in this file so it would be good to look at every case of args.verbose > 0 to check.
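One way to avoid the duplication, sketched under the assumption that the second output comes from a separate print or log call guarded by `args.verbose > 0`: let the console handler's level decide whether a message is shown, and call `logger.info` exactly once. The names `build_logger` and `report` are hypothetical.

```python
import io
import logging


def build_logger(verbose: int, stream) -> logging.Logger:
    """Create a logger whose console output depends only on the handler level."""
    logger = logging.getLogger(f"clean_projects_demo_{verbose}")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False  # keep the root logger from echoing the message
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.INFO if verbose > 0 else logging.CRITICAL + 1)
    handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger


def report(logger: logging.Logger, message: str) -> None:
    """Log once; no extra 'if verbose > 0' branch is needed at the call site."""
    logger.info(message)


verbose_out, quiet_out = io.StringIO(), io.StringIO()
report(build_logger(1, verbose_out), "Found 3 project folders.")
report(build_logger(0, quiet_out), "Found 3 project folders.")
```

A file handler could be added alongside the console handler, so the message still reaches the log file when the console is silenced.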


3 participants