Ignore remarks and empty end-of-verse paragraphs #757

Open: isaac091 wants to merge 3 commits into master

@isaac091 isaac091 commented Jun 19, 2025

Because the draft remark is added only to the draft file, the vrefs were not aligned between the two files being compared.



@isaac091 isaac091 requested a review from benjaminking June 19, 2025 17:41

@benjaminking benjaminking left a comment

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @isaac091)


silnlp/common/compare_usfm_structure.py line 103 at r1 (raw file):

    tokenizer = WhitespaceMarkerTokenizer()

    gold_file_sents = [

It might be helpful to either add a comment or split this out into a separate method with a descriptive name, since what it's doing is a little unintuitive.

@isaac091 isaac091 changed the title Handle draft remarks Ignore remarks and empty end-of-verse paragraphs Jun 24, 2025
@isaac091 isaac091 (Collaborator, Author) commented

I added another commit that is unrelated to the original one but does a similar kind of thing: it excludes elements that we don't want to factor into the score. A recent update to the USFM parser makes it include empty end-of-verse paragraph markers in its row output, and those rows were artificially inflating the similarity scores.
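For readers following along, here is a minimal sketch of the kind of filtering described in this PR, pulled out into a named helper as suggested in the review. It is not the code in `compare_usfm_structure.py`: the `Row` type, the `PARAGRAPH_MARKERS` set, and the `is_scored` / `rows_for_scoring` helpers are all hypothetical, and "empty end-of-verse paragraph" is approximated here as a paragraph marker row with no text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """Hypothetical stand-in for one parsed USFM row (not the real parser type)."""
    ref: str     # verse reference the row belongs to, e.g. "MAT 1:1"
    marker: str  # USFM marker without the backslash, e.g. "v", "p", "rem"
    text: str    # text that follows the marker, possibly empty


# Illustrative subset of USFM paragraph markers; the real list is longer.
PARAGRAPH_MARKERS = {"p", "m", "q", "q1", "q2", "b"}


def is_scored(row: Row) -> bool:
    """Return True if the row should count toward the similarity score."""
    # Remarks exist only in the draft, so keeping them misaligns the refs.
    if row.marker == "rem":
        return False
    # Assumption: a paragraph marker row with no text is an empty
    # end-of-verse paragraph, which matches trivially and inflates the score.
    if row.marker in PARAGRAPH_MARKERS and not row.text.strip():
        return False
    return True


def rows_for_scoring(rows: List[Row]) -> List[Row]:
    """Drop remarks and empty paragraph rows before comparing two files."""
    return [row for row in rows if is_scored(row)]
```

Applying the same filter to both the gold rows and the draft rows before pairing them up keeps the references aligned, and the descriptive function name documents why the rows are being dropped.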

@isaac091 isaac091 requested a review from benjaminking June 24, 2025 15:05