char_list and char_disamb functions in document.py #164
base: main
fyang3 commented on May 18, 2021
- added Honorifics
- char_list and char_disamb
- temporarily resolve the proximity/init dependency issue
MBJean left a comment
Overall, great work! I've left some comments below, including a few things that I think should be changed. Let me know what you think.
from gender_analysis.analysis.dependency_parsing import *
from gender_analysis.analysis.dunning import *
from gender_analysis.analysis.gender_frequency import *
from gender_analysis.analysis.instance_distance import *
A note for posterity. This is a temporary measure to prevent circular imports caused by proximity.py importing Corpus for type hinting. PR #163 attempts to address the issue more fundamentally.
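For readers unfamiliar with this kind of cycle, one common way to keep an import that is needed only for type hints out of the runtime import graph is a `typing.TYPE_CHECKING` guard. The sketch below is illustrative only and is not necessarily what PR #163 does; the module path `gender_analysis.corpus` and the helper `describe` are assumptions made for the example.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Read only by static type checkers; skipped at runtime, so this import
    # cannot take part in a circular import. The module path is an assumption.
    from gender_analysis.corpus import Corpus


def describe(corpus: "Corpus") -> str:
    """Hypothetical helper: the quoted annotation is never evaluated at runtime."""
    return type(corpus).__name__
```

Because the annotation is a string, the module loads even when `Corpus` cannot be imported at that point, while type checkers still resolve the name.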
                    set(self.filter_honr(char_list[j][0]))):
                char_cluster.append(char_list[j])
        to_return.append(char_cluster)
    return to_return
As we talked about in Slack briefly, this method probably requires some thinking through. If I'm reading the test output in 710 correctly, it looks like the disambiguation is overly generous, and I suspect we can figure out a more optimized way to traverse those character lists. Let's chat through some issues in office hours.
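As a starting point for that discussion, here is a rough sketch of one stricter way to cluster names. It is not the PR's implementation: `HONORIFICS`, `strip_honorifics`, and `cluster_names` are hypothetical names, and the input is assumed to be a list of `(name, count)` pairs. A token index avoids comparing every pair of names, and a short name is merged into a longer one only when exactly one maximal match exists, so an ambiguous name such as a bare surname shared by two characters stays in its own cluster.

```python
from collections import defaultdict

# Hypothetical honorific list; the PR's actual list is not shown here.
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "lady", "lord"}


def strip_honorifics(name):
    """Return the name's lowercased tokens, minus honorifics, as a frozenset."""
    tokens = (tok.strip(".,").lower() for tok in name.split())
    return frozenset(tok for tok in tokens if tok and tok not in HONORIFICS)


def cluster_names(char_list):
    """Group (name, count) pairs, merging only unambiguous matches."""
    # Names with identical token sets (after stripping honorifics) share a key.
    by_tokens = defaultdict(list)
    for name, count in char_list:
        by_tokens[strip_honorifics(name)].append((name, count))

    # Inverted index: token -> keys containing it, so each name is compared
    # only with names that share all of its tokens.
    index = defaultdict(set)
    for key in by_tokens:
        for tok in key:
            index[tok].add(key)

    merged_into = {}
    for key in by_tokens:
        if not key:
            continue  # name was only honorifics; leave it in its own cluster
        # Keys containing every token of `key`, i.e. its strict supersets.
        candidates = set.intersection(*(index[tok] for tok in key)) - {key}
        # Keep only the longest matches; merge when there is exactly one.
        maximal = [c for c in candidates if not any(c < d for d in candidates)]
        if len(maximal) == 1:
            merged_into[key] = maximal[0]

    clusters = defaultdict(list)
    for key, members in by_tokens.items():
        clusters[merged_into.get(key, key)].extend(members)
    return list(clusters.values())
```

One known limitation of this sketch: because honorifics are dropped before comparison, names that differ only in honorific (for example a "Mr." and a "Mrs." with the same surname) collapse into one cluster, which would need separate handling.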
added rough draft for coref_resolution
draft for simple HCI-console-based approach for disambiguate