Open
Labels
enhancement, feature 🪛, help wanted
Description
Summary
location4j works in about 95% of situations; however, the current implementation fails when a place name is ambiguous.
Examples are:
- New York (state) / New York (city) / New York (Alabama)
- Mexico (country) / Mexico City
These ambiguities produce unintended results.
Possible solutions:
- Score results so that the most popular results come first
  - Open question: how do you decide what is popular?
- Add an "ambiguous" flag to certain results, which then prompts additional logic to check for all clashes in names
Best solution:
Rewrite the matching logic used when finding matches in tokenised text. The new logic finds all matching combinations and then scores them retrospectively.
Example scenario:
Input: New York NY
Pseudocode:
1. Split into tokens ["New York", "York NY", "New", "York", "NY"]
2. Find matches for "New York" among cities and states
3. Add those matches to the results
4. Find additional matches for "NY" and "York"
5. "NY" matches the state of an existing match, so increase that match's score
6. New York in Alabama does not have the state code NY, so decrease its score
7. York is not related to New York, so decrease its score (or maybe just leave the score as is)
8. Sort scores
9. New York, NY should be returned first
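
The steps above can be sketched roughly as follows. This is a minimal illustration, not location4j's actual implementation: the `Place` and `Scored` records, the `tokenize` and `rank` methods, the three-row data set, and the score weights (+1 / -1) are all hypothetical and chosen only to make the example self-contained. It covers the city versus state-code case only; the country versus city case (Mexico) would need a similar rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AmbiguityScoringSketch {

    // Hypothetical gazetteer row; not location4j's real data model.
    record Place(String city, String state, String stateCode, String country) {}

    record Scored(Place place, int score) {}

    // Tiny in-memory data set, for illustration only.
    static final List<Place> PLACES = List.of(
            new Place("New York", "New York", "NY", "USA"),
            new Place("New York", "Alabama", "AL", "USA"),
            new Place("York", "Pennsylvania", "PA", "USA"));

    // Step 1: split the input into two-word and one-word tokens.
    static Set<String> tokenize(String input) {
        String[] words = input.trim().split("[\\s,]+");
        Set<String> tokens = new LinkedHashSet<>();
        for (int i = 0; i + 1 < words.length; i++) {
            tokens.add(words[i] + " " + words[i + 1]);
        }
        for (String word : words) {
            tokens.add(word);
        }
        return tokens;
    }

    // Steps 2-8: collect every candidate, adjust scores, then sort.
    static List<Scored> rank(String input) {
        Set<String> tokens = new HashSet<>();
        for (String t : tokenize(input)) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }

        // Does the input mention any state code we know about?
        boolean inputHasStateCode = false;
        for (Place p : PLACES) {
            if (tokens.contains(p.stateCode().toLowerCase(Locale.ROOT))) {
                inputHasStateCode = true;
            }
        }

        List<Scored> results = new ArrayList<>();
        for (Place p : PLACES) {
            String city = p.city().toLowerCase(Locale.ROOT);
            if (!tokens.contains(city)) {
                continue; // no token matches this city name
            }
            // Assumed weighting: a longer matched name starts with a higher score.
            int score = city.split(" ").length;
            if (tokens.contains(p.stateCode().toLowerCase(Locale.ROOT))) {
                score += 1; // step 5: state code agrees with the candidate
            } else if (inputHasStateCode) {
                score -= 1; // step 6: input names a different state
            }
            results.add(new Scored(p, score));
        }
        // Step 8: highest score first.
        results.sort(Comparator.comparingInt(Scored::score).reversed());
        return results;
    }

    public static void main(String[] args) {
        Scored top = rank("New York NY").get(0);
        // Prints "New York, NY"
        System.out.println(top.place().city() + ", " + top.place().stateCode());
    }
}
```

With this weighting, New York (NY) scores 3, New York (AL) scores 1, and York (PA) scores 0. One known weakness of the sketch is that short state codes can collide with ordinary words in the input, so a real implementation would probably need a stricter check.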
Examples:
search("New York, NY") returns New York, New York, USA
search("Mexico") returns Mexico (country) as top result
search("Paris, TX") returns Paris, Texas, USA (not Paris, France)