
[IMPROVEMENT] Redesign location4j #55

@tomaytotomato


Summary

location4j works for 95% of situations; however, the current implementation fails when a search term is ambiguous.

Examples are:

  • New York (state) / New York (city in Alabama)
  • Mexico (country) / Mexico (city)

This causes searches to return unintended results.

Possible solutions:

  1. Score results so that the most popular results come first

How do you decide what is popular?

  2. Add an "ambiguous" flag to certain results that triggers additional logic to check for all name clashes
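A minimal sketch of how option 2 could precompute the flag, written in Java since location4j is a Java library. The `Place` record, its fields and the `AmbiguityFlagger.ambiguousNames` helper are hypothetical and not part of location4j's API; the sketch assumes a name counts as ambiguous when two or more entries share it after lower-casing.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical gazetteer entry; not location4j's real model.
record Place(String name, String type, String stateCode, String countryCode) {}

final class AmbiguityFlagger {
    private AmbiguityFlagger() {}

    // Returns every lower-cased name that is shared by two or more places.
    static Set<String> ambiguousNames(List<Place> places) {
        Map<String, Long> counts = places.stream()
                .collect(Collectors.groupingBy(
                        p -> p.name().toLowerCase(Locale.ROOT),
                        Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Searches for a flagged name would then take the slower clash-checking path, while unflagged names keep the current behaviour.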

Best Solution:

Rewrite the matching logic used to find matches in tokenised text. The new logic finds all matching combinations first and then scores them retrospectively.
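The tokenised text could be produced by generating every contiguous run of words (word n-grams), longest runs first. This is a sketch under assumptions: `Tokeniser.ngrams` is a hypothetical helper, commas and whitespace are treated as separators, and `maxWords` caps the n-gram length.

```java
import java.util.*;

final class Tokeniser {
    private Tokeniser() {}

    // Produces every contiguous run of 1..maxWords words,
    // longest runs first, left to right within each length.
    static List<String> ngrams(String input, int maxWords) {
        String[] words = input.trim().split("[\\s,]+");
        List<String> tokens = new ArrayList<>();
        for (int n = Math.min(maxWords, words.length); n >= 1; n--) {
            for (int start = 0; start + n <= words.length; start++) {
                tokens.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return tokens;
    }
}
```

With `maxWords = 2`, `ngrams("New York NY", 2)` yields `[New York, York NY, New, York, NY]`, the same token list as step 1 of the pseudocode; a larger cap would also emit the full three-word phrase.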

Example scenario:

Input: New York NY

Pseudocode:

1. Split into tokens ["New York", "York NY", "New", "York", "NY"]
2. Find matches for "New York" among cities and states
3. Add those matches to the results
4. Find additional matches for "NY" and "York"
5. "NY" matches the state code of an existing match, so increase that match's score
6. New York in Alabama does not have the state code NY, so decrease its score
7. "York" is not related to New York, so decrease its score (or perhaps leave it unchanged)
8. Sort the results by score
9. New York, NY should be returned first
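The steps above can be sketched as follows. Everything here is an assumption for illustration rather than location4j's implementation: the `Place` and `ScoredPlace` records, the base score (the number of words in the matched token), the +2/-2 adjustment for a state-code hint, and the rule that a token counts as a state hint only when it equals the code of a `STATE` entry.

```java
import java.util.*;

// Hypothetical model types; not location4j's real classes.
record Place(String name, String type, String stateCode, String countryCode) {}
record ScoredPlace(Place place, int score) {}

final class AmbiguityScorer {
    private AmbiguityScorer() {}

    static List<ScoredPlace> search(String input, List<Place> gazetteer) {
        // Step 1: all contiguous word n-grams, longest first.
        String[] words = input.trim().split("[\\s,]+");
        List<String> tokens = new ArrayList<>();
        for (int n = words.length; n >= 1; n--) {
            for (int start = 0; start + n <= words.length; start++) {
                tokens.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        // Steps 2-3: every place whose name equals a token becomes a candidate;
        // its base score is the token's word count, so longer matches start higher.
        Map<Place, Integer> base = new LinkedHashMap<>();
        for (String token : tokens) {
            int wordCount = token.split(" ").length;
            for (Place p : gazetteer) {
                if (p.name().equalsIgnoreCase(token)) {
                    base.merge(p, wordCount, Integer::max);
                }
            }
        }
        // Step 4: tokens equal to the code of some STATE entry act as state hints.
        Set<String> stateHints = new HashSet<>();
        for (String token : tokens) {
            for (Place p : gazetteer) {
                if ("STATE".equals(p.type()) && token.equalsIgnoreCase(p.stateCode())) {
                    stateHints.add(token.toUpperCase(Locale.ROOT));
                }
            }
        }
        // Steps 5-7: reward candidates in a hinted state, penalise all others.
        List<ScoredPlace> results = new ArrayList<>();
        for (Map.Entry<Place, Integer> e : base.entrySet()) {
            int score = e.getValue();
            if (!stateHints.isEmpty()) {
                String code = e.getKey().stateCode();
                boolean inHintedState = code != null
                        && stateHints.contains(code.toUpperCase(Locale.ROOT));
                score += inHintedState ? 2 : -2;
            }
            results.add(new ScoredPlace(e.getKey(), score));
        }
        // Step 8: highest score first; List.sort is stable, so ties keep discovery order.
        results.sort(Comparator.comparingInt(ScoredPlace::score).reversed());
        return results;
    }
}
```

With a gazetteer holding New York (state), New York (city in NY), New York (Alabama) and York (Pennsylvania), the two `NY` entries score 4, New York (Alabama) scores 0 and York scores -1. Ties keep discovery order, so inputs without a qualifier, such as "Mexico", still need a tie-breaker.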

Examples:

search("New York, NY") should return New York, New York, USA
search("Mexico") should return Mexico (country) as the top result
search("Paris, TX") should return Paris, Texas, USA (not Paris, France)
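Retrospective scoring alone may not separate Mexico the country from Mexico the city: neither search carries a qualifier, so both candidates can end up with the same score. One possible tie-breaker, not specified in this issue, is to prefer broader place types. `TieBreaker.rank`, the records and the `COUNTRY > STATE > CITY` ordering below are hypothetical.

```java
import java.util.*;

// Hypothetical model types; not location4j's real classes.
record Place(String name, String type, String stateCode, String countryCode) {}
record ScoredPlace(Place place, int score) {}

final class TieBreaker {
    private TieBreaker() {}

    // Assumed priority: broader place types win ties.
    private static final List<String> TYPE_PRIORITY = List.of("COUNTRY", "STATE", "CITY");

    private static int typeRank(Place p) {
        int i = TYPE_PRIORITY.indexOf(p.type());
        return i < 0 ? TYPE_PRIORITY.size() : i; // unknown types sort last
    }

    // Sorts by score (descending), then by type rank (broadest first).
    static List<ScoredPlace> rank(List<ScoredPlace> results) {
        List<ScoredPlace> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingInt(ScoredPlace::score).reversed()
                .thenComparingInt(sp -> typeRank(sp.place())));
        return sorted;
    }
}
```

Because the type rank is only consulted when scores are equal, a match boosted by a qualifier still outranks a broader but lower-scoring place.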
