LA SELVA BEACH, CA causes parser confusion about the *state* #406
Description
The Input Address
'360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076'
Current Output
ERROR: Unable to tag this string because more than one area of the string has the same label
ORIGINAL STRING: 360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076
PARSED TOKENS: [('360', 'AddressNumber'), ('CAM', 'StreetName'), ('AL', 'StreetNamePostType'), ('BARRANCO,', 'PlaceName'), ('LA', 'StateName'), ('SELVA', 'PlaceName'), ('BEACH,', 'PlaceName'), ('CA,', 'StateName'), ('95076', 'ZipCode')]
UNCERTAIN LABEL: PlaceName
When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
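To make the failure mode concrete, here is a minimal sketch of how a repeated-label check of this kind can work. It is an illustration under assumptions, not the library's actual code: `first_repeated_label` is a hypothetical helper, and the token list is copied from the PARSED TOKENS line above. A label is flagged when it starts a second run after a different label has intervened.

```python
# Token/label pairs copied from the PARSED TOKENS line of the error above.
PARSED = [
    ("360", "AddressNumber"),
    ("CAM", "StreetName"),
    ("AL", "StreetNamePostType"),
    ("BARRANCO,", "PlaceName"),
    ("LA", "StateName"),
    ("SELVA", "PlaceName"),
    ("BEACH,", "PlaceName"),
    ("CA,", "StateName"),
    ("95076", "ZipCode"),
]


def first_repeated_label(pairs):
    """Return the first label that starts a second, non-adjacent run, else None.

    Hypothetical helper for illustration only.
    """
    seen = {}          # label -> tokens collected so far
    last_label = None  # label of the previous token
    for token, label in pairs:
        if label == last_label:
            # Same run as the previous token: extend it.
            seen[label].append(token)
        elif label not in seen:
            # First time this label appears: start its run.
            seen[label] = [token]
        else:
            # Label reappears after a different label: ambiguous.
            return label
        last_label = label
    return None


print(first_repeated_label(PARSED))  # PlaceName
```

With "LA" tagged as StateName, PlaceName is split into two runs ("BARRANCO," and "SELVA BEACH,"), which matches the UNCERTAIN LABEL reported above. If "LA" were tagged PlaceName instead, the same check would pass.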
Expected Output
360 - AddressNumber
CAM AL BARRANCO - StreetName
??. - StreetNamePostType
LA SELVA BEACH - PlaceName
CA - StateName
Examples
- Fail: '360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076'
- Fail: '360 CAMINO AL BARRANCO, LA SELVA BEACH, CA, 95076'
- OK: "360 Camino al Barranco, Selva Beach, CA"
Additional context
This is clearly meant to be "360 Camino al Barranco, La Selva Beach, CA", which is here: https://maps.app.goo.gl/bobVwDJjJ6ke5dRFA.
It's throwing a repeated label error because it sees the "LA" in La Selva Beach as a state.
Editorial
- features look correct
- parsing looks correct
I think the problem is in TAGGER.tag(features), which labels the "LA" token as a state.
As an alternative solution, consider giving commas weight while training, or special-casing states somehow.
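As a rough sketch of what "giving commas weight" could look like, the snippet below derives two comma-based features per token. Everything here is an assumption for illustration: `segment_features` and the feature names `comma_segment` and `segment_size` are hypothetical and are not part of the library's existing feature set.

```python
def segment_features(address):
    """Attach comma-derived features to each whitespace-separated token.

    Hypothetical feature extractor for illustration only:
      comma_segment - index of the comma-delimited segment the token is in
      segment_size  - number of tokens in that segment
    """
    tokens = address.split()

    # First pass: assign each token to a comma-delimited segment.
    segments = []
    current = 0
    for token in tokens:
        segments.append(current)
        if token.endswith(","):
            current += 1  # a trailing comma closes the segment

    # Second pass: count how many tokens each segment holds.
    sizes = {}
    for seg in segments:
        sizes[seg] = sizes.get(seg, 0) + 1

    return [
        {"token": tok, "comma_segment": seg, "segment_size": sizes[seg]}
        for tok, seg in zip(tokens, segments)
    ]


feats = segment_features("360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076")
for f in feats:
    print(f)
```

For the failing address, "LA" falls in a three-token segment together with "SELVA" and "BEACH,", while "CA," sits alone in its own segment. A model trained with features like these would at least have a signal for telling the two apart; whether it helps in practice would need to be checked against the training data.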
Thanks for looking into this!