Add training data for multi word states #386

Merged
xmedr merged 4 commits into main from feat/multi-word-states on Mar 17, 2025

Contributor

@xmedr xmedr commented Mar 14, 2025

Overview

The parser was having trouble labelling multi-word state names like Rhode Island and New Jersey. We've added training data covering those cases.

Testing

  • Open a virtual environment and download the latest version of usaddress
    • pip install usaddress --upgrade
  • Open a Python shell and try to parse some addresses with two-word state names
    • I used some samples from Zillow
>>> import usaddress
>>> usaddress.parse("6774 68th St S, Horace, North Dakota 58047")
[('6774', 'AddressNumber'), ('68th', 'StreetName'), ('St', 'StreetNamePostType'), ('S,', 'StreetNamePostDirectional'), ('Horace,', 'PlaceName'), ('North', 'StateName'), ('Dakota', 'StateName'), ('58047', 'ZipCode')]
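
To read the result as whole components rather than single tokens, adjacent tokens that share a label can be joined. The following is a minimal sketch, not part of usaddress: `merge_adjacent_labels` is a hypothetical helper, and the input is the parse output copied from the session above, so the snippet runs without the library installed.

```python
from itertools import groupby

# Parse output copied from the session above (not re-generated here).
parsed = [
    ('6774', 'AddressNumber'), ('68th', 'StreetName'),
    ('St', 'StreetNamePostType'), ('S,', 'StreetNamePostDirectional'),
    ('Horace,', 'PlaceName'), ('North', 'StateName'),
    ('Dakota', 'StateName'), ('58047', 'ZipCode'),
]


def merge_adjacent_labels(pairs):
    """Hypothetical helper: join runs of consecutive tokens sharing a label."""
    merged = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        # Join the run with spaces and drop a trailing comma, if any.
        text = " ".join(token for token, _ in group).rstrip(",")
        merged.append((label, text))
    return merged


components = dict(merge_adjacent_labels(parsed))
print(components["StateName"])  # North Dakota
```

With the new training data, both halves of the state name carry the `StateName` label, so they collapse into a single "North Dakota" entry.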

@xmedr xmedr requested a review from derekeder March 14, 2025 15:53
Member

@derekeder derekeder left a comment

nice. i had to take a few additional steps to get it working locally with the latest training data.

this ended up working for me:

pip install setuptools
python setup.py develop  
parserator train training/multi_word_state_addresses.xml usaddress 

Comment thread README.md
 git clone https://github.com/datamade/usaddress.git
 cd usaddress
-pip install -e .[dev]
+pip install -e ."[dev]"
Member

this didn't work without quotes. see mu-editor/mu#852
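
For context on the quoting: in shells that treat square brackets as glob characters (zsh is a common case), an unquoted `.[dev]` is read as a pattern, not as literal text. The sketch below illustrates that pattern semantics with Python's standard-library `fnmatch` module. Using `fnmatch` as a stand-in for a shell's matcher is an assumption made for illustration; shells implement their own globbing.

```python
from fnmatch import fnmatchcase

pattern = ".[dev]"  # what the shell sees when the argument is unquoted

# As a glob, [dev] is a character class matching one of 'd', 'e' or 'v'.
for name in (".d", ".e", ".v", ".[dev]"):
    print(name, fnmatchcase(name, pattern))
```

The pattern matches `.d`, `.e` and `.v` but never the literal text `.[dev]`. A shell that raises an error on unmatched patterns, as zsh does by default, therefore rejects the unquoted command, while quoting the brackets passes the text to pip unchanged.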

@xmedr xmedr merged commit 7f8d1f8 into main Mar 17, 2025
34 checks passed