
Allow for proper training on new data in pypi #387

Merged
xmedr merged 3 commits into main from
fix/train-data
Mar 24, 2025

@xmedr xmedr commented Mar 20, 2025

Overview

To get the package to actually train the model on our multi-word state data, we needed to append that data to the end of the canonical training data. This branch does that and updates the instructions in the README to match.
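As a rough illustration of what "appending to the canonical data" can look like, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The element names (`AddressCollection`, `AddressString`), the two inline sample documents, and the helper `append_examples` are assumptions made for this example; they are not taken from this branch, which may do the append differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the canonical data and the new multi-word
# state data; the real files are much larger and live in the repository.
CANONICAL = """<AddressCollection>
  <AddressString><AddressNumber>123</AddressNumber> <StreetName>Main</StreetName></AddressString>
</AddressCollection>"""

EXTRA = """<AddressCollection>
  <AddressString><PlaceName>Horace,</PlaceName> <StateName>North</StateName> <StateName>Dakota</StateName></AddressString>
</AddressCollection>"""


def append_examples(canonical_xml: str, extra_xml: str) -> ET.Element:
    """Return the canonical root with every example from extra_xml appended at the end."""
    canonical_root = ET.fromstring(canonical_xml)
    extra_root = ET.fromstring(extra_xml)
    # Copy the child list first so we do not mutate extra_root while iterating it.
    for example in list(extra_root):
        canonical_root.append(example)
    return canonical_root


merged = append_examples(CANONICAL, EXTRA)
print(ET.tostring(merged, encoding="unicode"))
```

The new examples end up after the existing ones, so a single training run over the merged collection sees both sets.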

Notes

Once this is merged, I'll push a new version tag to main to finish the deployment.

Testing

  • After pulling down this branch, activate a virtual environment and install this version of usaddress
    • pip install -e ."[dev]"
    • It may help to first uninstall any previous version of usaddress from your virtual environment, if one exists
  • Train the model on just the labeled.xml file
    • parserator train training/labeled.xml usaddress
  • Open a Python shell and try to parse some addresses with two-word state names
    • I used some samples from Zillow
>>> import usaddress
>>> usaddress.parse("6774 68th St S, Horace, North Dakota 58047")
[('6774', 'AddressNumber'), ('68th', 'StreetName'), ('St', 'StreetNamePostType'), ('S,', 'StreetNamePostDirectional'), ('Horace,', 'PlaceName'), ('North', 'StateName'), ('Dakota', 'StateName'), ('58047', 'ZipCode')]
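To check the result above by eye, it can help to collapse adjacent tokens that share a label, so that a two-word state shows up as a single value. The sketch below works on the list of `(token, label)` pairs exactly as printed above, so it does not need usaddress installed; the helper name `merge_consecutive` is hypothetical and not part of the usaddress API.

```python
from itertools import groupby

# The parse result shown above, copied verbatim.
parsed = [
    ('6774', 'AddressNumber'), ('68th', 'StreetName'),
    ('St', 'StreetNamePostType'), ('S,', 'StreetNamePostDirectional'),
    ('Horace,', 'PlaceName'), ('North', 'StateName'),
    ('Dakota', 'StateName'), ('58047', 'ZipCode'),
]


def merge_consecutive(pairs):
    """Join runs of adjacent tokens that carry the same label."""
    merged = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group)
        # Drop the trailing comma the tokenizer leaves attached to a token.
        merged.append((text.rstrip(","), label))
    return merged


for text, label in merge_consecutive(parsed):
    print(f"{label}: {text}")
```

With the output above, both "North" and "Dakota" carry the `StateName` label, so they merge into one "North Dakota" entry.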

@xmedr xmedr marked this pull request as ready for review March 20, 2025 21:36
@xmedr xmedr requested a review from derekeder March 20, 2025 21:36
@xmedr xmedr merged commit e7a9055 into main Mar 24, 2025
34 checks passed
@xmedr xmedr deleted the fix/train-data branch March 24, 2025 13:34