Allow for proper training on new data in pypi by xmedr · Pull Request #387 · datamade/usaddress

xmedr · 2025-03-20T21:33:59Z

Overview

To get the package to actually train the model on our multi-word state data, we needed to append that data to the end of the canonical data. This branch does that and updates the README instructions accordingly.
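As a rough illustration of what "appending to the end of the canonical data" can look like, here is a minimal sketch using only the standard library. It is not the code from this branch. The `AddressCollection` and `AddressString` element names and the helper `append_examples` are assumptions made for illustration, and the inputs are in-memory strings rather than the real training files.

```python
import xml.etree.ElementTree as ET


def append_examples(canonical_xml: str, new_xml: str) -> str:
    """Return canonical_xml with every example from new_xml appended at the end.

    Hypothetical helper: assumes both documents share one root element whose
    children are individual labeled examples.
    """
    canonical_root = ET.fromstring(canonical_xml)
    new_root = ET.fromstring(new_xml)
    for example in new_root:
        # Appending keeps the existing canonical examples first, new ones last.
        canonical_root.append(example)
    return ET.tostring(canonical_root, encoding="unicode")


# Assumed layout of a labeled training file (illustrative data only).
canonical = (
    "<AddressCollection>"
    "<AddressString><AddressNumber>123</AddressNumber> "
    "<StreetName>Main</StreetName></AddressString>"
    "</AddressCollection>"
)
multi_word_states = (
    "<AddressCollection>"
    "<AddressString><PlaceName>Horace,</PlaceName> "
    "<StateName>North</StateName> <StateName>Dakota</StateName></AddressString>"
    "</AddressCollection>"
)

merged = append_examples(canonical, multi_word_states)
merged_root = ET.fromstring(merged)
last_example = list(merged_root)[-1]
state_tokens = [el.text for el in last_example.findall("StateName")]
print(len(merged_root), state_tokens)
```

With the data in a single file, the training command in the Testing section only needs to be pointed at that one file.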

Connects https://github.com/datamade/parserator.datamade.us/issues/189

Notes

After this comes in I'll push a new version number tag to main to finish the deployment.

Testing

After pulling down this branch, activate a virtual environment and install this version of usaddress.
- pip install -e ."[dev]"
- It may be useful to uninstall any previous version in your virtual env beforehand, if one exists.
Train the model on just the labeled.xml file.
- parserator train training/labeled.xml usaddress
Open a Python shell and try to parse some addresses with two-word state names.
- I used some samples from Zillow.

>>> import usaddress
>>> usaddress.parse("6774 68th St S, Horace, North Dakota 58047")
[('6774', 'AddressNumber'), ('68th', 'StreetName'), ('St', 'StreetNamePostType'), ('S,', 'StreetNamePostDirectional'), ('Horace,', 'PlaceName'), ('North', 'StateName'), ('Dakota', 'StateName'), ('58047', 'ZipCode')]
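To check a result like the one above without reading it by eye, adjacent tokens that share a label can be merged so the two-word state shows up as one component. The sketch below works on the parse output copied from this description, so it runs without usaddress installed. `merge_adjacent_labels` is a hypothetical helper, not part of the usaddress API.

```python
from itertools import groupby


def merge_adjacent_labels(parsed):
    """Join consecutive (token, label) pairs that share a label.

    Hypothetical helper for checking parse output; trailing commas and
    surrounding spaces are stripped from each merged component.
    """
    merged = []
    for label, group in groupby(parsed, key=lambda pair: pair[1]):
        text = " ".join(token for token, _ in group).strip(" ,")
        merged.append((text, label))
    return merged


# Output of usaddress.parse(...) as shown in this PR description.
parsed = [
    ("6774", "AddressNumber"),
    ("68th", "StreetName"),
    ("St", "StreetNamePostType"),
    ("S,", "StreetNamePostDirectional"),
    ("Horace,", "PlaceName"),
    ("North", "StateName"),
    ("Dakota", "StateName"),
    ("58047", "ZipCode"),
]

components = merge_adjacent_labels(parsed)
states = [text for text, label in components if label == "StateName"]
print(states)
```

If the model has picked up the new training data, both words of the state name carry the `StateName` label and merge into a single component.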

Xavier Medrano added 3 commits March 20, 2025 17:19

append train/test data to end of appropriate files

00f3c08

adjust instructions for adding data

3080968

bump version number

2198b1d

xmedr marked this pull request as ready for review March 20, 2025 21:36

xmedr requested a review from derekeder March 20, 2025 21:36

derekeder approved these changes Mar 21, 2025

xmedr merged commit e7a9055 into main Mar 24, 2025
34 checks passed

xmedr deleted the fix/train-data branch March 24, 2025 13:34
