Merged
5 changes: 5 additions & 0 deletions measure_performance/test_data/labeled.xml
@@ -121,4 +121,9 @@
<AddressString><AddressNumber>400</AddressNumber> <StreetName>Calm</StreetName> <StreetName>Lake</StreetName> <StreetNamePostType>Circle,</StreetNamePostType> <PlaceName>Rochester,</PlaceName> <StateName>NY,</StateName> <ZipCode>14612</ZipCode></AddressString>
<AddressString><AddressNumber>37</AddressNumber> <StreetName>Jefferson</StreetName> <StreetNamePostType>Crt,</StreetNamePostType> <PlaceName>Fairport,</PlaceName> <StateName>NY</StateName></AddressString>
<AddressString><AddressNumber>4</AddressNumber> <StreetName>Cypress</StreetName> <StreetNamePostType>Ci,</StreetNamePostType> <PlaceName>Fairport,</PlaceName> <StateName>NY</StateName></AddressString>
<AddressString><AddressNumber>1646</AddressNumber> <StreetName>Red</StreetName> <StreetName>Leaf</StreetName> <StreetNamePostType>Drive</StreetNamePostType> <PlaceName>Fort</PlaceName> <PlaceName>Mill,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29715</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>15</AddressNumber> <StreetName>Bridge</StreetName> <StreetNamePostType>Street</StreetNamePostType> <PlaceName>Providence,</PlaceName> <StateName>Rhode</StateName> <StateName>Island</StateName> <ZipCode>02903</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>150</AddressNumber> <StreetName>Citizens</StreetName> <StreetNamePostType>Circle</StreetNamePostType> <PlaceName>Little</PlaceName> <PlaceName>River,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29566</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>4079</AddressNumber> <StreetNamePreType>U.S.</StreetNamePreType> <StreetName>17</StreetName> <StreetName>Business</StreetName> <PlaceName>Murrells</PlaceName> <PlaceName>Inlet,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29576</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>43</AddressNumber> <StreetNamePreDirectional>South</StreetNamePreDirectional> <StreetName>Broadway</StreetName> <PlaceName>Pitman,</PlaceName> <StateName>New</StateName> <StateName>Jersey</StateName> <ZipCode>08071</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
</AddressCollection>
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "usaddress"
version = "0.5.12"
version = "0.5.13"
description = "Parse US addresses using conditional random fields"
readme = "README.md"
license = {text = "MIT License", url = "http://www.opensource.org/licenses/mit-license.php"}
12 changes: 8 additions & 4 deletions training/README.md
@@ -262,7 +262,7 @@ The labeling program will launch, and if usaddress can suggest the proper labels
But it's not good enough to confirm that usaddress has learned new patterns – you also need to confirm that it hasn't *unlearned* old patterns in the process of incorporating your new training data. To do that, run the usaddress testing suite with the following command:

```
nosetests .
pytest
```

The output will fill your screen with a big block of dots (.) and/or Fs (F). Each dot corresponds to a test that *passed* (meaning that usaddress produced the expected parse for an address) while each F corresponds to a test that *failed* (meaning that usaddress failed to properly parse the address).
@@ -276,7 +276,7 @@ Ran 4896 tests in 2.158s
OK
```

Congratulations! The model has officially improved. You can safely move on to step 5b, where you'll share your work.
Congratulations! The model has officially improved. You can safely move on to step 5b, where you'll get your work ready to be shared.

If any of our tests failed, however, things become more complicated. The output will break down the tests that failed, showing you the parse that the model produced (labeled `pred`) and the parse that the test expected (labeled `true`). In this case, jump to step 5a to debug your errors.
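To make the `pred` versus `true` comparison concrete, here is a minimal sketch of how one labeled `<AddressString>` can be read as the expected parse and compared against a predicted one. It is not the test suite's actual code: `labeled_pairs` and `parse_mismatches` are hypothetical helpers, and the `pred` list is invented by hand to stand in for model output.

```python
import xml.etree.ElementTree as ET


def labeled_pairs(address_xml):
    """Turn one <AddressString> element into a list of (token, label) pairs."""
    element = ET.fromstring(address_xml)
    return [(child.text, child.tag) for child in element]


def parse_mismatches(pred, true):
    """Return (position, predicted pair, expected pair) wherever two parses differ.

    Assumes both parses contain the same number of tokens.
    """
    return [(i, p, t) for i, (p, t) in enumerate(zip(pred, true)) if p != t]


# One labeled address in the same format as the test data.
example = (
    "<AddressString><AddressNumber>37</AddressNumber> "
    "<StreetName>Jefferson</StreetName> "
    "<StreetNamePostType>Crt,</StreetNamePostType> "
    "<PlaceName>Fairport,</PlaceName> "
    "<StateName>NY</StateName></AddressString>"
)

true = labeled_pairs(example)

# Hypothetical model output that mislabels the street type as part of the name.
pred = [
    ("37", "AddressNumber"),
    ("Jefferson", "StreetName"),
    ("Crt,", "StreetName"),
    ("Fairport,", "PlaceName"),
    ("NY", "StateName"),
]

for position, predicted, expected in parse_mismatches(pred, true):
    print(f"token {position}: pred={predicted} true={expected}")
```

In a real run the predicted parse would come from the model rather than a hand-written list; the point here is only that a failure is a per-token disagreement between the two label sequences.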

@@ -290,9 +290,13 @@ Take the failing addresses and try to find real-world addresses that match the p

Once all of the tests are passing, you're safe to move on to step 5b.

**5b. Make a pull request.**
**5b. Add your training and testing data.**

If you've arrived at this step, it means that all of your new and old tests passed and your model is good to go. Fantastic!
If you've arrived at this step, it means that all of your new and old tests passed and your model is good to go. Fantastic! Next, to have the public package trained and tested on your data, you'll need to add it to the canonical data.

To do this, copy everything within the `<AddressCollection>` tags of your `new_addresses.xml` file, and paste it just before the closing `</AddressCollection>` tag of the `labeled.xml` file found in the `training/` directory. Repeat the same steps for the testing data and the `test_data/` directory.
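If you would rather script the copy-and-paste step, the sketch below shows one way to do it with Python's standard library. It is an illustration under assumptions, not a tool shipped with usaddress: `merge_address_collections` is a hypothetical helper, and it works on XML strings so that reading and writing the actual files is left to you.

```python
import xml.etree.ElementTree as ET


def merge_address_collections(canonical_xml, new_xml):
    """Append every <AddressString> from new_xml to the end of canonical_xml."""
    canonical = ET.fromstring(canonical_xml)
    new = ET.fromstring(new_xml)

    for root in (canonical, new):
        if root.tag != "AddressCollection":
            raise ValueError(f"expected <AddressCollection>, found <{root.tag}>")

    for address in new.findall("AddressString"):
        # Keep one address per line in the merged output.
        address.tail = "\n"
        canonical.append(address)

    return ET.tostring(canonical, encoding="unicode")


canonical_xml = (
    "<AddressCollection>\n"
    "<AddressString><AddressNumber>37</AddressNumber> "
    "<StreetName>Jefferson</StreetName></AddressString>\n"
    "</AddressCollection>"
)
new_xml = (
    "<AddressCollection>\n"
    "<AddressString><AddressNumber>15</AddressNumber> "
    "<StreetName>Bridge</StreetName></AddressString>\n"
    "</AddressCollection>"
)

merged = merge_address_collections(canonical_xml, new_xml)
print(merged)
```

The new addresses land just before the closing `</AddressCollection>` tag, and the spaces between labeled tokens are preserved. Review the resulting diff before committing, since a serializer may normalize details such as the XML declaration differently from a manual paste.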

**5c. Make a pull request.**

Now it's time to share your work. GitHub provides a powerful way of sharing code through the *pull request* feature (and has a [really nice guide](https://help.github.com/articles/creating-a-pull-request/) for first-timers explaining how it works). Open up a new pull request and give us a short description of what you changed: What address patterns did you fix? Where did you store your training data? How many new examples/tests did you add? The clearer your description of your work, the easier it will be for the DataMade team to determine whether it's ready to go.

8 changes: 8 additions & 0 deletions training/labeled.xml
@@ -1497,4 +1497,12 @@
<AddressString><AddressNumber>W206N</AddressNumber> <AddressNumber>W16282</AddressNumber> <StreetName>Marashland</StreetName> <StreetNamePostType>Dr,</StreetNamePostType> <PlaceName>Jackson,</PlaceName> <StateName>WI</StateName> <ZipCode>53037</ZipCode></AddressString>
<AddressString><AddressNumber>N170</AddressNumber> <AddressNumber>W20015</AddressNumber> <StreetName>Hunters</StreetName> <StreetNamePostType>Rd,</StreetNamePostType> <PlaceName>Jackson,</PlaceName> <StateName>WI</StateName> <ZipCode>53037</ZipCode></AddressString>
<AddressString><AddressNumber>N170W20015</AddressNumber> <StreetName>Hunters</StreetName> <StreetNamePostType>Rd,</StreetNamePostType> <PlaceName>Jackson,</PlaceName> <StateName>WI</StateName> <ZipCode>53037</ZipCode></AddressString>
<AddressString><AddressNumber>84</AddressNumber> <StreetName>Social</StreetName> <StreetNamePostType>Street</StreetNamePostType> <PlaceName>Woonsocket,</PlaceName> <StateName>Rhode</StateName> <StateName>Island</StateName> <ZipCode>02895</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>3481</AddressNumber> <StreetName>Kingstown</StreetName> <StreetNamePostType>Road</StreetNamePostType> <PlaceName>South</PlaceName> <PlaceName>Kingstown,</PlaceName> <StateName>Rhode</StateName> <StateName>Island</StateName> <ZipCode>02892</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>209</AddressNumber> <StreetName>4th</StreetName> <StreetNamePostType>Avenue</StreetNamePostType> <PlaceName>Asbury</PlaceName> <PlaceName>Park,</PlaceName> <StateName>New</StateName> <StateName>Jersey</StateName> <ZipCode>07712</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>600</AddressNumber> <StreetNamePreDirectional>E</StreetNamePreDirectional> <StreetName>Boulevard</StreetName> <StreetNamePostType>Ave,</StreetNamePostType> <SubaddressType>Dept</SubaddressType> <SubaddressIdentifier>301</SubaddressIdentifier> <PlaceName>Bismarck,</PlaceName> <StateName>North</StateName> <StateName>Dakota</StateName> <ZipCode>58505</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>510</AddressNumber> <StreetNamePreType>U.S.</StreetNamePreType> <StreetName>17</StreetName> <StreetName>Business</StreetName> <PlaceName>Surfside</PlaceName> <PlaceName>Beach,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29575</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>3110</AddressNumber> <StreetNamePreDirectional>West</StreetNamePreDirectional> <StreetName>12th</StreetName> <StreetNamePostType>Street</StreetNamePostType> <PlaceName>Sioux</PlaceName> <PlaceName>Falls,</PlaceName> <StateName>South</StateName> <StateName>Dakota</StateName> <ZipCode>57104</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>42</AddressNumber> <StreetName>Water</StreetName> <StreetNamePostType>Street</StreetNamePostType> <PlaceName>New</PlaceName> <PlaceName>Shoreham,</PlaceName> <StateName>Rhode</StateName> <StateName>Island</StateName> <ZipCode>02807</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>291</AddressNumber> <StreetName>Dairy</StreetName> <StreetName>Barn</StreetName> <StreetNamePostType>Lane</StreetNamePostType> <PlaceName>Fort</PlaceName> <PlaceName>Mill,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29715</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
</AddressCollection>