Commit 753a4fb

Merge pull request #390 from datamade/patch/hc-box
Add data for a handful of different cases
2 parents e7a9055 + a9faa2f commit 753a4fb

6 files changed

Lines changed: 112 additions & 4 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+## Overview
+
+Brief description of what this PR does, and why it is needed.
+
+If this pr closes an issue, make note of it here 👇
+Closes #XXX
+
+### Demo
+
+Optional. Screenshots, `curl` examples, etc.
+
+### Notes
+
+Optional. Ancillary topics, caveats, alternative strategies that didn't work out, anything else.
+
+## Testing Instructions
+
+* How to test this PR
+* Prefer bulleted description
+* Start after checking out this branch
+* Include any setup required, such as bundling scripts, restarting services, etc.
+* Include test case, and expected output

README.md

Lines changed: 3 additions & 3 deletions
@@ -68,7 +68,7 @@ Having trouble building the code? [Open an issue](https://github.com/datamade/us
 
 ### Adding new training data
 
-If usaddress is consistently failing on particular address patterns, you can adjust the parser's behavior by adding new training data to the model. [Follow our guide in the training directory](https://github.com/datamade/usaddress/blob/master/training/README.md), and be sure to make a pull request so that we can incorporate your contribution into our next release!
+If usaddress is consistently failing on particular address patterns, you can adjust the parser's behavior by adding new training data to the model. [Follow our guide in the training directory](./training/README.md), and be sure to make a pull request so that we can incorporate your contribution into our next release!
 
 ## Important links
 
@@ -91,7 +91,7 @@ If usaddress is consistently failing on particular address patterns, you can adj
 
 Report issues in the [issue tracker](https://github.com/datamade/usaddress/issues)
 
-If an address was parsed incorrectly, please let us know! You can either [open an issue](https://github.com/datamade/usaddress/issues/new) or (if you're adventurous) [add new training data to improve the parser's model.](https://github.com/datamade/usaddress/blob/master/training/README.md) When possible, please send over a few real-world examples of similar address patterns, along with some info about the source of the data - this will help us train the parser and improve its performance.
+If an address was parsed incorrectly, please let us know! You can either [open an issue](https://github.com/datamade/usaddress/issues/new) or (if you're adventurous) [add new training data to improve the parser's model.](./training/README.md) When possible, please send over a few real-world examples of similar address patterns, along with some info about the source of the data - this will help us train the parser and improve its performance.
 
 If something in the library is not behaving intuitively, it is a bug, and should be reported.
 
@@ -103,4 +103,4 @@ If something in the library is not behaving intuitively, it is a bug, and should
 
 ## Copyright
 
-Copyright (c) 2025 Atlanta Journal Constitution. Released under the [MIT License](https://github.com/datamade/usaddress/blob/master/LICENSE).
+Copyright (c) 2025 Atlanta Journal Constitution. Released under the [MIT License](./LICENSE).

measure_performance/test_data/labeled.xml

Lines changed: 30 additions & 0 deletions
@@ -126,4 +126,34 @@
 <AddressString><AddressNumber>150</AddressNumber> <StreetName>Citizens</StreetName> <StreetNamePostType>Circle</StreetNamePostType> <PlaceName>Little</PlaceName> <PlaceName>River,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29566</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
 <AddressString><AddressNumber>4079</AddressNumber> <StreetNamePreType>U.S.</StreetNamePreType> <StreetName>17</StreetName> <StreetName>Business</StreetName> <PlaceName>Murrells</PlaceName> <PlaceName>Inlet,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29576</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
 <AddressString><AddressNumber>43</AddressNumber> <StreetNamePreDirectional>South</StreetNamePreDirectional> <StreetName>Broadway</StreetName> <PlaceName>Pitman,</PlaceName> <StateName>New</StateName> <StateName>Jersey</StateName> <ZipCode>08071</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>2333</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>85</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>284</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>27</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>7326</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>66</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>992</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>88</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupType>R</USPSBoxGroupType> <USPSBoxGroupID>32</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>e3</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>72</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>1A</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HIGHWAY</USPSBoxGroupType> <USPSBoxGroupType>CONTRACT</USPSBoxGroupType> <USPSBoxGroupType>rte</USPSBoxGroupType> <USPSBoxGroupID>#</USPSBoxGroupID> <USPSBoxGroupID>46</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>992</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HIGHWAY</USPSBoxGroupType> <USPSBoxGroupType>CONtraCT</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>56</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>45C</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>StaR</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>75</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>5Z</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HCR</USPSBoxGroupType> <USPSBoxGroupID>4e</USPSBoxGroupID> <USPSBoxType>box</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>32</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HCR</USPSBoxGroupType> <USPSBoxGroupID>88</USPSBoxGroupID> <USPSBoxType>bOX</USPSBoxType> <USPSBoxID>76E</USPSBoxID></AddressString>
+<AddressString><USPSBoxGroupType>HWY</USPSBoxGroupType> <USPSBoxGroupType>CONTRACT</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>102</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>255A</USPSBoxID></AddressString>
+<AddressString><AddressNumber>4510</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>GV,</StreetName> <PlaceName>APPLETON,</PlaceName> <StateName>WI</StateName> <ZipCode>54913</ZipCode></AddressString>
+<AddressString><AddressNumber>7575</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>ZZZ,</StreetName> <PlaceName>MILWAUKEE,</PlaceName> <StateName>WI</StateName> <ZipCode>54567</ZipCode></AddressString>
+<AddressString><AddressNumber>123A</AddressNumber> <StreetNamePreDirectional>E</StreetNamePreDirectional> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>DV,</StreetName> <PlaceName>WAUPACA,</PlaceName> <StateName>WI</StateName> <ZipCode>54981</ZipCode></AddressString>
+<AddressString><AddressNumber>1331</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>AA</StreetName> <StreetNamePostDirectional>NE,</StreetNamePostDirectional> <PlaceName>AMHERST</PlaceName> <PlaceName>JUNCTION,</PlaceName> <StateName>WI</StateName> <ZipCode>54407</ZipCode></AddressString>
+<AddressString><AddressNumber>133</AddressNumber> <StreetNamePreDirectional>W</StreetNamePreDirectional> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>LL,</StreetName> <PlaceName>AMHERST,</PlaceName> <StateName>WI</StateName> <ZipCode>54406</ZipCode></AddressString>
+<AddressString><AddressNumber>123</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>ABC,</StreetName> <OccupancyType>APT</OccupancyType> <OccupancyIdentifier>12,</OccupancyIdentifier> <PlaceName>IOLA,</PlaceName> <StateName>WI</StateName> <ZipCode>54445</ZipCode></AddressString>
+<AddressString><AddressNumber>200</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>ELM,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
+<AddressString><AddressNumber>55</AddressNumber> <StreetName>WINDSOR</StreetName> <StreetNamePostType>PLACE,</StreetNamePostType> <PlaceName>CHAMPAIGN,</PlaceName> <StateName>ILLINOIS</StateName></AddressString>
+<AddressString><AddressNumber>5</AddressNumber> <StreetNamePreDirectional>NORTH</StreetNamePreDirectional> <StreetName>MAIN,</StreetName> <PlaceName>VAN</PlaceName> <PlaceName>NUYS,</PlaceName> <StateName>CALIFORNIA</StateName></AddressString>
+<AddressString><AddressNumber>2609</AddressNumber> <StreetName>BAYVIEW,</StreetName> <PlaceName>FORT</PlaceName> <PlaceName>LAUDERDALE,</PlaceName> <StateName>FL</StateName></AddressString>
+<AddressString><AddressNumber>12855</AddressNumber> <StreetName>6TH</StreetName> <StreetNamePostType>AVE,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>MIAMI,</PlaceName> <StateName>FL</StateName> <ZipCode>33161</ZipCode></AddressString>
+<AddressString><AddressNumber>783</AddressNumber> <StreetName>HOPE</StreetName> <StreetNamePostType>ST,</StreetNamePostType> <PlaceName>PROVIDENCE,</PlaceName> <StateName>RHODE</StateName> <StateName>ISLAND</StateName> <ZipCode>02906</ZipCode></AddressString>
+<AddressString><AddressNumber>200</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>ELM,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
+<AddressString><AddressNumber>977</AddressNumber> <StreetName>PLEASANT</StreetName> <StreetNamePostType>STREET,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>ORANGE,</PlaceName> <StateName>NJ</StateName> <ZipCode>07052</ZipCode></AddressString>
+<AddressString><AddressNumber>610</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>MAIN</StreetName> <PlaceName>MARION</PlaceName> <StateName>KANSAS</StateName></AddressString>
+<AddressString><AddressNumber>10</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>LAKE,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
+<AddressString><AddressNumber>2735</AddressNumber> <StreetName>PAWTUCKET</StreetName> <StreetNamePostType>AVE</StreetNamePostType> <PlaceName>EAST</PlaceName> <PlaceName>PROVIDENCE</PlaceName> <StateName>RHODE</StateName> <StateName>ISLAND</StateName> <ZipCode>02914</ZipCode></AddressString>
+<AddressString><AddressNumber>5548</AddressNumber> <StreetName>ELMER</StreetName> <StreetNamePostType>AVENUE,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>HOLLYWOOD,</PlaceName> <StateName>CA</StateName> <ZipCode>91601</ZipCode></AddressString>
 </AddressCollection>
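
Reviewer note: each `<AddressString>` in the labeled data above holds one child element per token, with the tag acting as the label. The following is a minimal sketch of reading that layout into `(token, label)` pairs using only the standard library. The `read_labeled` helper and the `SAMPLE` string are illustrative assumptions, not code from this repository; the two sample rows are copied from the added lines.

```python
import xml.etree.ElementTree as ET

# Two rows copied from the added test data, wrapped in the collection element.
SAMPLE = """<AddressCollection>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>2333</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>85</USPSBoxID></AddressString>
<AddressString><AddressNumber>2609</AddressNumber> <StreetName>BAYVIEW,</StreetName> <PlaceName>FORT</PlaceName> <PlaceName>LAUDERDALE,</PlaceName> <StateName>FL</StateName></AddressString>
</AddressCollection>"""


def read_labeled(xml_text):
    """Return one list of (token, label) pairs per <AddressString> element.

    Hypothetical helper for illustration; the project's own tooling may
    read these files differently.
    """
    root = ET.fromstring(xml_text)
    return [
        [(child.text, child.tag) for child in address]
        for address in root.iter("AddressString")
    ]


examples = read_labeled(SAMPLE)
for pairs in examples:
    # Rebuild the raw address string from its tokens, then show the labels.
    print(" ".join(token for token, _ in pairs))
    print(pairs)
```

Reading the data this way makes it easy to see which label sequences the new rows exercise, such as the four-token `HC ... Box ...` pattern.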

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "usaddress"
-version = "0.5.13"
+version = "0.5.14"
 description = "Parse US addresses using conditional random fields"
 readme = "README.md"
 license = {text = "MIT License", url = "http://www.opensource.org/licenses/mit-license.php"}

training/README.md

Lines changed: 16 additions & 0 deletions
@@ -280,6 +280,22 @@ Congratulations! The model has officially improved. You can safely move on to st
 
 If any of our tests failed, however, things become more complicated. The output will break down the tests that failed, showing you the parse that the model produced (labeled `pred`) and the parse that the test expected (labeled `true`). In this case, jump to step 5a to debug your errors.
 
+If you'd like to additionally spot check singular addresses in the python shell, install a virtual environment, activate it, install your WIP version of this package, and open a shell.
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -e ".[dev]" -v
+python
+# shell starts up
+>>>
+```
+
+Then import usaddress and start parsing!
+```python
+>>> import usaddress
+>>> usaddress.parse("a funky address")
+```
+
 **5a. Repeat steps 1-4 until the tests pass.**
 
 If you've arrived at this step, it means that some of your tests failed. Uh oh!
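
Reviewer note: the spot check described in the added README text can be made repeatable by comparing a parse against hand-written labels, echoing the `pred` and `true` wording of the test output. The sketch below is only an illustration. `spot_check` and `toy_parse` are hypothetical names, and `toy_parse` is a stand-in that labels tokens by position so the example runs without the package installed; it does not reflect how usaddress assigns labels.

```python
from itertools import zip_longest


def spot_check(parse, address, expected):
    """Return (position, pred, true) for every token where the parse differs.

    `parse` is any callable that returns a list of (token, label) tuples.
    """
    predicted = parse(address)
    return [
        (i, pred, true)
        for i, (pred, true) in enumerate(zip_longest(predicted, expected))
        if pred != true
    ]


def toy_parse(address):
    # Hypothetical stand-in for a real parser: assigns labels purely by
    # token position. This is NOT how usaddress works.
    labels = ["USPSBoxGroupType", "USPSBoxGroupID", "USPSBoxType", "USPSBoxID"]
    return list(zip(address.split(), labels))


# Expected labels taken from the first HC box row in the labeled test data.
expected = [
    ("HC", "USPSBoxGroupType"),
    ("2333", "USPSBoxGroupID"),
    ("Box", "USPSBoxType"),
    ("85", "USPSBoxID"),
]

mismatches = spot_check(toy_parse, "HC 2333 Box 85", expected)
for position, pred, true in mismatches:
    print(f"token {position}: pred={pred} true={true}")
print("all labels match" if not mismatches else f"{len(mismatches)} mismatch(es)")
```

With the package installed in the virtual environment, passing `usaddress.parse` in place of `toy_parse` should give the same kind of comparison against the real model, since `usaddress.parse` also returns a list of `(token, label)` tuples.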
