snippetsqlModerate
How do I parse addresses in PostgreSQL?
Viewed 0 times
postgresqladdressesparsehow
Problem
Let's say for instance that I want to parse these addresses for the Chicken Ranch
In both of these cases, I'd like to get rid of
Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061
Chicken Ranch
1600 Pennsylvania Avenue
NW Washington, D.C. 20500In both of these cases, I'd like to get rid of
Rd and Avenue. For instance in the first case, I'd like to get "Homestead", and in the second "Pennsylvania". Not every address has a designation like this though.Solution
This is a question address canonicalization and parsing. Essentially what you're talking about is handled through a gazetteer (geographical rule set). There are two ways to do this right,
I'll show the address standardizer version for the address,
Using
And, then we can use it like this.
So you can see,
Likewise,...
address_standardizerfrom the PostGIS project and certainly better if you're only using United States addresses.
pgsql-postalmay be a better method for international addresses.
I'll show the address standardizer version for the address,
Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061Using
standardize_address from address_standardizer, returns a composite type of stdaddr. First we install it,CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;And, then we can use it like this.
SELECT * FROM standardize_address('us_lex',
'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra | city | state | country | postcode | box | unit
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
| 10511 | | | | HOMESTEAD | ROAD | | | | PAHRUMP | NEVADA | USA | 89061 | |
(1 row)So you can see,
ROAD is pulled out in suftypeLikewise,...
SELECT * FROM standardize_address('us_lex',
'us_gaz', 'us_rules', '1600 Pennsylvania Avenue, NW Washington, D.C. 20500');
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra | city | state | country | postcode | box | unit
----------+-----------+--------+------+---------+--------------+---------+-----------+------------+-------+------+-------+---------+-----------+-----+--------------
| 1600 | | | | PENNSYLVANIA | AVENUE | NORTHWEST | | | | | USA | D C 20500 | | # WASHINGTONCode Snippets
Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;SELECT * FROM standardize_address('us_lex',
'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra | city | state | country | postcode | box | unit
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
| 10511 | | | | HOMESTEAD | ROAD | | | | PAHRUMP | NEVADA | USA | 89061 | |
(1 row)SELECT * FROM standardize_address('us_lex',
'us_gaz', 'us_rules', '1600 Pennsylvania Avenue, NW Washington, D.C. 20500');
building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra | city | state | country | postcode | box | unit
----------+-----------+--------+------+---------+--------------+---------+-----------+------------+-------+------+-------+---------+-----------+-----+--------------
| 1600 | | | | PENNSYLVANIA | AVENUE | NORTHWEST | | | | | USA | D C 20500 | | # WASHINGTONContext
StackExchange Database Administrators Q#165915, answer score: 10
Revisions (0)
No revisions yet.