HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

How do I parse addresses in PostgreSQL?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqladdressesparsehow

Problem

Let's say for instance that I want to parse these addresses for the Chicken Ranch

Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061

Chicken Ranch
1600 Pennsylvania Avenue
NW Washington, D.C. 20500


In both of these cases, I'd like to get rid of Rd and Avenue. For instance in the first case, I'd like to get "Homestead", and in the second "Pennsylvania". Not every address has a designation like this though.

Solution

This is a question address canonicalization and parsing. Essentially what you're talking about is handled through a gazetteer (geographical rule set). There are two ways to do this right,

  • address_standardizer from the PostGIS project and certainly better if you're only using United States addresses.



  • pgsql-postal may be a better method for international addresses.



I'll show the address standardizer version for the address,

Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061


Using standardize_address from address_standardizer, returns a composite type of stdaddr. First we install it,

CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;


And, then we can use it like this.

SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
 building | house_num | predir | qual | pretype |   name    | suftype | sufdir | ruralroute | extra |  city   | state  | country | postcode | box | unit 
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
          | 10511     |        |      |         | HOMESTEAD | ROAD    |        |            |       | PAHRUMP | NEVADA | USA     | 89061    |     | 
(1 row)


So you can see, ROAD is pulled out in suftype

Likewise,...

SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '1600 Pennsylvania Avenue, NW Washington, D.C. 20500');
 building | house_num | predir | qual | pretype |     name     | suftype |  sufdir   | ruralroute | extra | city | state | country | postcode  | box |     unit     
----------+-----------+--------+------+---------+--------------+---------+-----------+------------+-------+------+-------+---------+-----------+-----+--------------
          | 1600      |        |      |         | PENNSYLVANIA | AVENUE  | NORTHWEST |            |       |      |       | USA     | D C 20500 |     | # WASHINGTON

Code Snippets

Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061
CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;
SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
 building | house_num | predir | qual | pretype |   name    | suftype | sufdir | ruralroute | extra |  city   | state  | country | postcode | box | unit 
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
          | 10511     |        |      |         | HOMESTEAD | ROAD    |        |            |       | PAHRUMP | NEVADA | USA     | 89061    |     | 
(1 row)
SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '1600 Pennsylvania Avenue, NW Washington, D.C. 20500');
 building | house_num | predir | qual | pretype |     name     | suftype |  sufdir   | ruralroute | extra | city | state | country | postcode  | box |     unit     
----------+-----------+--------+------+---------+--------------+---------+-----------+------------+-------+------+-------+---------+-----------+-----+--------------
          | 1600      |        |      |         | PENNSYLVANIA | AVENUE  | NORTHWEST |            |       |      |       | USA     | D C 20500 |     | # WASHINGTON

Context

StackExchange Database Administrators Q#165915, answer score: 10

Revisions (0)

No revisions yet.