Regex to find addresses and phone numbers

Submitted by: @import:stackexchange-codereview

Problem

I am trying to optimize my Java code where I am parsing an address field.

Address fields have the format:

full_address;phone; full_address;phone; full_address;phone;


where full_address = addresstype^street^city^state^zip

and where street = street1;street2;street3;street4;

So my string is

final String string = "Billing^Tata;3001 Garden Parkway^^NJ^;100-00-0009;Home^Goggle;3341 Main Parkway^^NY^;;";


My Location object stores each of the above attributes.

// Regular expression matching one full_address;phone; record.
// The outer parentheses are group 1, so the fields are groups 2 to 7.
Pattern newPattern = Pattern.compile("(([^\\^]*)\\^([^\\^]*)\\^([^\\^]*)\\^([^\\^]*)\\^([^;]*);([^;]*);)");
Matcher newMatcher = newPattern.matcher(string);
List<Location> discreteListOfLocations = new ArrayList<>();
MatchResult result = null;
while (newMatcher.find())
{
    result = newMatcher.toMatchResult();
    Location location = new Location();
    location.setAddressTypeCdValue(result.group(2));
    String[] str_arr = result.group(3).split(";");
    if (str_arr.length > 0) 
    {
        location.setStreetAddress1(str_arr[0]);
    }
    if (str_arr.length > 1) 
    {
        location.setStreetAddress2(str_arr[1]);
    }
    if (str_arr.length > 2) 
    {
        location.setStreetAddress3(str_arr[2]);
    }
    if (str_arr.length > 3) 
    {
        location.setStreetAddress4(str_arr[3]);
    }
    location.setCity(result.group(4));
    location.setState(result.group(5));
    location.setZip(result.group(6));
    discreteListOfLocations.add(location);
}


I am unsure how to restructure the regex so that it is easier for someone else to understand what it is doing. Any ideas or suggestions would be helpful.

Solution

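I am not sure about the details of Java string concatenation, but the idea carries over. In Java, adjacent string literals must be joined with +.

Below is your regex, formatted and commented (by RegexFormat 5), with the redundant outer group dropped so that the fields are groups 1 to 6. The (?x) flag puts it in expanded mode, where whitespace is ignored and # starts a comment that runs to the end of the line. The benefit is that anybody can read it in your source code for later reference.

Below are two versions. The first uses normal C++-style concatenation of string literals, with an explicit \n added to each line. The second is a single quoted string in which the newlines are natural.

Another benefit of keeping the regex in this form is that you can always print it for debugging purposes, and it prints nicely formatted.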

"(?x)                                                               \n"
"   ( [^\\^]* )            # (1), Address type                      \n"
"   \\^                                                             \n"
"   ( [^\\^]* )            # (2), street1;street2;street3;street4;  \n"
"   \\^                                                             \n"
"   ( [^\\^]* )            # (3), City                              \n"
"   \\^                                                             \n"
"   ( [^\\^]* )            # (4), State                             \n"
"   \\^                                                             \n"
"   ( [^;]* )             # (5), Zip                                \n"
"   ;                                                               \n"
"   ( [^;]* )             # (6), Phone                              \n"
"   ;                                                               \n"
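For Java, the concatenated form above might look like the following minimal sketch. The class name AddressRegexDemo and the helper parse are hypothetical names chosen for illustration, and the fields are returned as plain string arrays because the Location class is not shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressRegexDemo {  // hypothetical class name, for illustration only

    // "(?x)" enables comments mode: whitespace in the pattern is ignored and
    // "#" starts a comment that runs to the end of the line, hence each "\n".
    static final String ADDRESS_REGEX =
          "(?x)                                                    \n"
        + "   ( [^\\^]* )   # (1) address type                     \n"
        + "   \\^                                                  \n"
        + "   ( [^\\^]* )   # (2) street1;street2;street3;street4  \n"
        + "   \\^                                                  \n"
        + "   ( [^\\^]* )   # (3) city                             \n"
        + "   \\^                                                  \n"
        + "   ( [^\\^]* )   # (4) state                            \n"
        + "   \\^                                                  \n"
        + "   ( [^;]* )     # (5) zip                              \n"
        + "   ;                                                    \n"
        + "   ( [^;]* )     # (6) phone                            \n"
        + "   ;                                                    \n";

    static final Pattern ADDRESS_PATTERN = Pattern.compile(ADDRESS_REGEX);

    // Returns one String[6] per record: type, street, city, state, zip, phone.
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        Matcher m = ADDRESS_PATTERN.matcher(text);
        while (m.find()) {
            String[] fields = new String[6];
            for (int i = 0; i < 6; i++) {
                fields[i] = m.group(i + 1);
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "Billing^Tata;3001 Garden Parkway^^NJ^;100-00-0009;"
                     + "Home^Goggle;3341 Main Parkway^^NY^;;";
        System.out.println(ADDRESS_REGEX);  // prints the commented layout for debugging
        for (String[] r : parse(input)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it on every call, which is a small side benefit of pulling the regex out of the parsing loop.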


======================================

"(?x)
   ( [^\\^]* )            # (1), Address type
   \\^
   ( [^\\^]* )            # (2), street1;street2;street3;street4;
   \\^
   ( [^\\^]* )            # (3), City
   \\^
   ( [^\\^]* )            # (4), State
   \\^
   ( [^;]* )             # (5), Zip
   ;
   ( [^;]* )             # (6), Phone
   ;
"
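In Java, the closest match to the single-string form is a text block, which assumes Java 15 or later. The sketch below is one possible rendering, not part of the original answer; the class name AddressRegexTextBlock and the helper firstPhone are hypothetical. Backslashes are still escape characters inside a text block, so \\^ stays doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AddressRegexTextBlock {  // hypothetical class name, for illustration only

    // Text block (assumes Java 15+): newlines are literal, so no "\n" or "+"
    // is needed. The common leading indentation is stripped by the compiler.
    static final Pattern ADDRESS_PATTERN = Pattern.compile("""
        (?x)
           ( [^\\^]* )   # (1) address type
           \\^
           ( [^\\^]* )   # (2) street1;street2;street3;street4
           \\^
           ( [^\\^]* )   # (3) city
           \\^
           ( [^\\^]* )   # (4) state
           \\^
           ( [^;]* )     # (5) zip
           ;
           ( [^;]* )     # (6) phone
           ;
        """);

    // Returns the phone field (group 6) of the first record, or null if nothing matches.
    static String firstPhone(String text) {
        Matcher m = ADDRESS_PATTERN.matcher(text);
        return m.find() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstPhone("Billing^Tata;3001 Garden Parkway^^NJ^;100-00-0009;"));
    }
}
```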

Context

StackExchange Code Review Q#69169, answer score: 2
