Regular Expression matching customer number strings

Submitted by: @import:stackexchange-codereview

Problem

Using VB.NET, I have created an AddIn for Autodesk Inventor and the customer has a bunch of drawing number strings which follow this sort of scheme:



  • P01867-13-TP09-001-4950-1775-1175-895-1125-835
  • P01867-13-TP09-002-4950-1775-1045-895-1035
  • P01867-13-TP02-019-L-1137-275-852-102
  • P01867-13-TP02-019-L-1137-275-852-102
  • P01867-13-TP02-019-R-1137-275-852-102
  • P01867-13-TP02-021-L-1137-1055-1372
  • P01867-13-TP02-021-L-1137-535-1027
  • P01867-13-TP02-021-L-1137-795-1184
  • P01867-13-TP02-021-R-1137-1055-1372
  • P01867-13-TP02-021-R-1137-535-1027
  • P01867-13-TP02-021-R-1137-795-1184
  • P01867-13-TP02-025-L-1137-1315-1581
  • P01867-13-TP02-025-R-1137-1315-1581
  • P01867-13-TP03-005
  • P01867-13-TP02-019-L-1137-275
  • P01867-13-TP02-019-R-1137-275
  • P01867-13-TP02-019-R-1137
  • P01867-13-TP02-019-L-1137




In order to account for these groups of three digits within the variations I have created the following regex:

(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)|(\w*\d*-\d*-\w*\d*-\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)


I now have to add the capability of looking for a sixth group of digits, so I figured I would ask here whether there is a method within regex (which I may have overlooked) that will allow me to improve upon/simplify it.

Solution

^P\d{5}-13-TP\d{2}-\d{3}(-(L|R|\d{4})(-\d{3,4})*)?$

Your current regex is WAY too forgiving. First of all, every one of your examples starts with a P followed by five digits, but you accept ANY COMBINATION of word characters at the beginning. I'm assuming that ALEX01867-13-TP02-019-L-1137 isn't a valid key, so you should take steps to reject it by using unbounded greedy quantifiers (*, +) as little as possible. \d{3,4} matches a run of three or four digits, so that will let you limit the sort of input you accept.
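As a quick sanity check, here is a minimal sketch that runs the pattern against a few of the sample strings and against the bad key mentioned above. It uses Python's `re` module purely for illustration (the pattern itself uses only syntax that .NET's `Regex` class also accepts); the names `DRAWING_NUMBER` and `is_valid` are made up for this example.

```python
import re

# The pattern from this answer. DRAWING_NUMBER is an illustrative name.
DRAWING_NUMBER = re.compile(
    r"^P\d{5}-13-TP\d{2}-\d{3}(-(L|R|\d{4})(-\d{3,4})*)?$"
)

def is_valid(s: str) -> bool:
    """Return True when the whole string looks like a drawing number."""
    return DRAWING_NUMBER.match(s) is not None

samples = [
    "P01867-13-TP09-001-4950-1775-1175-895-1125-835",
    "P01867-13-TP02-019-L-1137-275-852-102",
    "P01867-13-TP02-019-R-1137",
    "P01867-13-TP03-005",
]

for s in samples:
    print(s, is_valid(s))

# The anchored, explicit prefix rejects keys that do not start with P + 5 digits.
print("ALEX01867-13-TP02-019-L-1137", is_valid("ALEX01867-13-TP02-019-L-1137"))
```

All four samples are accepted and the `ALEX...` key is rejected, because `^P\d{5}` leaves no room for extra leading letters.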

The same goes for the 5th group - according to your examples, it's either L, R, or 4 digits. In Regex, that looks like this: (L|R|\d{4})
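To see what that group accepts in isolation, here is a small sketch (again in Python for illustration; `fifth_segment` is a hypothetical name) that tests a handful of candidate values against it.

```python
import re

# The fifth segment on its own: the letter L, the letter R, or exactly four digits.
fifth_segment = re.compile(r"(L|R|\d{4})")

for candidate in ["L", "R", "4950", "X", "495", "49500"]:
    # fullmatch requires the whole candidate to match, not just a prefix of it.
    print(candidate, bool(fifth_segment.fullmatch(candidate)))
```

Only `L`, `R`, and `4950` pass; a different letter, three digits, or five digits are all rejected.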

Next, you are using alternation (option1|option2) to capture the different "forms" your string comes in, but you are repeating a bunch of stuff (for example, the \w*\d* at the beginning of every alternative). You can limit the scope of the alternation by surrounding it in parentheses. You can see this in action with the (L|R|\d{4}) example - that whole parenthesized group becomes a single token that matches somewhere in a string (or doesn't).
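The effect of scoping is easy to miss, so here is a minimal sketch with two deliberately simplified patterns (they are not part of the real solution, and the names `unscoped` and `scoped` are made up). Without parentheses, `|` splits the entire pattern in two, anchors included.

```python
import re

# Without parentheses the alternation covers the whole pattern:
# this reads as (^TP02-L) OR (R$).
unscoped = re.compile(r"^TP02-L|R$")

# With parentheses only the single letter alternates:
# TP02- followed by L or R, and nothing else.
scoped = re.compile(r"^TP02-(L|R)$")

print(bool(unscoped.search("garbage-R")))  # True: the "R$" branch matches on its own
print(bool(scoped.search("garbage-R")))    # False: the shared prefix is required
print(bool(scoped.search("TP02-R")))       # True
```

This is why the common prefix only has to be written once when the alternation is wrapped in a group.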

Sometimes the string ends after the 4th group (Before the L/R group), and sometimes it doesn't. Instead of using alternation to solve this, which makes the regex VERY long, you can just surround the entire regex AFTER that point in brackets with a question mark (an-(example)?). This makes the entire second part optional.
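A short sketch of the optional tail, using the full pattern in Python for illustration (variable names are hypothetical): when the string stops after the fourth segment, the optional group simply does not participate in the match.

```python
import re

pattern = re.compile(r"^P\d{5}-13-TP\d{2}-\d{3}(-(L|R|\d{4})(-\d{3,4})*)?$")

short_form = pattern.match("P01867-13-TP03-005")
long_form = pattern.match("P01867-13-TP02-019-L-1137-275")

# Group 1 is the whole optional tail; it is None when the tail is absent.
print(short_form.group(1))  # None
print(long_form.group(1))   # -L-1137-275
print(long_form.group(2))   # L
```

Both strings match the same single pattern, so no separate alternative is needed for the short form.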

Finally, your problem asks if there is a simple method to improve the regex. By ending it in (-\d{3,4})* you can match ANY number of additions to the end, assuming they all come in the form -015 or -1992 or whatever. If you knew that there was always a max of 15 numbers added to the end, you could change that star (*) to a bounded quantifier ({0,15}; .NET needs the explicit lower bound and does not treat {,15} as a quantifier). If sometimes the number only has two digits, change the {3,4} to {2,4}, etc.
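Since the original regex captured each numeric segment separately, here is a hedged sketch of one way to get the individual numbers back out of the repeated tail. It is written in Python for illustration; the function `parse`, the group names, and the cap of 6 trailing segments are all assumptions. Python keeps only the last repetition of a repeated group, so the sketch captures the tail as one string and splits it. In .NET, named groups are spelled `(?<name>...)`, and each repetition of a group is retained in its `Group.Captures` collection, so splitting may not be necessary there.

```python
import re

# {0,6} caps the number of trailing numeric segments; 6 is an assumed limit.
PATTERN = re.compile(
    r"^(?P<prefix>P\d{5}-13-TP\d{2}-\d{3})"
    r"(?:-(?P<fifth>L|R|\d{4})(?P<tail>(?:-\d{3,4}){0,6}))?$"
)

def parse(s):
    """Split a drawing number into (prefix, fifth segment, trailing numbers).

    Returns None when the string does not match the pattern.
    """
    m = PATTERN.match(s)
    if m is None:
        return None
    tail = m.group("tail") or ""
    # tail looks like "-1137-275-852"; drop the empty piece before the first "-".
    numbers = [int(n) for n in tail.split("-") if n]
    return m.group("prefix"), m.group("fifth"), numbers

print(parse("P01867-13-TP02-019-L-1137-275-852-102"))
print(parse("P01867-13-TP03-005"))
```

With this shape, supporting a sixth (or seventh) group of digits means changing one number in the quantifier rather than adding another set of alternatives.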


Context

StackExchange Code Review Q#107774, answer score: 10
