Regular Expression matching customer number strings
Problem
Using VB.NET, I have created an add-in for Autodesk Inventor. The customer has a bunch of drawing number strings which follow this sort of scheme:
- P01867-13-TP09-001-4950-1775-1175-895-1125-835
- P01867-13-TP09-002-4950-1775-1045-895-1035
- P01867-13-TP02-019-L-1137-275-852-102
- P01867-13-TP02-019-L-1137-275-852-102
- P01867-13-TP02-019-R-1137-275-852-102
- P01867-13-TP02-021-L-1137-1055-1372
- P01867-13-TP02-021-L-1137-535-1027
- P01867-13-TP02-021-L-1137-795-1184
- P01867-13-TP02-021-R-1137-1055-1372
- P01867-13-TP02-021-R-1137-535-1027
- P01867-13-TP02-021-R-1137-795-1184
- P01867-13-TP02-025-L-1137-1315-1581
- P01867-13-TP02-025-R-1137-1315-1581
- P01867-13-TP03-005
- P01867-13-TP02-019-L-1137-275
- P01867-13-TP02-019-R-1137-275
- P01867-13-TP02-019-R-1137
- P01867-13-TP02-019-L-1137
In order to account for these groups of three digits within the variations I have created the following regex:
(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-\w-)(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)-(\d*)|(\w*\d*-\d*-\w*\d*-\d*-)(\d*)|(\w*\d*-\d*-\w*\d*-\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)-(\d*)|.*(\w*\d*-\d*-)(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)-(\d*)|.*(\w*\d*-\d*-\w-)(\d*)
I now have to add the capability of looking for a sixth group of digits, so I figured I would ask here if there is a method within regex (which I may have overlooked) that will allow me to improve upon/simplify it.
Solution
^P\d{5}-13-TP\d{2}-\d{3}(-(L|R|\d{4})(-\d{3,4})*)?$
Your current regex is WAY too forgiving. First of all, every one of your examples starts with a P followed by some numbers, but you accept ANY COMBINATION of letters at the beginning. I'm assuming that ALEX01867-13-TP02-019-L-1137 isn't a valid key, so you should take steps to reject it by using unbounded quantifiers (* and +) as little as possible. \d{3,4} matches a digit 3 or 4 times, so it lets you limit the sort of input you accept.
The same goes for the 5th group: according to your examples, it's either L, R, or 4 digits. In regex, that looks like this: (L|R|\d{4})
Next, you are using alternation (option1|option2) to capture the different "forms" your string comes in, but you are repeating a bunch of stuff (for example, the \w*\d* at the beginning). You can limit the scope of the alternation by surrounding it in brackets (()). You can see this in action with the (L|R|\d{4}) example: that whole bracket group becomes a single token that matches somewhere in a string (or doesn't).
Sometimes the string ends after the 4th group (before the L/R group), and sometimes it doesn't. Instead of using alternation to solve this, which makes the regex VERY long, you can just surround the entire regex AFTER that point in brackets followed by a question mark (an-(example)?). This makes the entire second part optional.
Finally, your question asks if there is a simple method to improve the regex. By ending it in (-\d{3,4})* you can match ANY number of additions to the end, assuming they all come in the form -015 or -1992 or whatever. If you knew that there was always a max of 15 numbers added to the end, you could change that star (*) to a bounded quantifier ({0,15}; .NET does not treat {,15} as a quantifier). If sometimes the number only has two digits, change the {3,4} to {2,4}, etc.
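As a rough check of the pattern from this answer, here is a minimal sketch in Python rather than VB.NET (the pattern itself uses only syntax both engines share). The named groups, the function name parse_drawing_number, and the choice to treat a 4-digit 5th group as the first number are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer, with named groups added for this sketch only.
DRAWING_RE = re.compile(
    r"^P\d{5}-13-TP\d{2}-\d{3}"        # fixed prefix, e.g. P01867-13-TP02-019
    r"(?:-(?P<variant>L|R|\d{4})"      # optional 5th group: L, R, or 4 digits
    r"(?P<dims>(?:-\d{3,4})*))?$"      # any number of 3-4 digit groups
)


def parse_drawing_number(text):
    """Return (side, numbers) for a valid drawing number, else None.

    side is "L", "R", or None; numbers is the list of digit groups that
    follow the fixed prefix (hypothetical helper, not from the answer).
    """
    match = DRAWING_RE.match(text)
    if match is None:
        return None
    variant = match.group("variant")   # None when the string ends early
    dims = match.group("dims") or ""   # e.g. "-1137-275"
    side = variant if variant in ("L", "R") else None
    numbers = [d for d in dims.split("-") if d]
    if variant is not None and side is None:
        # Assumption: a 4-digit 5th group is itself the first number.
        numbers.insert(0, variant)
    return side, numbers


if __name__ == "__main__":
    for sample in (
        "P01867-13-TP09-001-4950-1775-1175-895-1125-835",
        "P01867-13-TP02-019-L-1137-275-852-102",
        "P01867-13-TP03-005",
        "ALEX01867-13-TP02-019-L-1137",
    ):
        print(sample, "->", parse_drawing_number(sample))
```

Python's re module keeps only the last repetition of a repeated capturing group, which is why the sketch captures the whole tail and splits it on the hyphen. In .NET, each repetition is retained in the group's Captures collection, so the split may not be needed there.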
Context
StackExchange Code Review Q#107774, answer score: 10