
Regex for domain name validation

Submitted by: @import:stackexchange-codereview

Problem

I created this JavaScript regex for domain validation. It should accept IP addresses and ASCII domain names. The assumption is that the TLD is at least two letters long.

/^((([0-9]{1,3}\.){3}[0-9]{1,3})|(([a-zA-Z0-9]+(([\-]?[a-zA-Z0-9]+)*\.)+)*[a-zA-Z]{2,}))$/


I used the following JavaScript to test the regex:

var func = function(val) { return /^((([0-9]{1,3}\.){3}[0-9]{1,3})|(([a-zA-Z0-9]+(([\-]?[a-zA-Z0-9]+)*\.)+)*[a-zA-Z]{2,}))$/.test(val);}


It works correctly:

func('192.168.1.1'); // returns true
func('a-a.com');     // returns true
func('aa.com');      // returns true
func('aa.cc');       // returns true
func('aa.c');        // returns false


I have only basic knowledge of regexes, so I would like to know whether there is any way to optimize it.

Solution

Your regex suffers from catastrophic backtracking on certain inputs. Just try matching aaaaaaaaaaaaaaaaaaaaaaaaaaaa, and you will see a visible slowdown.

I suggest (as was already mentioned in the comments) that you split the regex at the | into two separate regexes, one for IP addresses and one for domain names. It simply makes more sense and will be easier to maintain.

The IP address part is actually very efficient on its own, so there are no further changes I would make to this part:

^([0-9]{1,3}\.){3}[0-9]{1,3}$
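As a minimal sketch of how the IP pattern behaves on its own, the snippet below wraps it in a helper. The name looksLikeIPv4 is a hypothetical one chosen for illustration, not something from the original post. Note that the pattern only checks digit counts, so out-of-range octets still pass.

```javascript
// The IP pattern from above, anchored at both ends.
const ipPattern = /^([0-9]{1,3}\.){3}[0-9]{1,3}$/;

// Hypothetical helper name, used only for this illustration.
function looksLikeIPv4(value) {
  return ipPattern.test(value);
}

console.log(looksLikeIPv4('192.168.1.1'));     // true
console.log(looksLikeIPv4('192.168.1'));       // false: only three groups
console.log(looksLikeIPv4('999.999.999.999')); // true: only digit counts are checked
```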


The second part of the regex is the source of the slowdown. It also matches some strange things, like this:

a....a..................aa
aa
a.aa
a.-a.aa


I will assume that these are bugs, although I can't be sure because you haven't made it clear what should match. So here are the rules for what I want to match:

  • Must start with an alphanumeric
  • There may be . or - characters, but they must be surrounded by alphanumerics.
  • Must end with . followed by two or more letters

Here is the regex that implements these rules (with newlines added for readability):

^[a-zA-Z0-9]+
([-.][a-zA-Z0-9]+)*
\.[a-zA-Z]{2,}$


Note that inside a character class, - does not need to be escaped if it's the first thing in the class, and . is never a metacharacter.
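A quick way to check the character-class behavior is to test [-.] in isolation. This is only an illustrative sketch; the name separatorPattern is hypothetical.

```javascript
// Hypothetical name: a class containing only a literal hyphen and a literal dot.
const separatorPattern = /^[-.]$/;

console.log(separatorPattern.test('-')); // true: leading - is a literal
console.log(separatorPattern.test('.')); // true: . is a literal inside a class
console.log(separatorPattern.test('a')); // false: . does not act as a wildcard here
console.log(separatorPattern.test(',')); // false: no range is formed by the -
```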

Finally, I would use the i modifier to get rid of all the a-zA-Z redundancy:

/^[a-z0-9]+([-.][a-z0-9]+)*\.[a-z]{2,}$/i
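One possible way to combine the two separate patterns, as suggested earlier in this answer, is sketched below. The function name isValidHost and the variable names are assumptions made for illustration. The original question used a single function called func.

```javascript
// The IP pattern, kept as its own regex.
const ipRe = /^([0-9]{1,3}\.){3}[0-9]{1,3}$/;

// The rewritten domain pattern with the i modifier.
const domainRe = /^[a-z0-9]+([-.][a-z0-9]+)*\.[a-z]{2,}$/i;

// Hypothetical wrapper: accepts a value if either pattern matches.
function isValidHost(value) {
  return ipRe.test(value) || domainRe.test(value);
}

console.log(isValidHost('192.168.1.1')); // true
console.log(isValidHost('a-a.com'));     // true
console.log(isValidHost('aa.cc'));       // true
console.log(isValidHost('aa.c'));        // false
```

Keeping the patterns in separate constants means each one can be tested or replaced without touching the other.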


Here are the tests:

a-a.com
1-1-1-1.com
1.1.1.com
aa.com
aa.cc
a.com
a.a.a.a.aa


a..a..................aa
aa
aaaaaaaaaaaaaa
a.-a.aa


aa.c
a--a.com
-aa.com
a-a-a-a-.com


The first group is matched by both, the second group is matched by only your regex, and the last group is matched by neither.
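The three groups can be checked mechanically with a small script such as the sketch below. The names originalRe, revisedRe, groups, and classify are hypothetical. The patterns are the full regex from the question and the rewritten domain regex from this answer.

```javascript
// Full regex from the question (IP branch plus domain branch).
const originalRe = /^((([0-9]{1,3}\.){3}[0-9]{1,3})|(([a-zA-Z0-9]+(([\-]?[a-zA-Z0-9]+)*\.)+)*[a-zA-Z]{2,}))$/;
// Rewritten domain regex from this answer.
const revisedRe = /^[a-z0-9]+([-.][a-z0-9]+)*\.[a-z]{2,}$/i;

// Test inputs, keyed by which regex is expected to match them.
const groups = {
  both: ['a-a.com', '1-1-1-1.com', '1.1.1.com', 'aa.com', 'aa.cc', 'a.com', 'a.a.a.a.aa'],
  originalOnly: ['a..a..................aa', 'aa', 'aaaaaaaaaaaaaa', 'a.-a.aa'],
  neither: ['aa.c', 'a--a.com', '-aa.com', 'a-a-a-a-.com'],
};

// Report which of the two regexes match a given input.
function classify(input) {
  const matchesOriginal = originalRe.test(input);
  const matchesRevised = revisedRe.test(input);
  if (matchesOriginal && matchesRevised) return 'both';
  if (matchesOriginal) return 'originalOnly';
  if (matchesRevised) return 'revisedOnly';
  return 'neither';
}

for (const [expected, inputs] of Object.entries(groups)) {
  for (const input of inputs) {
    const actual = classify(input);
    const note = actual === expected ? '' : ` (expected ${expected})`;
    console.log(`${input}: ${actual}${note}`);
  }
}
```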

To show you the difference in performance, I tested against this variation of your original regex:

^([a-zA-Z0-9]+(([\-]?[a-zA-Z0-9]+)*\.)+)*[a-zA-Z]{2,}$


Using the PCRE option on Regex101 (with the gm options), your regex takes 66390 steps. The new regex takes only 214 steps, and it's shorter and more legible too.

Context

StackExchange Code Review Q#140335, answer score: 6
