
HTTP URL validating

Submitted by: @import:stackexchange-codereview

Problem

What do you think about this?

#utils.py
import re

def is_http_url(s):
    """
    Return True if s is a valid HTTP URL, else False.

    Arguments:
    - `s`: the string to check
    """
    # Raw string so that \w and \. reach the regex engine unescaped.
    if re.match(r'https?://(?:www)?(?:[\w-]{2,255}(?:\.\w{2,6}){1,2})(?:/[\w&%?#-]{1,300})?', s):
        return True
    else:
        return False

#utils_test.py
import unittest

import utils

class TestHttpUrlValidating(unittest.TestCase):
    """Tests for utils.is_http_url."""

    def test_validating(self):
        self.assertEqual(utils.is_http_url('https://google.com'), True)
        self.assertEqual(utils.is_http_url('http://www.google.com/r-o_ute?key=value'), True)
        self.assertEqual(utils.is_http_url('aaaaaa'), False)


Is this enough? I'm going to insert the URLs into a database. Are there other ways to validate them?

Solution

I would check out Django's validator - I'm willing to bet it's going to be decent, and it's going to be very well tested.

import re

regex = re.compile(
    r'^(?:http|ftp)s?://' # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain...
    r'localhost|' # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6
    r'(?::\d+)?' # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


This covers a few edge cases like IP addresses and ports. Obviously some stuff (like FTP links) you might not want to accept, but it'd be a good place to start.
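If FTP links should be rejected, one option is to narrow the scheme part of the pattern and wrap it in the function from the question. The following is only a sketch under that assumption: `HTTP_URL_RE` is an illustrative name, and the pattern is the one quoted above with `(?:http|ftp)s?` replaced by `https?`. It is not Django's API.

```python
import re

# Adapted from the pattern quoted above; the scheme is narrowed to http/https.
# HTTP_URL_RE is a hypothetical name chosen for this sketch.
HTTP_URL_RE = re.compile(
    r'^https?://'  # http:// or https:// only
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_http_url(s):
    """Return True if s matches the http(s) URL pattern, else False."""
    return HTTP_URL_RE.match(s) is not None
```

With this version the three cases from the question's test still behave as expected, and inputs such as `http://127.0.0.1:8000/` and `http://localhost:8080/path` are accepted while `ftp://example.com` is not.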

Code Snippets

import re

regex = re.compile(
    r'^(?:http|ftp)s?://' # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain...
    r'localhost|' # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6
    r'(?::\d+)?' # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
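The question also asks about other ways to validate. One regex-free option, not part of the answer above, is to let the standard library split the URL and then check the pieces. This is a rough sketch: `looks_like_http_url` is a hypothetical helper name, and the check is looser than the regex (it does not inspect the characters in the host, for example), so it may need extra rules before URLs go into a database.

```python
from urllib.parse import urlsplit


def looks_like_http_url(s):
    """Return True if s parses as an http(s) URL with a non-empty host."""
    try:
        parts = urlsplit(s)
        # Accessing .port raises ValueError for a non-numeric or out-of-range port.
        parts.port
    except ValueError:
        return False
    return parts.scheme in ('http', 'https') and bool(parts.hostname)
```

The scheme check is what rejects FTP links here, and the hostname check rejects inputs such as `http://` that have a scheme but no host.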

Context

StackExchange Code Review Q#19663, answer score: 2
