HTTP URL validating
Problem
What do you think about this?
Is this enough? I'm going to insert the URLs into a database. Are there other ways to validate them?
# utils.py
import re


def is_http_url(s):
    """
    Return True if s is a valid HTTP(S) URL, else False.

    Arguments:
    - `s`: the string to check
    """
    return re.match(r'https?://(?:www)?(?:[\w-]{2,255}(?:\.\w{2,6}){1,2})(?:/[\w&%?#-]{1,300})?', s) is not None
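One detail worth checking: `re.match` anchors only at the start of the string, so the pattern above accepts anything that merely *begins* with a matching URL. The sketch below compares it with `re.fullmatch`, which must consume the whole string; the helper names `matches_prefix` and `matches_whole` are hypothetical.

```python
import re

# The question's pattern, unchanged, written as a raw string.
PATTERN = r'https?://(?:www)?(?:[\w-]{2,255}(?:\.\w{2,6}){1,2})(?:/[\w&%?#-]{1,300})?'


def matches_prefix(s):
    # re.match anchors only at the start, so trailing text is ignored.
    return re.match(PATTERN, s) is not None


def matches_whole(s):
    # re.fullmatch requires the entire string to match the pattern.
    return re.fullmatch(PATTERN, s) is not None


print(matches_prefix('http://google.com <script>'))  # True
print(matches_whole('http://google.com <script>'))   # False
```

With `re.fullmatch`, the second URL in the question's test (`http://www.google.com/r-o_ute?key=value`) no longer passes, because `=` is not in the path character class `[\w&%?#-]`; under `re.match` it only passes because matching stops after `?key`.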
# utils_test.py
import unittest

import utils


class TestHttpUrlValidating(unittest.TestCase):
    """Tests for utils.is_http_url."""

    def test_validating(self):
        """Accept http/https URLs and reject plain strings."""
        self.assertTrue(utils.is_http_url('https://google.com'))
        self.assertTrue(utils.is_http_url('http://www.google.com/r-o_ute?key=value'))
        self.assertFalse(utils.is_http_url('aaaaaa'))


if __name__ == '__main__':
    unittest.main()
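As for other ways to validate: instead of a single regex, one option is to parse the string with the standard library's `urllib.parse.urlsplit` and inspect the parts. This is a minimal sketch, assuming that "valid" means an `http`/`https` scheme, a non-empty host and no whitespace; the function name `is_http_url_parsed` is hypothetical.

```python
from urllib.parse import urlsplit


def is_http_url_parsed(s):
    """Hypothetical alternative to the regex: parse the URL and check its parts."""
    # Reject any whitespace outright.
    if any(ch.isspace() for ch in s):
        return False
    try:
        parts = urlsplit(s)
    except ValueError:
        # urlsplit raises ValueError on some malformed input,
        # e.g. an unbalanced IPv6 bracket such as 'http://[::1'.
        return False
    # urlsplit lowercases the scheme; also require a non-empty host name.
    return parts.scheme in ('http', 'https') and bool(parts.hostname)


for url in ('https://google.com', 'aaaaaa', 'ftp://example.com'):
    print(url, is_http_url_parsed(url))
```

This is looser than the question's regex in some respects: it accepts hosts such as `http://localhost` or a single-label name, and it does no character-level checking of the host or path beyond what `urlsplit` itself does.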
Solution
I would check out Django's URL validator: I'm willing to bet it's decent, and it's very well tested.
It covers a few edge cases, such as IP addresses and ports. Obviously there are some things (like FTP links) you might not want to accept, but it'd be a good place to start.
import re

regex = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
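If FTP links should be rejected, the scheme group can be narrowed. The sketch below adapts the regex above to `http`/`https` only and wraps it in a function with the same signature as the question's `is_http_url`; the adaptation is an assumption, not Django's own code. It calls `fullmatch` rather than `match` because `$` also matches just before a trailing newline.

```python
import re

# Adapted from the regex above: the scheme is narrowed to http/https only.
HTTP_URL_RE = re.compile(
    r'^https?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_http_url(s):
    # fullmatch rejects a trailing newline, which '$' alone would allow.
    return HTTP_URL_RE.fullmatch(s) is not None


print(is_http_url('http://localhost:8000/'))  # True
print(is_http_url('ftp://example.com'))       # False
```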
Context
StackExchange Code Review Q#19663, answer score: 2