Finding the e-commerce site of a given product URL and checking whether it is supported
Problem
I am building a Python app in which users can maintain a wishlist of products. I support only a few e-commerce sites and do not support country-specific sites (e.g. I may support amazon.com but not amazon.in). I do not support the mobile version of a URL (e.g. http://m.amazon.com), and I am not interested in the query string part of the URL, so I do not want to keep it.
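Each of these requirements maps onto one component of a parsed URL: the site (including any m. prefix or country suffix) is in the network location, and the query string is a separate field. A minimal sketch of that split, assuming Python 3's urllib.parse (the Python 2 module is named urlparse) and a made-up product URL used only for illustration:

```python
from urllib.parse import urlparse  # Python 3; in Python 2: from urlparse import urlparse

# Hypothetical product URL, not taken from any real listing.
url = 'http://m.amazon.com/dp/EXAMPLE?ref=wishlist#reviews'
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.netloc)    # m.amazon.com  (mobile host, so it would be rejected)
print(parts.path)      # /dp/EXAMPLE
print(parts.query)     # ref=wishlist  (the part to be dropped)
print(parts.fragment)  # reviews
```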
The code and the test cases are below. Although it seems to work, I am not happy with the code; it looks hackish to me. I would really appreciate help improving the get_vendor function. Do you find it readable?
```
from urlparse import urlparse

supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                     'myntra.com', 'www.flipkart.com', 'www.homeshop18.com',
                     'www.snapdeal.com', 'www.myntra.com']

test_urls = ['http://www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'https://myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'http://.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'htt://www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'http://blahblah.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'ftp://myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!']

test_expected_results = [True, True, True, False, False, False, False]

def get_vendor(url):

    def add_http(url):
        return 'http:
```
Solution
The two helper functions feel redundant to me; 'http://'+url is clear enough as is, and there is a urlunparse function in the urlparse module.

My proposal:
```
# Python 2 module; in Python 3 both functions live in urllib.parse
from urlparse import urlparse, urlunparse

def get_vendor(url):
    parsed_url = urlparse(url)
    if not parsed_url.scheme:
        # No scheme given: assume http and parse again
        parsed_url = urlparse('http://'+url)
    scheme, netloc, path, params, query, fragment = parsed_url
    if scheme in ['http', 'https'] and netloc in supported_vendors:
        # Rebuild the URL without params, query string and fragment
        return (netloc, urlunparse((scheme, netloc, path, '','','')))
    else:
        return (None, url)
```

To deal with an optional www. in front, you could avoid complicating the get_vendor function by adding the prefix programmatically to each known vendor. If you later needed to add a vendor that is available only with or only without www., you would only have to change this back to how it was and add that vendor to the list.
```
supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                     'myntra.com']
supported_vendors += ['www.' + x for x in supported_vendors]
```

Code Snippets
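The two snippets from the answer can be combined into a single script and checked against the question's expectations. This is a minimal sketch, not the answer's exact code: it assumes Python 3 (so the import comes from urllib.parse), uses a set instead of a list for the vendor lookup, and replaces the question's long test URLs with shortened variants that keep the same hosts and schemes. The names BASE_VENDORS and SUPPORTED_VENDORS are introduced here for illustration only.

```python
from urllib.parse import urlparse, urlunparse  # Python 3 home of urlparse/urlunparse

BASE_VENDORS = ['flipkart.com', 'homeshop18.com', 'snapdeal.com', 'myntra.com']
# Accept every vendor both with and without a leading "www."
SUPPORTED_VENDORS = set(BASE_VENDORS) | {'www.' + v for v in BASE_VENDORS}


def get_vendor(url):
    """Return (vendor, cleaned_url), or (None, url) if the URL is not supported."""
    parsed = urlparse(url)
    if not parsed.scheme:
        # No scheme given (e.g. "www.myntra.com/..."): assume http and parse again.
        parsed = urlparse('http://' + url)
    scheme, netloc, path, params, query, fragment = parsed
    if scheme in ('http', 'https') and netloc in SUPPORTED_VENDORS:
        # Rebuild the URL without params, query string, or fragment.
        return netloc, urlunparse((scheme, netloc, path, '', '', ''))
    return None, url


# Shortened variants of the question's test URLs (same hosts and schemes).
test_urls = [
    'http://www.myntra.com/sports-shoes/107455/buy?serp=1#!',
    'www.myntra.com/sports-shoes/107455/buy?serp=1#!',
    'https://myntra.com/sports-shoes/107455/buy?serp=1#!',
    'http://.myntra.com/sports-shoes/107455/buy?serp=1#!',
    'htt://www.myntra.com/sports-shoes/107455/buy?serp=1#!',
    'http://blahblah.com/sports-shoes/107455/buy?serp=1#!',
    'ftp://myntra.com/sports-shoes/107455/buy?serp=1#!',
]
expected = [True, True, True, False, False, False, False]

results = [get_vendor(u)[0] is not None for u in test_urls]
print(results == expected)
```

Because the lookup is an exact match on the network location, hosts such as m.myntra.com or a country-specific domain are rejected without any extra code.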
```
def get_vendor(url):
    parsed_url = urlparse(url)
    if not parsed_url.scheme:
        parsed_url = urlparse('http://'+url)
    scheme, netloc, path, params, query, fragment = parsed_url
    if scheme in ['http', 'https'] and netloc in supported_vendors:
        return (netloc, urlunparse((scheme, netloc, path, '','','')))
    else:
        return (None, url)
```
```
supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                     'myntra.com']
supported_vendors += ['www.' + x for x in supported_vendors]
```

Context
StackExchange Code Review Q#40612, answer score: 3