Finding the e-commerce site for a given product URL and checking whether it is supported

Submitted by: @import:stackexchange-codereview

Problem

I am building a Python app in which users can maintain a wishlist of products. I support only a few e-commerce sites and do not support country-specific sites (e.g. I may support amazon.com but not amazon.in). I do not support mobile versions of URLs (e.g. http://m.amazon.com) either, and I want to drop the query string part of the URL.

The code and its test cases follow. It seems to work, but I am not happy with it: it looks hackish to me. I would really appreciate suggestions for improving the get_vendor function. Do you find it readable?

```
from urlparse import urlparse

supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                     'myntra.com', 'www.flipkart.com', 'www.homeshop18.com',
                     'www.snapdeal.com', 'www.myntra.com']

test_urls = ['http://www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'https://myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'http://.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'htt://www.myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'http://blahblah.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!',
             'ftp://myntra.com/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy?searchQuery=sports-shoes&serp=1&uq=false#!']

test_expected_results = [True, True, True, False, False, False, False]

def get_vendor(url):
    def add_http(url):
        return 'http:
```

Solution

The two helper functions feel redundant to me: 'http://' + url is clear enough on its own, and the urlparse module already provides urlunparse for putting the URL back together.
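
As a small illustration of the urlunparse point, the sketch below rebuilds a URL while blanking its params, query string and fragment. It assumes Python 3, where the Python 2 urlparse module became urllib.parse. The helper name strip_query is hypothetical, and the URL is a shortened stand-in for the ones in the question.

```python
# Python 3: the Python 2 `urlparse` module is now `urllib.parse`.
from urllib.parse import urlparse, urlunparse


def strip_query(url):
    """Return url without params, query string and fragment (hypothetical helper)."""
    scheme, netloc, path, _params, _query, _fragment = urlparse(url)
    # Passing empty strings for the last three components drops them.
    return urlunparse((scheme, netloc, path, '', '', ''))


# Shortened example URL, not one of the question's test cases.
print(strip_query('http://www.myntra.com/sports-shoes/107455/buy?serp=1&uq=false#!'))
# prints: http://www.myntra.com/sports-shoes/107455/buy
```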

My proposal:

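from urlparse import urlparse, urlunparse  # Python 2; urllib.parse in Python 3

def get_vendor(url):
    parsed_url = urlparse(url)
    # Without a scheme the host ends up in the path, so retry with http://.
    if not parsed_url.scheme:
        parsed_url = urlparse('http://' + url)

    scheme, netloc, path, params, query, fragment = parsed_url

    if scheme in ['http', 'https'] and netloc in supported_vendors:
        # Rebuild the URL without params, query string and fragment.
        return (netloc, urlunparse((scheme, netloc, path, '', '', '')))
    else:
        return (None, url)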
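
One way to check this proposal is to run it against the test URLs from the question. The sketch below is a Python 3 port (urllib.parse instead of urlparse) and uses the question's vendor list. It assumes that True in test_expected_results means "a vendor was found". The names PATH and prefixes are introduced here only to avoid repeating the shared path seven times.

```python
# Python 3 port of the proposal: `urlparse` moved to `urllib.parse`.
from urllib.parse import urlparse, urlunparse

# Vendor list as given in the question.
supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                     'myntra.com', 'www.flipkart.com', 'www.homeshop18.com',
                     'www.snapdeal.com', 'www.myntra.com']


def get_vendor(url):
    parsed_url = urlparse(url)
    if not parsed_url.scheme:
        parsed_url = urlparse('http://' + url)
    scheme, netloc, path, params, query, fragment = parsed_url
    if scheme in ('http', 'https') and netloc in supported_vendors:
        return (netloc, urlunparse((scheme, netloc, path, '', '', '')))
    return (None, url)


# The question's seven test URLs share this path, query and fragment.
PATH = ('/sports-shoes/puma/puma-men-grey-kuris-ii-ind-running-shoes/107455/buy'
        '?searchQuery=sports-shoes&serp=1&uq=false#!')
prefixes = ['http://www.myntra.com', 'www.myntra.com', 'https://myntra.com',
            'http://.myntra.com', 'htt://www.myntra.com',
            'http://blahblah.com', 'ftp://myntra.com']
test_urls = [prefix + PATH for prefix in prefixes]
test_expected_results = [True, True, True, False, False, False, False]

# Assumption: a URL counts as supported when a vendor is returned.
results = [get_vendor(url)[0] is not None for url in test_urls]
assert results == test_expected_results
```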


To deal with an optional www. prefix, you can keep get_vendor simple by generating the www. variants programmatically from the bare domains. If you later need a vendor that is reachable only with, or only without, www., go back to listing hosts explicitly and add that host to the list.

supported_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com', 
                     'myntra.com']
supported_vendors += ['www.' + x for x in supported_vendors]
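
If a vendor that must match exactly ever turns up, one possible arrangement is to keep it in a separate list and merge everything into a single set. This is only a sketch: the names www_optional_vendors and exact_vendors are hypothetical, and shop.example.com is a placeholder host rather than a real vendor.

```python
# Hypothetical split: domains reachable with or without "www.",
# and hosts that must match exactly as written.
www_optional_vendors = ['flipkart.com', 'homeshop18.com', 'snapdeal.com',
                        'myntra.com']
exact_vendors = ['shop.example.com']  # placeholder host, not from the original post

supported_vendors = set(www_optional_vendors)
supported_vendors.update('www.' + x for x in www_optional_vendors)
supported_vendors.update(exact_vendors)

print('www.myntra.com' in supported_vendors)        # True
print('www.shop.example.com' in supported_vendors)  # False
```

A set also makes the membership test in get_vendor independent of how many vendors are listed.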

Context

StackExchange Code Review Q#40612, answer score: 3
