Python script which fetches Amazon product details using its API
Problem
This is a script that takes an Amazon URL as input, extracts the ASIN/ISBN from it, and uses the Amazon Python API to fetch the product details.
For this task, I looked at many Amazon URLs and observed the following:
- An ASIN or ISBN is always present in an Amazon product URL.
- The ASIN/ISBN can appear anywhere in the URL.
- An ASIN is always upper-case, consists only of letters and digits, and is always 10 characters long.
- If no ASIN is present, the URL links to a book and an ISBN is present instead.
- An ISBN consists only of digits.
- Amazon uses the ISBN-10 format, i.e. the ISBN is 10 characters long.
- I haven't come across an ASIN that is purely numeric, without any letters.
- The ASIN or ISBN is always followed by '/'.

I hope I covered everything, but if I missed some case, please let me know.
Secondly, I care only about Amazon India. If I give the script an ASIN from Amazon US or any other country, it either reports that the ASIN is invalid or returns a price of zero. I don't know what causes this behaviour (example).
Flow: get the URL and check for either an ASIN or an ISBN. If neither is present, return None; otherwise use the API to fetch the name, price, URL and product image.
```
import re

from amazon.api import AmazonAPI

AMAZON_ACCESS_KEY = 'AKI...JP2'
AMAZON_SECRET_KEY = 'Eto...uxV'
AMAZON_ASSOC_TAG = 'p...1'

asin_regex = r'/([A-Z0-9]{10})'
isbn_regex = r'/([0-9]{10})'


def get_amazon_item_id(url):
    # return either ASIN or ISBN
    asin_search = re.search(asin_regex, url)
    isbn_search = re.search(isbn_regex, url)
    if asin_search:
        return asin_search.group(1)
    elif isbn_search:
        return isbn_search.group(1)
    else:
        # log this URL
        return None


def get_amazon_product_meta(url):
    # the input URL is always of amazon
    amazon = AmazonAPI(AMAZON_ACCESS_KEY, AMAZON_SECRET_KEY, AMAZON_ASSOC_TAG, region="IN")
    item_id = get_amazon_item_id(url)
    if not item_id:
        return None
    try:
        product = amazon.lookup(ItemId=item_id)
    except amazon.api.Asin
```
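The URL observations in the question can be checked against its two patterns with a small sketch. The URLs here are made up for illustration, and `STRICT_ID_PATTERN` and `extract_item_id` are hypothetical names that are not part of the original script. The sketch suggests two things: a 10-digit ISBN already satisfies the ASIN pattern, and the "always followed by '/'" observation is not enforced by either pattern, so a longer upper-case path segment can be cut off at 10 characters.

```python
import re

# Patterns copied from the question.
ASIN_PATTERN = r'/([A-Z0-9]{10})'
ISBN_PATTERN = r'/([0-9]{10})'

# Hypothetical stricter pattern: the ID must be followed by '/', '?',
# or the end of the URL, so it fills a whole path segment.
STRICT_ID_PATTERN = r'/([A-Z0-9]{10})(?=[/?]|$)'


def extract_item_id(url):
    """Return the first 10-character ID that fills a whole path segment, or None."""
    match = re.search(STRICT_ID_PATTERN, url)
    return match.group(1) if match else None


# Made-up URLs for illustration only.
book_url = 'http://www.amazon.in/Some-Book/dp/8129135728'
long_segment_url = 'http://www.amazon.in/PRODUCTNAME123/dp/B00KCVXU1S?x=1'

# A 10-digit ISBN also satisfies the ASIN pattern, so the elif branch
# in get_amazon_item_id is never the one that returns.
isbn_also_matches_asin = re.search(ASIN_PATTERN, book_url) is not None

# Without the boundary check, the first 10 characters of a longer
# upper-case segment are returned instead of the real ID.
loose_id = re.search(ASIN_PATTERN, long_segment_url).group(1)
strict_id = extract_item_id(long_segment_url)
```

Whether the stricter boundary is appropriate depends on the URLs actually encountered; fragments (`#`) and other separators are not handled in this sketch.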
Solution
This should be a short review mainly because your code looks pretty nice.
I don't have any experience with the Amazon API, but from what I can tell you are using it as I would expect it to be used.
The only points that I want to speak on and can nit-pick are:
- Your variables `asin_regex` and `isbn_regex` are slightly misleading. In Python there is a distinction between regular expression patterns and regular expression objects. The suffix `_regex` implies that the variable can, by itself, recognize expressions, which in turn implies that it is a regular expression object. Because your variables cannot recognize expressions by themselves, I would use the suffix `_pattern`.
- Your logic in `get_amazon_item_id` is completely fine. It's my personal preference, in cases like this where you simply return inside the `if` blocks, to use plain `if` statements with no `elif` or `else` blocks:
  ```
  if asin_search:
      return asin_search.group(1)
  if isbn_search:
      return isbn_search.group(1)
  return None
  ```
  The above is strictly personal preference, simply getting another version out there.
  However, what I would recommend is using a `for` loop to reduce some of the minor code repetition and make your code more flexible:
  ```
  for search in asin_search, isbn_search:
      if search:
          return search.group(1)
  return None
  ```
  The above code would be best if, say, for some reason another ID type came out. Instead of having to add another `if` statement, all you need to do is add `new_id_search` to the `for` loop list.
- I would change your regex patterns to use the `\w` and `\d` character classes:
  ```
  asin_regex = r'/(\w{10})'
  isbn_regex = r'/(\d{10})'
  ```
  The `\d` class would just make your currently short regex even shorter, while the `\w` class helps protect against the case that, for some unknown reason, the ASIN or ISBN contains lower-case characters.
- Finally, I would capitalize the first word of all your comments (unless it is a variable name).
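As a minimal sketch of how the naming and `for` loop suggestions might combine, the patterns can be compiled once into regular expression objects and tried in order. `ID_REGEXES` is a hypothetical name and the URL is made up. The last lines also illustrate a caveat that seems worth checking before adopting `\w`: it matches lower-case letters and underscores too, so an ordinary 10-letter path segment can be picked up before the real ID.

```python
import re

# Compiled regular expression objects, so a "_regex"-style name fits here.
# ID_REGEXES is a hypothetical name, not part of the original script.
ID_REGEXES = (
    re.compile(r'/([A-Z0-9]{10})'),  # ASIN
    re.compile(r'/(\d{10})'),        # ISBN-10
)


def get_amazon_item_id(url):
    """Return the first ID found by any of the compiled patterns, or None."""
    for regex in ID_REGEXES:
        match = regex.search(url)
        if match:
            return match.group(1)
    return None


# Made-up URL for illustration only.
url = 'http://www.amazon.in/electronics/dp/B00KCVXU1S'
strict_result = get_amazon_item_id(url)

# \w also matches lower-case letters and underscores, so here it picks
# up the first 10 characters of an ordinary path segment.
word_result = re.search(r'/(\w{10})', url).group(1)
```

Supporting a new ID type then means adding one compiled pattern to the tuple, which is the flexibility the loop is meant to provide.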
Context
StackExchange Code Review Q#56554, answer score: 7