HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Trivago hotels price checker

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
pricetrivagocheckerhotels

Problem

I've decided to write my first project in Python. I would like to hear some opinion from you.

Description of the script:

  • Generate Trivago URLs for 5 star hotels in specified city.



  • Scrape these URLs to get prices.



  • Save results as SQL queries in text file



  • Execute queries from text file to upload results to database.



Settings:

  • The date from which the script is going to check prices.



  • Number of days to check.



  • City to check.



  • Name of file with results of the script.



I wrote this script to learn how website scraping in Python works, and how to use SQL. I also wanted to learn something about object-oriented programming.

```
__author__ = ''

import datetime
import pymysql
import lxml.html as lh
import re
import sys
from selenium import webdriver

class TrivagoPriceChecker():
from_year = ''
from_month = ''
from_day = ''
days_number = ''
city_id = ''
hotel_id = ''
result_file = ''
browser = webdriver.PhantomJS()

def __init__(self):
print("Trivago Price Checker ver 1.0")

def generate_url(self):
from_date = datetime.date(int(self.from_year), int(self.from_month), int(self.from_day))
to_date = datetime.date(int(self.from_year),int(self.from_month),int(self.from_day)) + datetime.timedelta(days=int(self.days_number))
url_list = []
while(from_date < to_date):
day_plus = from_date + datetime.timedelta(days=1)
url = 'http://www.trivago.pl/?aDateRange%5Barr%5D=' + str(from_date) + '&aDateRange%5Bdep%5D=' + str(day_plus) + '&iRoomType=7&iPathId=' + str(self.city_id) + '&iGeoDistanceItem=' + str(self.hotel_id) + '&iViewType=0&bIsSeoPage=false&bIsSitemap=false&'
url_list.append(url)
from_date += datetime.timedelta(days=1)
return url_list

def get_hotel_price(self, hotel_url):
self.browser.get(hotel_url)
content = self.browser.page_source
website = lh.fromstring(content)
for price in website.

Solution

try:
    query.execute(line)
    progress += 1
    print(progress)
except:
    pass


Here you used a bare (with nothing next to it) except, this is considered bad practice because if an unexpected error happens, it will be silently ignored, instead use;

except TheExceptionIThinkWillHappen:


print("City not found!")
exit()


This looks like an error, please use:

raise ValueError("City not found")


Possible bug
---

for price in website.xpath('//*[@id="js_item_' + str(self.hotel_id) + '"]/div[1]/div[2]/div[2]/strong[2]'):
return price.text

This returns only the first price because return terminates the function, is this intended or a bug? (If you want all the prices you should use yield).

If you want to follow PEP8 remember to separate the standard library imports from the third library imports, so you should write:

# The other imports

from selenium import webdriver
import pymysql


This allows the users of your script to check quickly if they have or not the required 3-rd part modeles.

url_list = []
    while(from_date < to_date):
        day_plus = from_date + datetime.timedelta(days=1)
        url = 'http://www.trivago.pl/?aDateRange%5Barr%5D=' + str(from_date) + '&aDateRange%5Bdep%5D=' + str(day_plus) + '&iRoomType=7&iPathId=' + str(self.city_id) + '&iGeoDistanceItem=' + str(self.hotel_id) + '&iViewType=0&bIsSeoPage=false&bIsSitemap=false&'
        url_list.append(url)
        from_date += datetime.timedelta(days=1)


In general the while loop should be avoided, I would use a for loop:

url = "http://www.trivago.pl/?aDateRange%5Barr%5D={}&aDateRange%5Bdep%5D={}&iRoomType=7&iPathId={}&iGeoDistanceItem={}&iViewType=0&bIsSeoPage=false&bIsSitemap=false&"
url_list = []
day = datetime.timedelta(days=1)
for date in range(from_date,to_date,day):
    url_list.append(url.format(date,date + day,
        self.city_id,self.hotel_id))


or even (but you may not want to compress so much):

return [url.format(date,date + day,
    self.city_id,self.hotel_id) for date in range(from_date,to_date,day)]


for price in website.xpath('//*[@id="js_item_' + str(self.hotel_id) + '"]/div[1]/div[2]/div[2]/strong[2]'):
    return price.text


return is executed only once but the for loops usually does a thing many times, I would have written:

prices = [ price.text for price in website.xpath('//*[@id="js_item_' + str(self.hotel_id) + '"]/div[1]/div[2]/div[2]/strong[2]')]
return prices[0]


so that the reader immediately understands that you want to return only the first item.

Veedrac noted that the solution above is too slow, I think the best thing to do here is just adding a comment to make explicit that you only take one item

for price in website.xpath('//*[@id="js_item_' + str(self.hotel_id) + '"]/div[1]/div[2]/div[2]/strong[2]'):
    # Return the first price found ONLY
    return price.text


file = open(self.result_file, "a")


This line contains a problem:

  1. file is a built-in never re-assign (give another value to) a built-in as a comment noted, in Python 3 it is not a built-in so don't worry about shadowing it.



  • Opening and closing manually is obsolete, you should use



:

with open(self.result_file,"a") as f:
    # some code
    f.write(value)


Closing is handled automatically so you will not be in danger of forgetting it.

Code Snippets

try:
    query.execute(line)
    progress += 1
    print(progress)
except:
    pass
except TheExceptionIThinkWillHappen:
print("City not found!")
exit()
raise ValueError("City not found")
# The other imports

from selenium import webdriver
import pymysql

Context

StackExchange Code Review Q#77324, answer score: 5

Revisions (0)

No revisions yet.