Session handling using Python Requests client
Problem
I'm using this code to log in to an experimental login system that I created for this purpose.
```
import requests
import re


def get_page_data(regex, req):
    match = re.compile(regex).search(req.text)
    if match != None:
        return match.group(1)
    return 'no match found on {}'.format(regex)


def print_req_data(req, req_name):
    print('{} status code: {}'.format(req_name, req.status_code))
    print('{} title: {}'.format(req_name, get_page_data('<title>(.*?)</title>', req)))
    print('{} content:\n{}'.format(req_name, req.text))


form_url = 'http://migueldvl.com/heya/login/'
post_url = 'http://migueldvl.com/heya/login/process.php'
headers = {'User-agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1'}

with requests.Session() as s:
    # get form page, so the cookies are set (form token)
    logged_out_req = s.get(form_url, headers=headers)
    print_req_data(logged_out_req, 'Logged out request')
    title_logout = get_page_data('<title>(.*?)</title>', logged_out_req)

    # getting the form token
    token = get_page_data('<input type="hidden" name="token" value="(.*?)">', logged_out_req)
    login_data = {'password' : 'password', 'username' : 'miguel', 'token': token}

    # posting the data to the post url
    post_req = s.post(post_url, data=login_data, headers=headers)
    print_req_data(post_req, 'Post request (redirect page)')
    check_post_title = get_page_data('<title>(.*?)</title>', post_req)

    # comparing the titles (logged-out title vs. redirect page title) to see if the login succeeded
    if check_post_title != title_logout:
        print('SUCCESS\n[+] Checking if still logged in...')
        # check whether we're still logged in, i.e. whether the logged-in session persists
        logged_req = s.get(form_url, headers=headers)
        print_req_data(logged_req, 'Check if still logged request')
        title_check = get_page_data('<title>(.*?)</title>', logged_req)
        if title_check == check_post_title:
            print('You are still logged in')
        else:
            print('Not logged in anymore')
    else:
        print('FAIL LOGIN')
```
Solution
Please don't use regexes
Regular expressions are power tools. You can use them in many situations, but they tend to be inherently fragile, and they're especially fragile for parsing HTML, partly because browsers are so lenient about the markup they accept. I'd recommend using a dedicated HTML parser instead. If you decide you don't need to be that careful because you control this page, that's fine; just be aware of the trade-off.
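As a minimal sketch of the parser route, here is a version built on the standard library's `html.parser` (BeautifulSoup and lxml are common third-party alternatives). It assumes, as in the question's form, that the token sits in a hidden `<input name="token">`; `LoginPageParser` and `parse_login_page` are hypothetical names for illustration, not part of any library.

```python
from html.parser import HTMLParser


class LoginPageParser(HTMLParser):
    """Collect the <title> text and the value of the hidden "token" input."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.token = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attribute order does not matter.
        attrs = dict(attrs)
        if tag == 'title':
            self._in_title = True
            self.title = ''
        elif (tag == 'input' and attrs.get('type') == 'hidden'
              and attrs.get('name') == 'token'):
            self.token = attrs.get('value')

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def parse_login_page(html):
    """Return (title, token); either is None if the page does not contain it."""
    parser = LoginPageParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.token


# Attributes in a different order plus a self-closing tag: a pattern such as
# <input type="hidden" name="token" value="(.*?)"> misses this token, the parser does not.
html = ('<html><head><title>Login</title></head><body>'
        '<input name="token" value="abc123" type="hidden"/></body></html>')
title, token = parse_login_page(html)  # ('Login', 'abc123')
```

The parser reacts to tags and attributes rather than to exact character sequences, which is what makes it tolerant of the formatting differences that break a regex.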
main
You shouldn't execute any code at the top level of your module; anything that should run belongs inside an `if __name__ == '__main__':` block, and even that should be broken up into functions for ease of use.

get_page_data
Don't return a value when nothing was found; that would be very unexpected, and it would be annoying to check for (with a few edge cases). Instead, raise an appropriate exception.
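To illustrate the risk with a returned "not found" string, here is a small sketch. `find_or_sentinel` and `find_or_raise` are hypothetical stand-ins for the two versions of `get_page_data`; they take the page text directly instead of a response object so the example runs without a network connection.

```python
import re


class NoMatchFoundException(Exception):
    pass


def find_or_sentinel(regex, text):
    # Mirrors the question's get_page_data: a "not found" message is returned as data.
    match = re.search(regex, text)
    if match is not None:
        return match.group(1)
    return 'no match found on {}'.format(regex)


def find_or_raise(regex, text):
    # Mirrors the suggested get_page_data: a missing match raises instead.
    match = re.search(regex, text)
    if match:
        return match.group(1)
    raise NoMatchFoundException('No match found on {}'.format(regex))


TOKEN_RE = r'<input type="hidden" name="token" value="(.*?)">'
# Same field, but the attributes are in a different order, so TOKEN_RE does not match.
page = '<form><input name="token" type="hidden" value="abc"></form>'

# The sentinel string would be posted to the server as if it were the token.
bogus_token = find_or_sentinel(TOKEN_RE, page)

# The exception version fails right where the problem is.
error = None
try:
    find_or_raise(TOKEN_RE, page)
except NoMatchFoundException as exc:
    error = str(exc)
```

With the sentinel, the failure only surfaces later (as a rejected login or a misleading title comparison); with the exception, it surfaces at the lookup that actually failed.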
You also don't need `if match != None:` (comparisons with `None` should use `is not None` anyway). A match object is always truthy, so `if match:` is enough:

```
class NoMatchFoundException(Exception):
    pass


def get_page_data(regex, req):
    match = re.compile(regex).search(req.text)
    if match:
        return match.group(1)
    raise NoMatchFoundException("No match found on {}".format(regex))
```

Favor good names over comments
Comments are almost always a sin. A necessary sin, but they mean that you haven't been able to sufficiently express yourself in code. Most of your comments would be great function names, e.g.
```
# get form page, so the cookies are set (form token)
logged_out_req = s.get(form_url, headers=headers)
print_req_data(logged_out_req, 'Logged out request')
title_logout = get_page_data('<title>(.*?)</title>', logged_out_req)
```

becomes
```
def get_form_page(session, url, headers):
    return session.get(url, headers=headers)


def get_page_title(page):
    return get_page_data('<title>(.*?)</title>', page)


logged_out_req = get_form_page(s, form_url, headers)
logout_page_title = get_page_title(logged_out_req)
```

Also, you desperately need whitespace: add newlines liberally to group related sections of code. This is also a good way to spot where you should create a new function.
I've left out the printing, because...
Don't print willy nilly
Most of the time, a pile of print statements is debugging output; remove it before you actually use this. Occasionally, for an admin script or when you genuinely need to know the status, printing is fine, but print only the bare minimum you need, and even then a log file is usually better.
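A minimal sketch of the log-file approach with the standard `logging` module. `configure_logging`, `log_response` and the default file name `login_check.log` are hypothetical; the idea is that the status line is logged at INFO while the full page body goes to DEBUG, so it stays out of the log unless you ask for it.

```python
import logging

logger = logging.getLogger(__name__)


def configure_logging(path='login_check.log', level=logging.INFO):
    # Send records to a file instead of the console; the default path is only an example.
    logging.basicConfig(
        filename=path,
        level=level,
        format='%(asctime)s %(levelname)s %(message)s',
    )


def log_response(req, req_name):
    # Status line at INFO; the full body only at DEBUG, so it is off by default.
    logger.info('%s status code: %s', req_name, req.status_code)
    logger.debug('%s content:\n%s', req_name, req.text)
```

You would call `configure_logging()` once inside the `if __name__ == '__main__':` block and use `log_response` wherever `print_req_data` was used.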
I ended up with something like this. A lot of the functions don't do much, and if you don't think they add enough value, remove them; their point is to make clear what each step is intended to do. You'll note that I don't do any error handling: I don't know what you actually want to happen when a match fails.
```
import re

import requests


class NoMatchFoundException(Exception):
    pass


def get_page_data(regex, req):
    match = re.compile(regex).search(req.text)
    if match:
        return match.group(1)
    raise NoMatchFoundException("No match found on {}".format(regex))


def get_page_title(page):
    return get_page_data("<title>(.*?)</title>", page)


def get_page(session, url, headers):
    return session.get(url, headers=headers)


def post_page(session, url, headers, data):
    # The login form has to be submitted with POST, as in the original code.
    return session.post(url, headers=headers, data=data)


def print_req_data(req, req_name):
    print('{} status code: {}'.format(req_name, req.status_code))
    print('{} title: {}'.format(req_name, get_page_title(req)))
    print('{} content:\n{}'.format(req_name, req.text))


def get_form_token(page):
    return get_page_data('<input type="hidden" name="token" value="(.*?)">', page)


def logged_in(logout, logged):
    return get_page_title(logout) != get_page_title(logged)


def same_page(first, second):
    return get_page_title(first) == get_page_title(second)


def test_login(form_url, post_url, headers, username, password):
    with requests.Session() as s:
        logout_page = get_page(s, form_url, headers)
        form_token = get_form_token(logout_page)

        login_data = {'password': password, 'username': username, 'token': form_token}
        logged_in_page = post_page(s, post_url, headers, login_data)

        if logged_in(logout_page, logged_in_page):
            logged_in_test = get_page(s, form_url, headers)
            if same_page(logged_in_test, logged_in_page):
                print('You are still logged-in')
            else:
                print('Not logged-in anymore')
        else:
            print('FAIL LOGIN')
            print_req_data(logged_in_page, 'Post request (redirect page)')


if __name__ == '__main__':
    form_url = 'http://migueldvl.com/heya/login/'
    post_url = 'http://migueldvl.com/heya/login/process.php'
    headers = {'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1'}
    username = 'miguel'
    password = 'password'
    test_login(form_url, post_url, headers, username, password)
```
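Error handling is left open in the answer. If you decide that a page you can't read should simply count as a failed login, one possible sketch is to catch the exception at the decision point and log the reason. `get_title` and `login_succeeded` are hypothetical helpers that work on page text; the success rule is the same title comparison used by the answer's `logged_in`.

```python
import logging
import re

logger = logging.getLogger(__name__)


class NoMatchFoundException(Exception):
    pass


def get_title(text):
    match = re.search(r'<title>(.*?)</title>', text)
    if match:
        return match.group(1)
    raise NoMatchFoundException('No <title> found')


def login_succeeded(logged_out_html, after_post_html):
    """A changed title means the login worked.

    A page without a <title> is logged and treated as a failed login,
    instead of letting the exception end the script.
    """
    try:
        return get_title(logged_out_html) != get_title(after_post_html)
    except NoMatchFoundException as exc:
        logger.warning('Treating login as failed: %s', exc)
        return False
```

Whether a parse failure should mean "not logged in" or should stop the script is a policy choice; this sketch picks the former only as an example.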
Context
StackExchange Code Review Q#135518, answer score: 4