I'll visit the 18th
Problem
I wrote this program, whose purpose is to visit the 18th link in the list of links on a page and then, on the new page, visit the 18th link again.
This program works as intended, but it's a little repetitive and inelegant.
I was wondering if you have any ideas on how to make it simpler, without using any functions. If I wanted to repeat the process 10 or 100 times, this would become very long.
import urllib
from BeautifulSoup import *
url = raw_input('Enter - ')
if len(url) < 1:
    url = 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
# Retrieve all of the anchor tags
tags = soup('a')
urllist = list()
count = 0
loopcount = 0
for tag in tags:
    count = count + 1
    tg = tag.get('href', None)
    if count == 18:
        print count, tg
        urllist.append(tg)
url2 = (urllist[0])
html2 = urllib.urlopen(url2).read()
soup2 = BeautifulSoup(html2)
tags2 = soup2('a')
count2 = 0
for tag2 in tags2:
    count2 = count2 + 1
    tg2 = tag2.get('href', None)
    if count2 == 18:
        print count2, tg2
        urllist.append(tg2)
Solution
If you only need the 18th URL on the initial page and then the 18th URL on the page it points to, there is no reason to go through all of the tags and count iterations. You can access the link directly by index; Python lists are zero-based, so the 18th link is tags[17]. I don't have BeautifulSoup installed on this computer, but try this:
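import urllib
from BeautifulSoup import *
# Python 2: raw_input returns the typed text as a string (input would try to evaluate it)
url_1 = raw_input('Enter - ') or 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html_1 = urllib.urlopen(url_1).read()
soup_1 = BeautifulSoup(html_1)
tags = soup_1('a')
# Lists are zero-based, so the 18th link is at index 17
url_retr1 = tags[17].get('href', None)
print url_retr1
# Repeat the same steps on the page that the 18th link points to
html_2 = urllib.urlopen(url_retr1).read()
soup_2 = BeautifulSoup(html_2)
tags_2 = soup_2('a')
url_retr2 = tags_2[17].get('href', None)
print url_retr2
Should be as simple as that.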
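Indexing removes the counting loop, but each hop is still written out by hand, which is what makes 10 or 100 hops long. Putting one hop inside a `for` loop and reassigning the current URL each time removes that repetition. The following is a minimal sketch under several assumptions: it targets Python 3 rather than the Python 2 used above, it swaps BeautifulSoup for the standard library's `html.parser` so there is no third-party dependency, and the names `AnchorCollector`, `fetch_html` and `follow_links` are hypothetical. It also resolves relative links with `urljoin` and falls back to UTF-8 when a page declares no charset, neither of which the original code does.

```python
# Python 3 sketch; html.parser stands in for BeautifulSoup (a swapped-in technique).
# AnchorCollector, fetch_html and follow_links are hypothetical names.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Record the href of every <a> tag in document order (None if it has no href)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))


def fetch_html(url):
    """Download a page and decode it to text."""
    with urlopen(url) as response:
        # Assumption: fall back to UTF-8 when the server declares no charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def follow_links(start_url, position=18, repeats=2, fetch=fetch_html):
    """Follow the link at a 1-based position `repeats` times; return every URL visited."""
    if position < 1:
        raise ValueError("position is 1-based and must be at least 1")
    visited = [start_url]
    url = start_url
    for _ in range(repeats):
        collector = AnchorCollector()
        collector.feed(fetch(url))
        collector.close()
        # The 18th link is index 17; raises IndexError if the page has fewer links
        href = collector.hrefs[position - 1]
        if href is None:
            raise ValueError("link %d on %s has no href" % (position, url))
        # Assumption: resolve relative links against the current page
        url = urljoin(url, href)
        visited.append(url)
    return visited
```

With the default URL from the question, `follow_links(url)` makes the same two hops as the code above, and `follow_links(url, repeats=100)` makes 100; only the argument changes. The loop itself needs no function and could be written at top level; it is wrapped here only so the download step can be replaced, for example by passing a dictionary's `__getitem__` as `fetch` to run against saved pages without network access.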
Context
StackExchange Code Review Q#132306, answer score: 6