I'll visit the 18th

Submitted by: @import:stackexchange-codereview

Problem

I wrote this program, whose purpose is to visit the 18th link in a page's list of links and then, on the new page, visit the 18th link again.

This program works as intended, but it's a little repetitive and inelegant.

I was wondering if you have any ideas on how to make it simpler, without using any functions. If I wanted to repeat the process 10 or 100 times, this would become very long.

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
if len(url) < 1 :
    url='http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags
tags = soup('a')
urllist = list()
count = 0
loopcount = 0
for tag in tags:
    count = count + 1
    tg = tag.get('href', None)
    if count == 18:
        print count, tg
        urllist.append(tg)

url2 = (urllist[0])
html2 = urllib.urlopen(url2).read()
soup2 = BeautifulSoup(html2)

tags2 = soup2('a')
count2 = 0
for tag2 in tags2:
    count2 = count2 + 1
    tg2 = tag2.get('href', None)
    if count2 == 18:
        print count2, tg2
        urllist.append(tg2)

Solution

If you are only interested in the 18th URL on the initial page, and then the 18th URL on the page it leads to, there is no reason to loop over every link and count iterations. You can access it directly by index: the 18th link is at index 17, because Python lists are zero-based. I do not have BeautifulSoup installed on this computer, so I could not run this, but try:

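import urllib
from BeautifulSoup import *

# Python 2: raw_input returns the typed text; an empty answer falls back to the default URL
url_1 = raw_input('') or 'http://python-data.dr-chuck.net/known_by_Oluwanifemi.html'

html_1 = urllib.urlopen(url_1).read()
soup_1 = BeautifulSoup(html_1)

# The 18th anchor tag sits at index 17 (zero-based)
tags_1 = soup_1('a')
url_retr1 = tags_1[17].get('href', None)

# Repeat once on the page that link points to
html_2 = urllib.urlopen(url_retr1).read()
soup_2 = BeautifulSoup(html_2)

tags_2 = soup_2('a')
url_retr2 = tags_2[17].get('href', None)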


Should be as simple as that.
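
The question also asks what happens when the step has to be repeated 10 or 100 times. One possible approach, sketched below, is a plain loop that keeps reusing a single url variable, so no functions are defined and the code does not grow with the number of hops. To keep the sketch runnable without a network connection, it makes some assumptions that are not in the original: the pages dictionary is a made-up stand-in for the real site, the regular expression stands in for soup('a'), and the names POSITION, REPEAT and visited are hypothetical. It is written in Python 3 syntax. With real pages, the two marked lines would be replaced by the urllib and BeautifulSoup calls shown above.

```python
import re

POSITION = 18  # 1-based position of the link to follow (hypothetical name)
REPEAT = 3     # number of hops; could just as well be 10 or 100

# Made-up in-memory pages standing in for real HTTP responses.
# Each page has 20 links, and its 18th link points to the next page.
pages = {}
for n in range(REPEAT + 1):
    hrefs = ['other%d.html' % i for i in range(20)]
    hrefs[POSITION - 1] = 'page%d.html' % (n + 1)
    pages['page%d.html' % n] = ''.join('<a href="%s">link</a>' % h for h in hrefs)

url = 'page0.html'
visited = []
for hop in range(REPEAT):
    html = pages[url]                                  # stand-in for urllib.urlopen(url).read()
    hrefs = re.findall(r'<a\s+href="([^"]*)"', html)   # stand-in for soup('a') plus .get('href')
    url = hrefs[POSITION - 1]                          # direct index access, no counting
    visited.append(url)
    print(hop + 1, url)
```

Because url is overwritten on every pass, changing REPEAT is the only edit needed to follow more links. On a real site, a page with fewer than 18 links would raise an IndexError at the indexing line, so a length check before indexing may be worth adding.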

Context

StackExchange Code Review Q#132306, answer score: 6
