
Scraping data from a table in Python

Submitted by: @import:stackexchange-codereview

Problem

I'm new to Python, and after working through a few tutorials, some of them about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup I can get data from web pages where everything is labeled, but without labels I'm doing a poor job.

I'm trying to get the dollar exchange rate from:
http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone

The value I'm after is highlighted in yellow.

After a lot of trial and error, I managed to get the dollar exchange rate, but I think there has to be a better way.

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone")
soup = BeautifulSoup(page.content, 'html.parser')

tables = soup.find_all("table")
dollar = tables[4].find_all("td")

print(dollar[5].string)


Is there a better or more correct way to do this? Also, I'm not sure whether the problem lies in the way I coded it, or in not understanding the HTML structure well enough to navigate to the information more efficiently.

Solution

The markup is definitely not easy to parse, because the table elements are nested and carry no meaningful attributes. You are right, though, that relying on the position of a table in the page and on the desired cell being the 6th in that table is quite a fragile strategy: any change to the page layout shifts the indices.
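To make that fragility concrete, here is a minimal sketch. The markup, the `RATES` and `BANNER` strings and the `cell_by_index` helper are all hypothetical stand-ins, not the bank's real page. It shows how the same index-based lookup returns a different cell once an extra table appears earlier in the document:

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): the same rates
# table, with and without an unrelated table placed before it.
RATES = "<table><tr><td>DOLAR USA</td><td>650,10</td></tr></table>"
BANNER = "<table><tr><td>Aviso</td><td>mantenimiento</td></tr></table>"


def cell_by_index(html, table_index, cell_index):
    """Look a cell up purely by position, as the original code does."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("table")[table_index].find_all("td")[cell_index].string


before = cell_by_index(RATES, 0, 1)
after = cell_by_index(BANNER + RATES, 0, 1)

print(before)  # 650,10
print(after)   # mantenimiento
```

The indices did not change, but the cell they point at did, and nothing in the code signals the mistake.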

Instead, let's use the row title as our "anchor" and get the cell that follows it with .find_next_sibling():

DESIRED_MONEDAS = "DOLAR USA"

label = soup.find(lambda tag: tag.name == "td" and tag.get_text(strip=True) == DESIRED_MONEDAS)
value = label.find_next_sibling("td").get_text(strip=True)
print(value)
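The snippet above assumes `soup` already exists and that the label is always found. As a self-contained sketch of the same anchor technique, the version below runs against a small hypothetical table (the `SAMPLE_HTML` markup and its values are assumptions, as is the `find_value_after_label` helper name) and returns `None` instead of raising when the label or its neighbouring cell is missing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the bank's page; the real page's
# structure and values are not reproduced here.
SAMPLE_HTML = """
<table>
  <tr><td>MONEDA</td><td>COMPRA</td></tr>
  <tr><td> DOLAR USA </td><td> 650,10 </td></tr>
  <tr><td>EURO</td><td>700,25</td></tr>
</table>
"""


def find_value_after_label(soup, label_text):
    """Return the text of the cell following the cell whose text equals
    label_text, or None if either cell is missing."""
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == label_text
    )
    if label is None:
        return None
    value_cell = label.find_next_sibling("td")
    if value_cell is None:
        return None
    return value_cell.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_value_after_label(soup, "DOLAR USA"))  # 650,10
print(find_value_after_label(soup, "YEN"))        # None
```

Returning `None` is only one possible design choice; raising a descriptive exception may suit a script that should fail loudly when the page layout changes.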

Context

StackExchange Code Review Q#159383, answer score: 3
