Scraping data from a table in python
Problem
I'm new to Python, and after doing a few tutorials, some of them about scraping, I've been trying some simple scraping on my own. Using BeautifulSoup, I can get data from web pages where everything has labels, but without them I'm doing a poor job.
I'm trying to get the dollar exchange rate from:
http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone
The value I'm after is highlighted in yellow.
After a lot of trial and error, I managed to get the dollar exchange rate, but I think there has to be a better way.
import requests
from bs4 import BeautifulSoup
page = requests.get("http://www.bancochile.cl/cgi-bin/cgi_mone?pagina=inversiones/mon_tasa/cgi_mone")
soup = BeautifulSoup(page.content, 'html.parser')
tables = soup.find_all("table")
dollar = tables[4].find_all("td")  # 5th table on the page
print(dollar[5].string)            # 6th cell of that table
Is there a better, or more correct, way to do this? Also, I'm not sure whether the problem is the way I coded it, or that I don't understand the HTML structure well enough to navigate to the information more efficiently.
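To get a better feel for why values like `tables[4]` and `dollar[5]` are needed, one option is to locate the target cell by its text and print the chain of tags leading down to it. This is a minimal sketch: `SAMPLE_HTML` is a hypothetical, simplified stand-in with placeholder numbers (not the real bancochile.cl markup), and `ancestor_path` is a hypothetical helper. On the live page you would pass it the cell found in the `soup` from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup with placeholder values: it only imitates
# nested tables without attributes and is NOT the real bancochile.cl page.
SAMPLE_HTML = """
<table><tr><td>
  <table><tr><td>Header</td></tr></table>
  <table>
    <tr><td>MONEDAS</td><td>VALOR</td></tr>
    <tr><td>DOLAR USA</td><td>123,45</td></tr>
  </table>
</td></tr></table>
"""


def ancestor_path(tag):
    """List tags from the outermost ancestor down to `tag`, e.g. 'tr[2]'.

    The number is the 1-based position among preceding siblings that share
    the same tag name.
    """
    parts = []
    # Walk upwards until we reach the BeautifulSoup object itself.
    while tag is not None and tag.name != "[document]":
        position = len(tag.find_previous_siblings(tag.name)) + 1
        parts.append(f"{tag.name}[{position}]")
        tag = tag.parent
    return list(reversed(parts))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cell = soup.find(lambda tag: tag.name == "td" and tag.get_text(strip=True) == "DOLAR USA")
print(" > ".join(ancestor_path(cell)))
# table[1] > tr[1] > td[1] > table[2] > tr[2] > td[1]
```

Note that `find_all("table")` counts every table in document order, nested ones included, so a position in this path is not the same as an index into `tables`; the path is mainly a way to see how deeply the value is nested.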
Solution
The markup is definitely not easy to parse because of the nested table elements with no meaningful attributes. But you are right that relying on the relative index of a table, and on the desired cell being the 6th in that table, is quite a fragile strategy.
Instead, let's use the row title as our "anchor". Then, we'll get the following cell via .find_next_sibling():
# soup is the BeautifulSoup object built in the question's code
DESIRED_MONEDAS = "DOLAR USA"
# Find the <td> whose text is exactly the row title...
label = soup.find(lambda tag: tag.name == "td" and tag.get_text(strip=True) == DESIRED_MONEDAS)
# ...then take the next <td> in the same row, which holds the rate
value = label.find_next_sibling("td").get_text(strip=True)
print(value)
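The anchor approach can be wrapped in a small function that returns None when the label or its value cell is missing, instead of raising AttributeError on `None.find_next_sibling`. This is a sketch under stated assumptions: `value_for` is a hypothetical helper, and `SAMPLE_HTML` is hypothetical markup with placeholder numbers, not real rates. The extra "banner" table imitates a layout change that would shift table indices but does not affect the text anchor.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder numbers; it only imitates the
# "nested tables without attributes" shape and is not the real page.
SAMPLE_HTML = """
<table><tr><td>
  <table><tr><td>Banner added later</td></tr></table>
  <table>
    <tr><td>MONEDAS</td><td>VALOR</td></tr>
    <tr><td>DOLAR USA</td><td>123,45</td></tr>
    <tr><td>EURO</td><td>678,90</td></tr>
  </table>
</td></tr></table>
"""


def value_for(soup, label_text):
    """Return the text of the <td> right after the <td> whose text is label_text, or None."""
    label = soup.find(
        lambda tag: tag.name == "td" and tag.get_text(strip=True) == label_text
    )
    if label is None:
        return None  # label not on the page (layout or wording changed)
    value_cell = label.find_next_sibling("td")
    if value_cell is None:
        return None  # label found, but no value cell follows it in the row
    return value_cell.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_for(soup, "DOLAR USA"))  # 123,45
print(value_for(soup, "YEN"))        # None
```

Returning None leaves it to the caller to decide whether a missing rate is an error worth stopping for.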
Context
StackExchange Code Review Q#159383, answer score: 3