Web crawler that charts stock ticker data using matplotlib
Problem
I've built a web crawler using the BeautifulSoup library that pulls stock ticker data from CSV files on Yahoo Finance and charts the data using matplotlib. I'm wondering if there are any ways to improve the code I've written, because there are some parts that I think could be a lot better.

```python
import urllib.request
from matplotlib import pyplot as plt
from bs4 import BeautifulSoup
import requests

def chartStocks(*tickers):
    # Run loop for each ticker passed in as an argument
    for ticker in tickers:
        # Convert URL into text for parsing
        url = "http://finance.yahoo.com/q/hp?s=" + str(ticker) + "+Historical+Prices"
        sourceCode = requests.get(url)
        plainText = sourceCode.text
        soup = BeautifulSoup(plainText, "html.parser")
        # Find all links on the page
        for link in soup.findAll('a'):
            href = link.get('href')
            link = []
            for c in href[:48]:
                link.append(c)
            link = ''.join(link)
            # Find the URL for the stock ticker CSV file and convert the data to text
            if link == "http://real-chart.finance.yahoo.com/table.csv?s=":
                csv_url = href
                res = urllib.request.urlopen(csv_url)
                csv = res.read()
                csv_str = str(csv)
                # Parse the CSV to create a list of data points
                point = []
                points = []
                curDay = 0
                day = []
                commas = 0
                lines = csv_str.split("\\n")
                lineOne = True
                for line in lines:
                    commas = 0
                    if lineOne == True:
                        lineOne = False
                    else:
                        for c in line:
                            if c == ",":
                                commas += 1
                            if commas == 4:
                                point.append
                                # ...
```
Solution
Composing small functions
`chartStocks` would be more readable if you split it into a handful of smaller functions, roughly like this:

```python
def chartStocks(*tickers):
    for ticker in tickers:
        page = getTickerPage(ticker)
        csv_url = findCSVUrl(page)
        csv_text = getCSV(csv_url)  # not named `csv`, to avoid shadowing the csv module
        days, points = parseCSV(csv_text)
        plot_data(ticker, days, points)

# Or, if you're allergic to temporary variables, the loop body becomes:
#     days, points = parseCSV(getCSV(findCSVUrl(getTickerPage(ticker))))
#     plot_data(ticker, days, points)
```

This approach lets you clearly see the "pipeline" your data is passing through, and lets you test and reuse smaller pieces individually.
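If you like the nested-call version but find it reads inside-out, a tiny helper can thread a value through the stages in pipeline order. This is a minimal sketch; `pipeline` is a hypothetical helper (not part of the answer or any library), built on the standard `functools.reduce`:

```python
from functools import reduce


def pipeline(value, *stages):
    """Feed `value` through each stage in order: pipeline(x, f, g) == g(f(x))."""
    # reduce() calls each stage on the result of the previous one,
    # starting from the initial value; with no stages it returns value unchanged.
    return reduce(lambda result, stage: stage(result), stages, value)


# Usage with stand-in stages (plain built-ins, no network access needed):
print(pipeline(" 42 ", str.strip, int, lambda n: n + 1))  # 43
```

With the functions above, the loop body could then read `days, points = pipeline(ticker, getTickerPage, findCSVUrl, getCSV, parseCSV)`. Whether that is clearer than named temporaries is mostly a matter of taste.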
Arguably it would be even cleaner to define `def chartStock(ticker)` to handle the case of one ticker, so that `chartStocks` is just:

```python
def chartStocks(*tickers):
    for ticker in tickers:
        chartStock(ticker)
```

The only thing to be careful of here is to make sure to design your functions to handle errors properly: either check that each return value is not `None` before calling the next function, or allow them to take `None` as a parameter and just return nothing in that case.
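The second option, letting each stage accept `None` and pass it straight through, can be factored into a decorator so the individual functions don't each repeat the check. A minimal sketch, assuming one-argument stages; `none_passthrough`, `find_csv_url`, `get_csv`, `count_rows` and the fake URL/CSV dictionaries are hypothetical stand-ins so the example runs offline:

```python
import functools


def none_passthrough(stage):
    """Wrap a one-argument stage so that a None input yields None output."""
    @functools.wraps(stage)
    def wrapper(value):
        if value is None:
            return None  # an earlier stage failed; skip this one
        return stage(value)
    return wrapper


# Hypothetical offline stand-ins for findCSVUrl and getCSV (made-up data).
FAKE_CSV_URLS = {"ABC": "http://example.com/abc.csv"}
FAKE_CSV_FILES = {"http://example.com/abc.csv": "Date,Close\n2015-01-02,1.0\n"}


@none_passthrough
def find_csv_url(ticker):
    return FAKE_CSV_URLS.get(ticker)  # None when no CSV link exists


@none_passthrough
def get_csv(csv_url):
    return FAKE_CSV_FILES.get(csv_url)  # None when the "download" fails


def count_rows(ticker):
    """Return the number of data rows for `ticker`, or None if unavailable."""
    csv_text = get_csv(find_csv_url(ticker))
    if csv_text is None:  # a single check at the end of the chain
        return None
    return len(csv_text.splitlines()) - 1  # minus the header line
```

One caveat: a stage like `parseCSV` that returns a tuple still needs an explicit `None` check before you unpack it into `days, points`.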
str.startswith

This:

```python
# Find all links on the page
for link in soup.findAll('a'):
    href = link.get('href')
    link = []
    for c in href[:48]:
        link.append(c)
    link = ''.join(link)
    # Find the URL for the stock ticker CSV file and convert the data to text
    if link == "http://real-chart.finance.yahoo.com/table.csv?s=":
        ...  # rest of the loop body
```

could be simplified with `str.startswith`:

```python
def findCSVUrl(soupPage):
    CSV_URL_PREFIX = 'http://real-chart.finance.yahoo.com/table.csv?s='
    for link in soupPage.findAll('a'):
        href = link.get('href', '')
        if href.startswith(CSV_URL_PREFIX):
            return href
    return None  # no CSV link found on the page
```

I've also provided a not-found value of `''` so that if a link has no `href`, `startswith` won't get called on `None`.
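If you prefer, the loop-and-return can also be written with `next()` over a generator expression, which makes the "`None` if not found" result explicit. A sketch under one assumption: it accepts any iterable of objects with a dict-like `.get(key, default)`, which BeautifulSoup tags have, so it can be tried with plain dicts instead of a parsed page (with a real page you would pass `soupPage.findAll('a')`). The function name `find_csv_url` is hypothetical:

```python
CSV_URL_PREFIX = "http://real-chart.finance.yahoo.com/table.csv?s="


def find_csv_url(links):
    """Return the first href starting with CSV_URL_PREFIX, or None."""
    # '' stands in for a missing href, so startswith() is never called on None.
    hrefs = (link.get("href", "") for link in links)
    # next() returns the first match, or the default (None) if there is none.
    return next((href for href in hrefs if href.startswith(CSV_URL_PREFIX)), None)


# Usage with plain dicts standing in for <a> tags:
links = [{}, {"href": "/q?s=ABC"}, {"href": CSV_URL_PREFIX + "ABC"}]
print(find_csv_url(links))  # http://real-chart.finance.yahoo.com/table.csv?s=ABC
```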
Skipping first row

Instead of using a `lineOne` flag when looping over lines:

```python
lineOne = True
for line in lines:
    if lineOne == True:
        lineOne = False
    else:
        ...  # continue parsing line
```

you can just start from after the first row with a slice:

```python
for line in lines[1:]:
    ...  # continue parsing line
```
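The slice works because `lines` is a list. If you later move to Python's `csv` module, note that `csv.reader` returns an iterator, which can't be sliced; `next()` or `itertools.islice` does the same job there. A small sketch with made-up sample data (`SAMPLE` and the other variable names are illustrative only):

```python
import csv
import io
import itertools

SAMPLE = "Date,Close\n2015-01-02,1.0\n2015-01-05,2.0\n"  # made-up data

# A list of lines supports slicing, so lines[1:] drops the header.
lines = SAMPLE.splitlines()
data_lines = lines[1:]

# csv.reader returns an iterator, so csv.reader(...)[1:] raises TypeError.
# Skip the header with next() instead; the default avoids StopIteration
# when the input is empty.
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader, None)
rows = list(reader)

# Alternatively, itertools.islice lazily skips the first item.
rows_islice = list(itertools.islice(csv.reader(io.StringIO(SAMPLE)), 1, None))
```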
CSV parsing

Python has a built-in `csv` module that could simplify a lot of your parsing. It would do the splitting by commas for you and return either a list or a dict of fields for each row, depending on what you ask for (`csv.reader` or `csv.DictReader`). You would end up with roughly this:

```python
import csv

def parseCSV(csv_text):
    # splitlines() avoids the trailing empty row that split('\n') leaves behind
    csv_rows = csv.reader(csv_text.splitlines())
    next(csv_rows, None)  # skip the header row
    days = []
    points = []
    for day, row in enumerate(csv_rows):
        close = float(row[4])  # fifth field, as in the original comma counting
        days.append(day)
        points.append(close)
    return days, points
```

where the `enumerate` function would give you the same zero-based `days` list as you currently have.

Actually, since it seems like `days` will just be the list `[0, 1, ..., len(points) - 1]`, you could skip the `enumerate` and simply define `days` after you've parsed all the points, and toss in a list comprehension for good measure:

```python
import csv

def parseCSV(csv_text):
    csv_rows = csv.reader(csv_text.splitlines())
    next(csv_rows, None)  # skip the header row
    points = [float(row[4]) for row in csv_rows]
    days = list(range(len(points)))
    return days, points
```
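The dict-based variant, `csv.DictReader`, lets you pick the closing price by column name instead of by position. A minimal sketch, assuming the header row names a `Close` column (the sample layout `Date,Open,High,Low,Close,Volume,Adj Close` is an assumption consistent with index 4 above, and the numbers are made up); `parse_closes` is a hypothetical name. It also assumes `csv_text` is already a decoded `str`: calling `str()` on the bytes from `res.read()` gives their repr (`"b'...'"`), which is why the question's code had to split on a literal `\\n`, whereas `res.read().decode('utf-8')` (assuming UTF-8) gives real text.

```python
import csv
import io


def parse_closes(csv_text):
    """Return (days, closes) from CSV text whose header includes "Close"."""
    # DictReader consumes the header row itself and skips blank lines.
    reader = csv.DictReader(io.StringIO(csv_text))
    closes = [float(row["Close"]) for row in reader]
    days = list(range(len(closes)))
    return days, closes


# Made-up sample in the assumed layout.
SAMPLE = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2015-01-05,2.0,2.5,1.5,2.2,100,2.2\n"
    "2015-01-02,1.0,1.5,0.5,1.1,200,1.1\n"
)
print(parse_closes(SAMPLE))  # ([0, 1], [2.2, 1.1])
```

Looking columns up by name also means the parser keeps working if the column order ever changes, at the cost of depending on the header text.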
StackExchange Code Review Q#114612, answer score: 4