HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Using Scrapy/Xpath to scrape ESPN for football (soccer) commentaries

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
soccerxpathcommentariesscrapescrapyfootballusingforespn

Problem

The class takes an input of a game-id from the ESPN soccer website. The code then has to grab the commentary, process that and also grab the player names/ids and create a small dictionary of those.

I understand the premise of OOP when shown in basic examples but I cannot get my head around using it in actual programs/applications. I have tried here, but to me it seems to not achieve anything extra (so I must be doing it wrong).

I end up with a list of lists containing each of the key events (corners, shots etc) and a dictionary of players, so it works. But it just seems so inelegant and I would love some tips on improving it (however small/focused they are).

```
import urllib2
from scrapy.selector import HtmlXPathSelector
import re
import codecs
import timeit
start = timeit.default_timer()

class game:
def __init__(self,game_id):
self.game_id = game_id
self.comms_url = 'http://espnfc.com/uk/en/gamecast/%d/gamecast.html?soccernet=true&cc=5739' % self.game_id
self.data_text = urllib2.urlopen(self.comms_url).read()
self.corners = []
self.shots = []
self.fouls = []
self.shots_on_target = []
self.offsides = []
self.goals = []
self.extractCommentaries()
self.extractPlayers()
self.extractTeams()
self.findAction('^Corner', self.corners)
self.findAction('^Foul by', self.fouls)
self.findAction('^Attempt', self.shots)
self.findAction('^Attempt saved', self.shots_on_target)
self.findAction('^Offside', self.offsides)
self.findAction('^Goal!', self.goals)
def extractCommentaries(self):
self.hxs = HtmlXPathSelector(text=self.data_text)
self.match_comments = self.hxs.select("//div[@id='convo-window']/ul[@id='convo-list']/li/div[@class='comment']/p/text()")
self.match_timestamps = self.hxs.select("//div[@id='convo-window']/ul[@id='convo-list']/li/div[@class='timestamp']/p/text()")
self.events = zip

Solution


  • There are no docstrings, so a user of your class is left wondering which methods could be useful to call. In fact, after studying the code, it looks like all the methods are meant to be called by __init__, in a specific order. Makes me wonder if this should be a class at all.



  • You set very many instance attributes, even a self.dummy, but use very few outside the method where they are set. Most of them should be local variables instead. Again, I'm thinking that you don't need a class -- all the methods could be simple functions that take one or two parameters instead.



  • The purpose of the class seems to be to initialize six lists, eg. self.corners. So, to get an overview of the game, one has to access six variables. It would be better to use a single data structure that can be easily looped over and queried. Perhaps a dictionary with keys like "corners" could suit you, but I'm thinking that most useful would be to keep the events in a chronological list and provide a function to filter that list by type of event.

Context

StackExchange Code Review Q#35830, answer score: 5

Revisions (0)

No revisions yet.