Connect to streaming Twitter API, parse tweets, and write to CSV in real-time

Submitted by: @import:stackexchange-codereview

Problem

I have been using Tweepy to connect to the Twitter Streaming API to collect tweets, parse that data, and then write select fields to a CSV file. Based on some examples that I found, I put together the following code to manage that connection. One thing that I had to work around was how to handle the connection getting killed. I was able to include a while loop that handles exceptions and restarts the connection if needed.

I want to make sure that this code is optimized and that I'm not including things that might not be needed.

```
#!/usr/bin/env python

import logging
import time
import csv
import json
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener
from datetime import datetime
from dateutil import parser

# enable logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt="%Y-%m-%d %H:%M:%S")
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# authorize the app to access Twitter on our behalf
consumer_key = ' '
consumer_secret = ' '
access_token = ' '
access_secret = ' '
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# establish open connection to streaming API
class MyListener(StreamListener):

    def on_data(self, data):
        try:
            tweet = parse_tweet(data)
            content = extract_content(tweet)
            with open('tweets.csv', 'a') as f:
                writer = csv.writer(f, quotechar = '"')
                writer.writerow(content)
                #logger.info(content[3])

        except BaseException as e:
            logger.warning(e)

        return True

    def on_error(self, status):
        logger.warning(status)
        return True

# parse data
def parse_tweet(data):

    # load JSON item into a dict
    tweet = json.loads(data)

    # check i
```

Solution

MyListener is not a very descriptive name. Maybe CSVWriter would be a better one. And that name already suggests something else, that it should take the file name as a parameter:

class CSVWriter(StreamListener):
    def __init__(self, file_name, *args, **kwargs):
        self.file_name = file_name
        StreamListener.__init__(self, *args, **kwargs)

    def on_data(self, data):
        try:
            tweet = parse_tweet(data)
            content = extract_content(tweet)
            with open(self.file_name, 'a') as f:
                writer = csv.writer(f, quotechar = '"')
                writer.writerow(content)
        except Exception as e:
            logger.warning(e)
        return True


It is also enough to catch Exception. BaseException catches a bit more than you probably want, namely KeyboardInterrupt. With except Exception, pressing Ctrl-C still stops the program.
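To make the difference concrete, here is a minimal sketch. The helper name `is_caught_by` is made up for this illustration; it only relies on the fact that KeyboardInterrupt derives from BaseException and not from Exception.

```python
def is_caught_by(exc_type, handler_type):
    """Return True if an `except handler_type` clause would catch exc_type."""
    try:
        raise exc_type()
    except handler_type:
        return True
    except BaseException:
        # The narrower handler above did not match the raised exception.
        return False


# Ctrl-C raises KeyboardInterrupt, which `except Exception` lets through.
ctrl_c_caught_by_exception = is_caught_by(KeyboardInterrupt, Exception)
ctrl_c_caught_by_base = is_caught_by(KeyboardInterrupt, BaseException)
# Ordinary errors (bad JSON, missing keys, ...) are still caught.
value_error_caught = is_caught_by(ValueError, Exception)
```

Here `ctrl_c_caught_by_exception` is False while the other two are True, so the listener keeps logging ordinary errors without swallowing Ctrl-C.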
...

    twitter_stream = Stream(auth, CSVWriter("tweets.csv"))
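The file-name parameter can be tried out without a Twitter connection. The sketch below is an assumption-laden illustration for Python 3: `BaseListener` is a hypothetical stand-in for tweepy's StreamListener, and `on_data` receives an already parsed row instead of raw JSON. Note that when the base initializer is called through the class, `self` has to be passed explicitly.

```python
import csv
import os
import tempfile


class BaseListener(object):
    """Hypothetical stand-in for tweepy's StreamListener (not the real class)."""

    def __init__(self, api=None):
        self.api = api


class CSVWriter(BaseListener):
    def __init__(self, file_name, *args, **kwargs):
        self.file_name = file_name
        # Forward the remaining arguments; self is passed explicitly here.
        BaseListener.__init__(self, *args, **kwargs)

    def on_data(self, row):
        # Append one row per call; newline='' lets the csv module
        # control line endings itself.
        with open(self.file_name, 'a', newline='') as f:
            csv.writer(f).writerow(row)
        return True


# Write one row to a temporary file and read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
listener = CSVWriter(demo_path, api="fake-api")
listener.on_data(["2016-10-26 12:00:00", "hello, world"])

with open(demo_path, newline='') as f:
    rows = list(csv.reader(f))
```

Because the target file is now an argument, the same class can write to a different file per stream or per test run.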


There seems to be no need to first save tweet['CREATED_AT'] = parser.parse(tweet['created_at']) and then later do tweet['CREATED_AT'].strftime('%Y-%m-%d %H:%M:%S'). Just do the correct conversion once, at the end (I don't know which one exactly, because I don't know what format it is in originally).
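As a rough sketch of doing the conversion in one step: this assumes Python 3 and that created_at looks like "Wed Oct 10 20:19:24 +0000 2018", which the post does not confirm. The function name `format_created_at` and the two format constants are invented for the example.

```python
from datetime import datetime

# Assumed input format (not confirmed by the post):
# "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Output format taken from the strftime call in the question.
CSV_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_created_at(created_at):
    """Parse the raw timestamp and return it in the CSV format in one step."""
    return datetime.strptime(created_at, TWITTER_FORMAT).strftime(CSV_FORMAT)


formatted = format_created_at("Wed Oct 10 20:19:24 +0000 2018")
```

If the input format turns out to be different, only `TWITTER_FORMAT` needs to change, or the call to dateutil's parser.parse can stay and be chained directly with strftime.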

I would also reduce the number of blank lines. While some are nice for readability (to separate blocks of code) or are recommended by PEP8, blank lines before an elif or else block just break up the flow and reduce readability.
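For illustration only, this is the kind of layout meant here. The function `tweet_kind` and the keys it checks are assumptions, since the part of the original code with the elif and else branches is not shown.

```python
def tweet_kind(tweet):
    """Classify a tweet dict; the keys checked here are illustrative."""
    if 'retweeted_status' in tweet:
        kind = 'retweet'
    elif tweet.get('in_reply_to_status_id') is not None:
        kind = 'reply'
    else:
        kind = 'original'

    # One blank line separates the branching from the return statement.
    return kind
```

The if, elif and else branches stay visually together, and the single blank line marks where the next logical step begins.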

Context

StackExchange Code Review Q#145314, answer score: 2
