Connect to streaming Twitter API, parse tweets, and write to CSV in real-time
Problem
I have been using Tweepy to connect to the Twitter Streaming API to collect tweets, parse that data, and then write selected fields to a CSV file. Based on some examples that I found, I put together the following code to manage that connection. One thing I had to work around was the connection getting killed: I added a while loop that handles exceptions and restarts the connection if needed.
I want to make sure that this code is optimized and that I'm not including things that might not be needed.
```
#!/usr/bin/env python
import logging
import time
import csv
import json
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener
from datetime import datetime
from dateutil import parser

# enable logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(module)s - %(funcName)s: %(message)s',
                    datefmt="%Y-%m-%d %H:%M:%S")
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# authorize the app to access Twitter on our behalf
consumer_key = ' '
consumer_secret = ' '
access_token = ' '
access_secret = ' '
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# establish open connection to streaming API
class MyListener(StreamListener):
    def on_data(self, data):
        try:
            tweet = parse_tweet(data)
            content = extract_content(tweet)
            with open('tweets.csv', 'a') as f:
                writer = csv.writer(f, quotechar = '"')
                writer.writerow(content)
                #logger.info(content[3])
        except BaseException as e:
            logger.warning(e)
        return True

    def on_error(self, status):
        logger.warning(status)
        return True

# parse data
def parse_tweet(data):
    # load JSON item into a dict
    tweet = json.loads(data)
    # check i
```
Solution
MyListener is not a very descriptive name; CSVWriter would be a better one. That name also suggests that the class should take the file name as a parameter:
```
class CSVWriter(StreamListener):
    def __init__(self, file_name, *args, **kwargs):
        self.file_name = file_name
        # self must be passed explicitly when calling the base-class initializer this way
        StreamListener.__init__(self, *args, **kwargs)

    def on_data(self, data):
        try:
            tweet = parse_tweet(data)
            content = extract_content(tweet)
            with open(self.file_name, 'a') as f:
                writer = csv.writer(f, quotechar='"')
                writer.writerow(content)
        except Exception as e:
            logger.warning(e)
        return True
```
It is also enough to catch Exception. BaseException catches more than you probably want, namely KeyboardInterrupt, which would swallow a Ctrl-C pressed while a tweet is being handled. With except Exception, Ctrl-C still stops the program.
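To illustrate the difference between the two except clauses, here is a minimal sketch that does not need Tweepy. The helper swallow_with is a hypothetical name introduced only for this demonstration, and the raised KeyboardInterrupt stands in for a user pressing Ctrl-C:

```python
def swallow_with(exc_type):
    """Raise KeyboardInterrupt inside a try block guarded by exc_type.

    Returns True if the inner handler caught it, False if it propagated.
    """
    try:
        try:
            raise KeyboardInterrupt  # stands in for the user pressing Ctrl-C
        except exc_type:
            return True
    except KeyboardInterrupt:
        # the inner handler did not match, so the interrupt escaped it
        return False


# KeyboardInterrupt derives from BaseException, not from Exception
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True

print(swallow_with(Exception))      # False: Ctrl-C would still end the program
print(swallow_with(BaseException))  # True: Ctrl-C would be logged and ignored
```

SystemExit behaves the same way, which is another reason to prefer except Exception in a long-running listener.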
```
twitter_stream = Stream(auth, CSVWriter("tweets.csv"))
```
There seems to be no need to first save tweet['CREATED_AT'] = parser.parse(tweet['created_at']) and then later call tweet['CREATED_AT'].strftime('%Y-%m-%d %H:%M:%S'). Just do the whole conversion in one place at the end (I don't know exactly which conversion, because I don't know what format the value is in originally).
I would also reduce the number of blank lines. While some help readability (by separating blocks of code) or are recommended by PEP8, blank lines before an elif or else block just interrupt the reading and reduce readability.
Code Snippets
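The single-step timestamp conversion suggested above could look roughly like the following sketch. It assumes Python 3, an English or C locale, and that created_at uses the layout shown in Twitter's v1.1 API examples (for instance "Wed Oct 10 20:19:24 +0000 2018"). The function name format_created_at and the two constants are hypothetical and not part of the original code:

```python
from datetime import datetime

# Assumed layout of the created_at field, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# %a and %b are locale-dependent, so this assumes English day and month names.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Layout the question writes to the CSV file
CSV_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_created_at(created_at):
    """Parse the raw created_at string and format it for the CSV in one step."""
    return datetime.strptime(created_at, TWITTER_FORMAT).strftime(CSV_FORMAT)


print(format_created_at("Wed Oct 10 20:19:24 +0000 2018"))  # 2018-10-10 20:19:24
```

If the format turns out to differ, dateutil's parser.parse(...) followed directly by .strftime(...) in the same expression achieves the same thing without storing the intermediate value in the tweet dict.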
```
class CSVWriter(StreamListener):
    def __init__(self, file_name, *args, **kwargs):
        self.file_name = file_name
        # self must be passed explicitly when calling the base-class initializer this way
        StreamListener.__init__(self, *args, **kwargs)

    def on_data(self, data):
        try:
            tweet = parse_tweet(data)
            content = extract_content(tweet)
            with open(self.file_name, 'a') as f:
                writer = csv.writer(f, quotechar='"')
                writer.writerow(content)
        except Exception as e:
            logger.warning(e)
        return True
```
```
twitter_stream = Stream(auth, CSVWriter("tweets.csv"))
```
Context
StackExchange Code Review Q#145314, answer score: 2