Analyze frequency and content of political fundraising E-mails
Problem
Since I'm a big politics nerd, I wanted to write a little script that would analyze the frequency and content of political fundraising e-mails. I signed up for the e-mails of 6 campaigns, donated a dollar to each so I'd get hit up for more money often, and let it sit. Unfortunately, you won't be able to run the script because it uses some of my personal login details, but you could fill in your own if you were really inclined to give it a run! I was just curious whether you wonderful people could see anything that could use improvement!
```
import gmail,pandas as pd,numpy as np,matplotlib.pyplot as plt, plotly.plotly as py, plotly.graph_objs as go,json
from tqdm import tqdm
from collections import Counter
from bs4 import BeautifulSoup
from textblob import TextBlob
from textblob.en.sentiments import NaiveBayesAnalyzer
import sys,indicoio
indicoio.config.api_key = '#'
reload(sys)
sys.setdefaultencoding('utf8')
r=open('emails.csv','w')
py.sign_in('#','#')
cruzbin=[];trumpbin=[];clintbin=[];rubiobin=[];christiebin=[];jebbin=[]
politicians = ['tedcruz.org','donaldjtrump.com','donaldtrump.com','hillaryclinton.com','marcorubio.com','jeb2016.com','chrischristie.com']
def start():
    g = gmail.login('#','#')
    return g
def sorter(g):
    for _ in tqdm(g.inbox().mail(sender=politicians[0],prefetch=True)):
        cruzbin.append(_)
    for _ in g.inbox().mail(sender=politicians[1],prefetch=True):
        trumpbin.append(_)
    for _ in g.inbox().mail(sender=politicians[2],prefetch=True):
        trumpbin.append(_)
    for _ in g.inbox().mail(sender=politicians[3],prefetch=True):
        clintbin.append(_)
    for _ in g.inbox().mail(sender=politicians[4],prefetch=True):
        rubiobin.append(_)
    for _ in g.inbox().mail(sender=politicians[5],prefetch=True):
        jebbin.append(_)
    for _ in g.inbox().mail(sender=politicians[6],prefetch=True):
        christiebin.append(_)
    bins = [cruzbin,trumpbin,clintbin,rubiobin,jebbin,christiebin]
    return bins
```
Solution
Let's start with the obvious: this code doesn't run. You're missing `ans = starter()` so that further `(el)if ans.lower() == ...` doesn't miserably fail with a `NameError`.

Likewise, you define `q()` but never use it.

And you also appear to have other useless stuff floating around: why use both `textblob` and `indicoio` to perform sentiment analysis? You also seem to never use the `blob` produced by the `textblob` analyzer. Same for `overallpos` or `overallneg`: no content whatsoever is added to them. And a lot of the comments are just old test code that has been commented out…

Improve data structure
Having several variables to hold data for a single logical entity is a mess, especially if you have several of those entities. The first step is to make a class out of this logical entity so that a single variable holds everything you would want to know about it. Second, use a list of these entities and iterate over this list instead of manually writing each element each time: you will avoid inconsistencies like calling `tqdm` only for retrieving the mails associated with `tedcruz.org`.

In Python, if you only want to store attributes and not build a full-blown class, you can use a `namedtuple`:

```
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')
politics = [
    Politician('Ted Cruz', ['tedcruz.org'], []),
    Politician('Donald Trump', ['donaldtrump.com', 'donaldjtrump.com'], []),
    Politician('Hillary Clinton', ['hillaryclinton.com'], []),
    Politician('Marco Rubio', ['marcorubio.com'], []),
    Politician('Chris Christie', ['chrischristie.com'], []),
    Politician('Jeb Bush', ['jeb2016.com'], []),
]
```

You will also most likely avoid ordering them differently in various parts of your code, which leads to confusion.
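One detail worth spelling out, as a minimal sketch using only the standard library: a `namedtuple` is immutable, so its fields cannot be rebound, but the list stored in `bin` can still be filled in place. The mail strings below are placeholders, not real data, and the snippet is written so that it runs on Python 2 or 3.

```python
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')

cruz = Politician('Ted Cruz', ['tedcruz.org'], [])

# Fields are read by name instead of by position.
print(cruz.name)

# The list held in `bin` is mutable, so it can be filled in place.
cruz.bin.append('placeholder mail 1')
cruz.bin[:] = ['placeholder mail 2', 'placeholder mail 3']

# Rebinding a field is not allowed: the tuple itself is immutable.
try:
    cruz.bin = []
    rebinding_allowed = True
except AttributeError:
    rebinding_allowed = False

# A namedtuple still unpacks like a regular tuple.
name, emails, mails = cruz
print(len(mails))
print(rebinding_allowed)
```

This is why in-place operations such as `append` or slice assignment are the natural way to fill each politician's bin.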
Improve processing
Having your politicians in a list will let you focus on the task you want to perform on each one of them, instead of repeatedly copy/pasting your code and possibly introducing bugs.
Each time you would have 6 cases, one for each candidate, use a for loop:
```
def sorter(g):
    for politician in politics:
        for address in politician.emails:
            for mail in tqdm(g.inbox().mail(sender=address, prefetch=True)):
                politician.bin.append(mail)
```

Even better, use a list-comprehension here:
```
def sorter(g, politics):
    for politician in politics:
        politician.bin[:] = [
            mail
            for address in politician.emails
            for mail in g.inbox().mail(sender=address, prefetch=True)
        ]
```

Same for printing:
```
def counter(politics):
    for politician in politics:
        print 'Emails from {}:'.format(politician.name), len(politician.bin)
```

And for statistics:
```
def sentiments(politics):
    for politician in politics:
        bayes(politician.bin, politician.name)
```

It's even more true for the analyzer, as you heavily rely on knowing you always have 6 politicians. Using `namedtuple`s, you can turn them back into regular tuples and use regular sequence manipulation on them:

```
def analyzer(politics):
    names, emails, bins = zip(*politics)
    trace0 = go.Bar(
        x=names,
        y=[len(bin) for bin in bins],
        marker=dict(color=['rgb(204,204,204)'] * len(politics)),
    )
    layout = go.Layout(
        title='Frequency of Fundraising E-Mails',
    )
    fig = go.Figure(data=[trace0], layout=layout)
    plot_url = py.plot(fig, filename='emailfreq')
```

For this one, please, avoid the bare `return` at the end, it's just noise. And since you are not using any `pandas` feature, why bother converting your bins into dataframes at all?

Did you notice how I passed `politics` as a parameter to all these function calls? You should avoid relying on global variables and use parameters instead: it lets you reuse and test parts of your code more easily.

Improve file handling
You open a file at the beginning of your program without:
- closing it;
- knowing if you will need to write into it.
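To make the second point concrete, here is a small sketch (the temporary path and the file contents are made up for illustration): opening a file in `'w'` mode truncates it immediately, so an unconditional `open` at the top of the script wipes the results of a previous run even when nothing gets written this time.

```python
import os
import tempfile

# Hypothetical results file left over from a previous run.
path = os.path.join(tempfile.mkdtemp(), 'emails.csv')
with open(path, 'w') as previous_run:
    previous_run.write('sender,sentiment\n')

size_before = os.path.getsize(path)

# Same call as the module-level `r = open('emails.csv', 'w')`: the file
# is emptied as soon as it is opened, before any write happens.
r = open(path, 'w')
r.close()  # closed here only so the size can be checked reliably

size_after = os.path.getsize(path)
print(size_before)
print(size_after)
```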
Since you only need it for `sentiments`, you should handle it there. The proper way to do that in Python is using the `with` statement, so that your file gets closed at the end of the statement whether everything went right or wrong:

```
def sentiments(politics, filename):
    with open(filename, 'w') as output:
        for politician in politics:
            bayes(politician.bin, politician.name, output)
```

You will need to modify `bayes` as well to accept `output` as the third parameter and not rely on the `r` global variable.

Improve the global flow
You can't do sentiment analysis with empty bins. You can't plot either. So you should really perform `counter(sorter(s))` anyway and then ask the user.

Also get into the habit of wrapping your top-level code in `if __name__ == '__main__':` since it's cleaner and you avoid running code when importing your module for testing:

```
if __name__ == '__main__':
    indicoio.config.api_key = '#'
    py.sign_in('#','#')
    logo()
    s = start()
    Politician =
```
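Putting the file-handling and parameter-passing advice together, the overall shape could look like the sketch below. Since the original `bayes` and the Gmail access are not shown here, `bayes` is a hypothetical stand-in that only writes the mail count, `main` is an assumed name, and the bins are filled with placeholder strings. Only the structure is the point: parameters instead of globals, `with` for the output file, and a `main` function behind the `__main__` guard.

```python
import os
import tempfile
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')


def bayes(mails, name, output):
    """Hypothetical stand-in: the real function would score each mail."""
    output.write('{},{}\n'.format(name, len(mails)))


def sentiments(politics, filename):
    # The file is opened only where it is needed and is closed on exit,
    # even if bayes() raises.
    with open(filename, 'w') as output:
        for politician in politics:
            bayes(politician.bin, politician.name, output)


def main(filename):
    # Placeholder data instead of a real Gmail session.
    politics = [
        Politician('Ted Cruz', ['tedcruz.org'], ['mail 1', 'mail 2']),
        Politician('Jeb Bush', ['jeb2016.com'], ['mail 3']),
    ]
    sentiments(politics, filename)
    with open(filename) as written:
        return written.read()


if __name__ == '__main__':
    report = main(os.path.join(tempfile.mkdtemp(), 'emails.csv'))
    print(report)
```

Because nothing runs at import time, a test can import the module and call `main` or `sentiments` directly with its own file name.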
Context
StackExchange Code Review Q#118162, answer score: 12