
Analyze frequency and content of political fundraising E-mails

Submitted by: @import:stackexchange-codereview

Problem

Since I'm a big politics nerd, I wanted to write a little script that would analyze the frequency and content of political fundraising e-mails. I signed up for the e-mails of 6 campaigns, donated a dollar to each so I'd get hit up for more money often, and let it sit. Unfortunately, you won't be able to run the script as-is because it uses some of my personal login details, but you could fill in your own if you were so inclined. I was just curious to see if you wonderful people could spot anything that could use improvement!

```
import gmail,pandas as pd,numpy as np,matplotlib.pyplot as plt, plotly.plotly as py, plotly.graph_objs as go,json
from tqdm import tqdm
from collections import Counter
from bs4 import BeautifulSoup
from textblob import TextBlob
from textblob.en.sentiments import NaiveBayesAnalyzer
import sys,indicoio
indicoio.config.api_key = '#'
reload(sys)
sys.setdefaultencoding('utf8')

r=open('emails.csv','w')

py.sign_in('#','#')

cruzbin=[];trumpbin=[];clintbin=[];rubiobin=[];christiebin=[];jebbin=[]
politicians = ['tedcruz.org','donaldjtrump.com','donaldtrump.com','hillaryclinton.com','marcorubio.com','jeb2016.com','chrischristie.com']

def start():
    g = gmail.login('#','#')
    return g

def sorter(g):
    for _ in tqdm(g.inbox().mail(sender=politicians[0],prefetch=True)):
        cruzbin.append(_)
    for _ in g.inbox().mail(sender=politicians[1],prefetch=True):
        trumpbin.append(_)
    for _ in g.inbox().mail(sender=politicians[2],prefetch=True):
        trumpbin.append(_)
    for _ in g.inbox().mail(sender=politicians[3],prefetch=True):
        clintbin.append(_)
    for _ in g.inbox().mail(sender=politicians[4],prefetch=True):
        rubiobin.append(_)
    for _ in g.inbox().mail(sender=politicians[5],prefetch=True):
        jebbin.append(_)
    for _ in g.inbox().mail(sender=politicians[6],prefetch=True):
        christiebin.append(_)
    bins = [cruzbin,trumpbin,clintbin,rubiobin,jebbin,christiebin]
    return bins
```

Solution

Let's start with the obvious: this code doesn't run. You're missing ans = starter(), so the subsequent (el)if ans.lower() == ... checks fail miserably with a NameError.

Likewise, you define q() but never use it.

You also appear to have other useless stuff floating around: why use both textblob and indicoio to perform sentiment analysis? You also never seem to use the blob produced by the textblob analyzer. The same goes for overallpos and overallneg: nothing is ever added to them. And a lot of the comments are just leftover test code.

Improve data structure

Having several variables hold the data for a single logical entity is a mess, especially when you have several of those entities. The first step is to make a class out of this logical entity so that a single variable holds everything you want to know about it. Second, put these entities in a list and iterate over it instead of manually writing out each element every time: you will avoid inconsistencies like calling tqdm only when retrieving the mails associated with tedcruz.org.

In Python, if you only want to store attributes and not build a full-blown class, you can use a namedtuple:

from collections import namedtuple
Politician = namedtuple('Politician', 'name emails bin')

politics = [
    Politician('Ted Cruz', ['tedcruz.org'], []),
    Politician('Donald Trump', ['donaldtrump.com', 'donaldjtrump.com'], []),
    Politician('Hillary Clinton', ['hillaryclinton.com'], []),
    Politician('Marco Rubio', ['marcorubio.com'], []),
    Politician('Chris Christie', ['chrischristie.com'], []),
    Politician('Jeb Bush', ['jeb2016.com'], []),
]
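
As a small illustration of how such a record behaves (a sketch only; the string standing in for a fetched mail is made up): fields are read by name, the tuple itself cannot be rebound, but the list stored in bin can still be filled in place.

```python
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')

# Example record; the address list mirrors the one used above.
trump = Politician('Donald Trump', ['donaldtrump.com', 'donaldjtrump.com'], [])

# Fields are read by name instead of by position in a parallel list.
print(trump.name)
print(len(trump.emails))

# The tuple is immutable: rebinding a field raises AttributeError...
try:
    trump.bin = ['replaced']
    rebound = True
except AttributeError:
    rebound = False

# ...but the list it holds is mutable, so mails can still be appended.
trump.bin.append('placeholder for a fetched mail')
print('rebound: {}, mails in bin: {}'.format(rebound, len(trump.bin)))
```

This is why the loops in this answer append to politician.bin rather than assigning a new list to it.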


With a single list, you also avoid ordering the politicians differently in various parts of your code, which leads to confusion.

Improve processing

Having your politicians in a list will let you focus on the task you want to perform on each one of them, instead of repeatedly copy/pasting your code and possibly introducing bugs.

Each time you would have 6 cases, one for each candidate, use a for loop:

def sorter(g):
    for politician in politics:
        for address in politician.emails:
            for mail in tqdm(g.inbox().mail(sender=address, prefetch=True)):
                politician.bin.append(mail)


Even better, use a list comprehension here:

def sorter(g, politics):
    for politician in politics:
        politician.bin[:] = [
            mail
            for address in politician.emails
            for mail in g.inbox().mail(sender=address, prefetch=True)
        ]
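
Because this version takes both the mailbox and the list as parameters, it can be exercised without a Gmail account. The sketch below assumes only what the calls above show, namely that g.inbox().mail(sender=..., prefetch=True) returns an iterable of messages. FakeGmail and FakeInbox are hypothetical stand-ins written for this example; they are not part of the gmail library, and the mail strings are made up.

```python
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')


class FakeInbox(object):
    """Hypothetical stand-in for the object returned by g.inbox()."""

    def __init__(self, mails_by_sender):
        self._mails_by_sender = mails_by_sender

    def mail(self, sender=None, prefetch=False):
        # Return the canned messages for this sender (none if unknown).
        return list(self._mails_by_sender.get(sender, []))


class FakeGmail(object):
    """Hypothetical stand-in for the logged-in gmail session."""

    def __init__(self, mails_by_sender):
        self._inbox = FakeInbox(mails_by_sender)

    def inbox(self):
        return self._inbox


def sorter(g, politics):
    # Same loop as above: refill each bin from all of its sender addresses.
    for politician in politics:
        politician.bin[:] = [
            mail
            for address in politician.emails
            for mail in g.inbox().mail(sender=address, prefetch=True)
        ]


politics = [
    Politician('Ted Cruz', ['tedcruz.org'], []),
    Politician('Donald Trump', ['donaldtrump.com', 'donaldjtrump.com'], []),
]
g = FakeGmail({
    'tedcruz.org': ['mail 1'],
    'donaldtrump.com': ['mail 2'],
    'donaldjtrump.com': ['mail 3', 'mail 4'],
})
sorter(g, politics)
for politician in politics:
    print('Emails from {}: {}'.format(politician.name, len(politician.bin)))
```

Both Trump domains end up in the same bin without any special casing, which is the inconsistency the original seven loops had to handle by hand.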


The same goes for printing:

def counter(politics):
    for politician in politics:
        print 'Emails from {}:'.format(politician.name), len(politician.bin)


And for statistics:

def sentiments(politics):
    for politician in politics:
        bayes(politician.bin, politician.name)


This matters even more for the analyzer, since it relies heavily on knowing that there are always 6 politicians. Because namedtuples are still regular tuples, you can use regular sequence manipulation on them:

def analyzer(politics):
    names, emails, bins = zip(*politics)
    trace0 = go.Bar(
        x=names,
        y=[len(bin) for bin in bins],
        marker=dict(color=['rgb(204,204,204)'] * len(politics)),
    )
    layout = go.Layout(
        title='Frequency of Fundraising E-Mails',
    )
    fig = go.Figure(data=[trace0], layout=layout)
    plot_url = py.plot(fig, filename='emailfreq')
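
The zip(*politics) call is what transposes the list of records into one tuple per field. A minimal sketch with two made-up records (the strings in the bins stand in for fetched mails):

```python
from collections import namedtuple

Politician = namedtuple('Politician', 'name emails bin')

politics = [
    Politician('Ted Cruz', ['tedcruz.org'], ['mail 1', 'mail 2']),
    Politician('Jeb Bush', ['jeb2016.com'], ['mail 3']),
]

# Unpacking with * passes each record as one argument to zip, which then
# groups the first fields together, the second fields together, and so on.
names, emails, bins = zip(*politics)
counts = [len(mails) for mails in bins]

print(names)   # ('Ted Cruz', 'Jeb Bush')
print(counts)  # [2, 1]
```

names and counts line up by position, which is what the x and y arguments of the bar chart need, however many politicians the list contains.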


In analyzer, please avoid the bare return at the end; it's just noise. And since you are not using any pandas features, why bother converting your bins into dataframes at all?

Did you notice how I passed politics as a parameter to all these functions? You should avoid relying on global variables and use parameters instead: it lets you reuse and test parts of your code more easily.

Improve file handling

You open a file at the beginning of your program without:

  • closing it;
  • knowing if you will need to write into it.

Since you only need it for sentiments, you should handle it there. The proper way to do that in Python is to use the with statement, so that your file gets closed at the end of the statement whether everything went right or wrong:

def sentiments(politics, filename):
    with open(filename, 'w') as output:
        for politician in politics:
            bayes(politician.bin, politician.name, output)
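
To see that behaviour in isolation, here is a small sketch. write_rows is a hypothetical stand-in for bayes (whose body is not shown here), and the temporary file and the deliberately broken record are made up for the demonstration; the point is only that the handle is closed even when the loop raises.

```python
import os
import tempfile


def write_rows(mails, name, output):
    # Hypothetical stand-in for bayes: writes one line per politician.
    output.write('{},{}\n'.format(name, len(mails)))


path = os.path.join(tempfile.mkdtemp(), 'emails.csv')
records = [('Ted Cruz', ['mail 1', 'mail 2']), ('Broken record', None)]

try:
    with open(path, 'w') as output:
        for name, mails in records:
            write_rows(mails, name, output)  # len(None) raises TypeError
except TypeError:
    pass

# The with block was left through an exception, yet the file is closed
# and the line written before the failure has been flushed to disk.
print(output.closed)
with open(path) as written:
    print(written.read())
```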


You will also need to modify bayes to accept output as its third parameter instead of relying on the global variable r.

Improve the global flow

You can't do sentiment analysis with empty bins, and you can't plot either. So you should really perform counter(sorter(s)) in every case, and only then ask the user.
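
A possible shape for that flow, as a sketch only: fetch and count unconditionally, then dispatch on the user's answer. The lambdas below are placeholders that merely record what was called; the real steps would be the sorter, counter, sentiments and analyzer functions discussed above. The run helper and the 'sentiment' and 'plot' answers are assumptions, since the original prompt is not shown.

```python
def run(politics, answer, fetch, report, actions):
    """Always fetch and report first, then run the step the user asked for."""
    fetch(politics)
    report(politics)
    action = actions.get(answer.strip().lower())
    if action is None:
        return False  # unknown answer: nothing else to do
    action(politics)
    return True


calls = []
actions = {
    'sentiment': lambda politics: calls.append('sentiment'),
    'plot': lambda politics: calls.append('plot'),
}
handled = run(
    politics=[],
    answer=' Plot ',
    fetch=lambda politics: calls.append('fetch'),
    report=lambda politics: calls.append('report'),
    actions=actions,
)
print('handled: {}, calls: {}'.format(handled, calls))
```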

Also get into the habit of wrapping your top-level code in if __name__ == '__main__': it's cleaner, and you avoid running code when importing your module for testing:

```
if __name__ == '__main__':
    indicoio.config.api_key = '#'
    py.sign_in('#','#')

    logo()
    s = start()
    Politician =
```

Context

StackExchange Code Review Q#118162, answer score: 12
