Assigning sentiment to each tweet - Twitter trend

Submitted by: @import:stackexchange-codereview

Problem

The assignment below is taken from here.


Introduction


In this project, you will develop a geographic visualization of
twitter data across the USA. You will need to use dictionaries, lists,
and data abstraction techniques to create a modular program. Below is
phase 1 of this project.


Phase 1: The Feelings in Tweets


In this phase, you will create an abstract data type for tweets, split the text of a tweet into words, and calculate the amount of
positive or negative feeling in a tweet.


Tweets


First, you will implement an abstract data type for Tweets. The constructor make_tweet is defined at the top of trends.py.
make_tweet returns a python dictionary with the following entries:

text: a string, the text of the tweet, all in lowercase
time: a datetime object, when the tweet was posted
latitude: a floating-point number, the latitude of the tweet's location
longitude: a floating-point number, the longitude of the tweet's location



Problem 1 (1 pt). Implement the tweet_words and tweet_time selectors. Call extract_words to list the words in the
text of a tweet.


Problem 2 (1 pt). Implement the tweet_location selector, which returns a position. Positions are another abstract data type,
defined at the top of geo.py. Make sure that you understand how to
manipulate positions; they play an important role in this project.



When you complete problems 1 and 2, the doctest for make_tweet should pass.


python3 trends.py -t make_tweet



Problem 3 (1 pt). Improve the extract_words function as follows: Assume that a word is any consecutive substring of text that
consists only of ASCII letters. The string ascii_letters in the string
module contains all letters in the ASCII character set. The
extract_words function should list all such words in order and nothing
else.


When you complete this problem, the doctest for extract_words should pass.

`pyth

Solution

The global design is a bit weird from my point of view but I'll comment on the code you've written.

In extract_words:

The code is properly formatted. A few remarks anyway:

- You don't need that many parentheses.

- You don't need to check character in ascii_letters, as it has to be true at this point.

- require_current_index_change looks like it should be a boolean. Just replace 1 by True, 0 by False, and if require_current_index_change == 1: by if require_current_index_change:.

- Instead of using require_current_index_change to know whether current_index is usable, you could simply set current_index to None: it is easy to check, and if you use the index by mistake anyway, you'll probably get an exception.

- You can get rid of the part comparing the index to the length and just handle the last word after the loop.

- current_index is probably not the best name, as it lets the reader think it corresponds to the index we are iterating over (aka index). It could be a good idea to convey the idea of a beginning or starting index.

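In the end, the code looks like:

from string import ascii_letters

def extract_words(text):
    lst = []
    # None means we are not currently inside a word.
    starting_index = None
    for index, character in enumerate(text):
        if character not in ascii_letters:
            # A non-letter closes the current word, if there is one.
            if starting_index is not None:
                lst.append(text[starting_index:index])
            starting_index = None
        elif starting_index is None:
            # First letter of a new word.
            starting_index = index
    # Handle a word that runs up to the end of the text.
    if starting_index is not None:
        lst.append(text[starting_index:])
    return lst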


Another idea would be to do things differently: replace unwanted characters with spaces and then split on whitespace.
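
The replace-then-split idea could be sketched roughly as follows. The function name extract_words_by_split is made up for this illustration, and the sketch only assumes the rule from the assignment (a word is a run of ASCII letters):

```python
from string import ascii_letters

def extract_words_by_split(text):
    """Return the runs of ASCII letters in text, in order.

    Hypothetical alternative to extract_words, for illustration only.
    """
    # Turn every character that is not an ASCII letter into a space...
    cleaned = ''.join(c if c in ascii_letters else ' ' for c in text)
    # ...then let split() cut on runs of whitespace; it never yields
    # empty strings, so leading or repeated separators are harmless.
    return cleaned.split()

print(extract_words_by_split("anything else.....not my job"))
# ['anything', 'else', 'not', 'my', 'job']
```

This trades the manual index bookkeeping for one extra pass over the text, which is usually an acceptable cost for tweet-sized strings.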

In make_sentiment:

Instead of asserting, it could be a good idea to raise a ValueError.
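
One reason to prefer an exception is that assert statements are skipped when Python runs with the -O flag, so the check would silently disappear. The body of make_sentiment is not shown here, so the following is only a sketch under two assumptions: that a valid value is either None or a number in [-1, 1], and that the sentiment is represented by the raw value itself.

```python
def make_sentiment(value):
    """Return a sentiment built from value.

    Assumptions for this sketch: value must be None or a number in
    [-1, 1], and the sentiment is simply stored as that raw value.
    """
    if value is not None and not -1 <= value <= 1:
        # Unlike an assert, this check survives "python -O".
        raise ValueError(
            'sentiment value must be None or within [-1, 1], got %r' % (value,))
    return value
```

Callers can then catch ValueError explicitly if they want to recover from bad input.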

In has_sentiment:

You can simply write: return s is not None.

Also, per PEP 8, you should compare to None with is rather than ==. Tools such as pep8, pyflakes, etc. can check your code and detect such things.
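
To see why the identity test is the safer one, here is a small sketch. AlwaysEqual and has_sentiment_eq are made-up names for this illustration; the point is only that == delegates to a method that any class may override, while is cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical class whose instances claim to equal everything."""
    def __eq__(self, other):
        return True

def has_sentiment_eq(s):
    # Equality test: delegates to s.__eq__, which can be overridden.
    return not (s == None)  # kept only to show the pitfall

def has_sentiment(s):
    # Identity test: None is a singleton, so this cannot be fooled.
    return s is not None

odd = AlwaysEqual()
print(has_sentiment_eq(odd))  # False: odd claims to be equal to None
print(has_sentiment(odd))     # True: odd is not the None object
```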

In analyze_tweet_sentiment:

Because non-zero numeric values are considered true in boolean contexts, you can write:

if total_sentiment:
    return total_sentiment / count_sentiment    
else:
    return average


Which can be written:

return total_sentiment / count_sentiment if total_sentiment else average


Also, average does not need to be defined that early; it could simply be:

return total_sentiment / count_sentiment if total_sentiment else make_sentiment(None)


Then, I wonder whether you should be checking total_sentiment or count_sentiment. The choice determines whether a tweet can have a sentiment of value 0 (for instance, if it has both positive and negative words that cancel out) or whether that case is treated as having no sentiment (None). This is an open question and I do not have the answer.
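
The difference between the two checks can be made concrete with a small sketch. The helper average_or_none and its check_total flag are invented for this illustration, and None stands in for whatever make_sentiment(None) returns:

```python
def average_or_none(values, check_total):
    """Average values, or return None when the chosen guard is falsy.

    Hypothetical helper: check_total=True mimics testing total_sentiment,
    check_total=False mimics testing count_sentiment.
    """
    total = sum(values)
    count = len(values)
    guard = total if check_total else count
    return total / count if guard else None

mixed = [0.5, -0.5]  # one positive and one negative word that cancel out
print(average_or_none(mixed, check_total=True))   # None
print(average_or_none(mixed, check_total=False))  # 0.0
print(average_or_none([], check_total=False))     # None
```

Testing the count keeps "neutral overall" (0.0) distinct from "no sentiment words at all" (None), while testing the total merges the two cases.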

Finally, a slightly different way to write this function would be to use a list comprehension so that the built-in functions len and sum can be reused. For instance, we'd have something like:

def analyse(tweet):
    word_sentiments = (get_word_sentiment(w) for w in tweet_words(tweet))
    sentiment_values = [sentiment_value(s) for s in word_sentiments if has_sentiment(s)]
    return sum(sentiment_values) / len(sentiment_values) if sentiment_values else make_sentiment(None)

Context

StackExchange Code Review Q#91158, answer score: 2
