HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Labeling modified words

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
modifiedwordslabeling

Problem

I'm working on a Python function that takes a piece of text and indicates if words are modified by words like very or not.

I'm looking for instance to label very nice differently from not nice. This does what it's supposed to do but I feel it might be clunky or there might be a more very efficient approach I'm not aware of.

def polarizer(input_sentence, modifier_dict, window_length):
    """
    The polarizer attaches the label given in modifier_dict to all items in input_sentence occurring a window_length after the modifier.
    modifier_dict needs to be formatted like so: {'not':'-', 'yes':'+', ..}
    Returns list with labels added.
    """
    #find positions of modifiers
    indices= [(range(n+1, n+1+window_length), modifier_dict[i]) for n,i in enumerate(input_sentence) if i in modifier_dict.keys()]
    #iterate over modifiers
    for ind in indices:
        for n in ind[0]:
            #add label unless there is already a label attached
            if n < len(input_sentence) and not input_sentence[n].endswith(tuple(modifier_dict.values())):
                input_sentence[n]=input_sentence[n]+ind[1]
    outputsentence= input_sentence
    return outputsentence

#Test sentences
modifier_dict={'very':'+'}

test1="This is fine".split(" ")
test2="This is very fine".split(" ")

print polarizer(test1, modifier_dict, 2)
print polarizer(test2, modifier_dict, 2)

Solution

-
When there're comments in the code that tell what it does, it usually indicates that the following piece of code should be a separate function with a meaningful name. I'd create a separate function that the positions of the words to be modified (and it something like get_indices_to_modify) and a separate function that checks if a word already has a label.

-
You can make your code more readable the tuples from the indices list:

for modification_range, label in indices looks better than

for ind in indices: # Do something with ind[0] and ind[1]

-
You can also use the += operator in this line: input_sentence[n] += ind[1]

-
The names n and i are kind of confusing in your code. i normally stands for a counter while n stands for the number of elements. For instance, I'd change n and i in the first line of your polarize function to idx and word (when i is not the counter, the thing gets pretty confusing).

-
Assuming that it's python 2, you can save some space by using xrange instead of range. It can a have a serious impact if the window_length gets really big.

-
What is the point of creating the outputsentence variable just to return it? Why not return the input_sentence directly?

-
The polarize function actually changes the input_sentence. Your documentation doesn't say anything about it. This could lead to unexpected results for the users of your code. You should either make a copy and not change the input, or document the fact the input is changed explicitly.

-
Instead of printing the return value of the function to the standard output (and checking that it's correct by looking at it), I'd recommend to write unit-tests. This way, you would be able to check that code remains correct after your change it automatically (moreover, it's quite easy to overlook something when you just look at the output with your eyes).

Context

StackExchange Code Review Q#157574, answer score: 5

Revisions (0)

No revisions yet.