
Curried function

Submitted by: @import:stackexchange-codereview··

Problem

Inspired by another question, and since I'm currently learning functional programming, I wrote the following (curried) function:

def map_starts_with(pat_map):
    def map_string(t):
        pats = [pat for pat in pat_map.keys() if t.startswith(pat)]
        # get only value of "first" pattern if at least one pattern is found
        return pat_map.get(pats[0]) if len(pats) > 0 else 0
    return map_string


Summary of what the code does:


Given a mapping of patterns to (positive) ints, the returned function maps a string to the value of the first pattern it starts with, or to 0 if no pattern matches. (In fact, one could and should also pass startswith as a function parameter.)
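The idea of passing the matching function as a parameter could look roughly like the sketch below. The names map_with, matches and default are hypothetical and chosen only for illustration; the body otherwise mirrors the function above.

```python
def map_with(pat_map, matches=str.startswith, default=0):
    """Return a function mapping a string to the value of the first matching pattern.

    `matches(text, pattern)` is the predicate; it defaults to str.startswith.
    All names here are illustrative assumptions, not part of the original code.
    """
    def map_string(t):
        pats = [pat for pat in pat_map if matches(t, pat)]
        return pat_map[pats[0]] if pats else default
    return map_string


mapping = {'aaa': 4, 'c': 3}
by_prefix = map_with(mapping)                        # behaves like map_starts_with
by_suffix = map_with(mapping, matches=str.endswith)  # same mapping, different predicate

print(by_prefix('aaaaaa'))  # 4
print(by_suffix('abc'))     # 3
print(by_prefix('xx'))      # 0
```

Because str.startswith and str.endswith are plain functions taking (text, pattern), they can be passed unchanged; any other two-argument predicate would work the same way.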

So currently, if you have a DataFrame, you would use the apply method:

df = pd.DataFrame({'col':[ 'xx', 'aaaaaa', 'c']})
      col
0      xx
1  aaaaaa
2       c

mapping = { 'aaa':4 ,'c':3}
df.col.apply(lambda x: map_starts_with(mapping)(x))

0    0
1    4
2    3


However, when acting in the Pandas/NumPy world, this approach will typically be too slow. Are there any suggestions on how to speed up this function using NumPy or Pandas vectorization capabilities?

Solution

I'm not completely sure about the specifics of optimizing this from the NumPy or pandas side (some approaches are documented elsewhere), but we can optimize the function itself:

  • you are calling .keys(), which in Python 2 builds an extra list of keys in memory (in Python 3, keys() returns a "view" instead)

  • the list comprehension tests every key even though only the first match is used; you can instead iterate lazily over iteritems() (items() in Python 3) and use the built-in next() function to exit early once you encounter a key that t starts with

Changes applied:

def map_starts_with(pat_map):
    def map_string(t):
        return next((value for key, value in pat_map.items() if t.startswith(key)), 0)
    return map_string
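To make the early exit visible, here is a small sketch that records which keys are actually tested. The helper first_match and the tested list are hypothetical, added only for this demonstration; the sketch assumes Python 3.7+, where dicts preserve insertion order.

```python
def first_match(pat_map, t, tested, default=0):
    """Return the value of the first key that `t` starts with, else `default`.

    `tested` is a list that collects every key the generator actually checks.
    """
    def candidates():
        for key, value in pat_map.items():
            tested.append(key)        # record each key that gets tested
            if t.startswith(key):
                yield value
    # next() pulls a single value; the generator is never advanced past it
    return next(candidates(), default)


mapping = {'aaa': 4, 'a': 1, 'c': 3}

tested = []
print(first_match(mapping, 'aaaaaa', tested))  # 4
print(tested)                                  # ['aaa']

tested = []
print(first_match(mapping, 'xx', tested))      # 0
print(tested)                                  # ['aaa', 'a', 'c']
```

Note that 'a' also matches 'aaaaaa', but it is never tested: as in the original list-based version, the first matching key in dict iteration order wins.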


You can also refactor it to use functools.partial instead of nested functions:

from functools import partial

import pandas as pd

def map_starts_with(pat_map, t):
    return next((value for key, value in pat_map.items() if t.startswith(key)), 0)

mapping = {'aaa': 4, 'c': 3}

df = pd.DataFrame({'col': ['xx', 'aaaaaa', 'c']})
print(df.col.apply(partial(map_starts_with, mapping)))
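On the vectorization side, one possible direction (a sketch only, not benchmarked here) is to build one boolean mask per pattern with Series.str.startswith and combine the masks with numpy.select, which picks the first condition that holds for each row. The function name map_starts_with_vectorized and the default parameter are assumptions made for illustration. Whether this beats apply depends on the number of patterns and rows, since it makes one pass over the column per pattern.

```python
import numpy as np
import pandas as pd


def map_starts_with_vectorized(series, pat_map, default=0):
    """Map each string in `series` to the value of the first pattern it starts with."""
    patterns = list(pat_map)
    if not patterns:
        # numpy.select rejects an empty condition list, so handle this case directly
        return pd.Series(default, index=series.index)
    # One boolean mask per pattern; na=False treats missing values as "no match"
    conditions = [series.str.startswith(pat, na=False).to_numpy() for pat in patterns]
    choices = [pat_map[pat] for pat in patterns]
    # For each row, numpy.select takes the choice of the first True condition
    return pd.Series(np.select(conditions, choices, default=default), index=series.index)


mapping = {'aaa': 4, 'c': 3}
df = pd.DataFrame({'col': ['xx', 'aaaaaa', 'c']})

result = map_starts_with_vectorized(df.col, mapping)
print(result.tolist())  # [0, 4, 3]
```

Since numpy.select resolves overlapping conditions in list order, this keeps the same "first pattern wins" behaviour as the versions above.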


Context

StackExchange Code Review Q#155405, answer score: 5
