Curried function
Problem
Inspired by another question, and since I'm currently learning functional programming, I wrote the following (curried) function:

def map_starts_with(pat_map):
    def map_string(t):
        pats = [pat for pat in pat_map.keys() if t.startswith(pat)]
        # get only the value of the "first" pattern if at least one pattern is found
        return pat_map.get(pats[0]) if len(pats) > 0 else 0
    return map_string

Summary of what the code does:
Given a mapping of patterns to ints, it maps a string to the (positive) int of the first pattern the string starts with, and to 0 if no pattern matches (in fact, one could and should also pass startswith as a function parameter).
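The parenthetical remark about passing startswith as a parameter could look roughly like the sketch below. The names map_matching, matches and default are made up for illustration and are not part of the original code; the only assumption is that the predicate is called as matches(string, pattern).

```python
def map_matching(pat_map, matches=str.startswith, default=0):
    """Return a function mapping a string to the value of the first matching pattern."""
    def map_string(t):
        # `matches(t, pat)` decides whether pattern `pat` applies to string `t`.
        for pat, value in pat_map.items():
            if matches(t, pat):
                return value
        return default
    return map_string


mapping = {'aaa': 4, 'c': 3}
starts = map_matching(mapping)                      # behaves like map_starts_with(mapping)
ends = map_matching(mapping, matches=str.endswith)  # swap in a different predicate
print(starts('aaaaaa'), starts('xx'), ends('abc'))
```

With the default predicate this should behave like map_starts_with(mapping); any other two-argument predicate can be swapped in without touching the mapping logic.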
So currently, if you have a DataFrame, you would use the apply method:

df = pd.DataFrame({'col': ['xx', 'aaaaaa', 'c']})
      col
0      xx
1  aaaaaa
2       c
mapping = {'aaa': 4, 'c': 3}
df.col.apply(lambda x: map_starts_with(mapping)(x))
0    0
1    4
2    3

However, when acting in the Pandas/NumPy world, this approach will typically be too slow. Are there any suggestions on how to speed up this function using NumPy or Pandas vectorization capabilities?
Solution
I'm not completely sure about the specifics of optimizing this from the NumPy or pandas side (some of it is documented), but we can optimize the function itself:
- you are calling .keys(), which makes an extra list of keys in memory (in Python 2; in Python 3, keys() returns a "view")
- you can use an iterative approach, switching to iteritems() (or items() in Python 3) and using the built-in next() function, which lets you exit early once you encounter a key that t starts with
Changes applied:

def map_starts_with(pat_map):
    def map_string(t):
        return next((value for key, value in pat_map.items() if t.startswith(key)), 0)
    return map_string

You can also refactor it to use functools.partial instead of nested functions:

from functools import partial
import pandas as pd

def map_starts_with(pat_map, t):
    return next((value for key, value in pat_map.items() if t.startswith(key)), 0)

mapping = {'aaa': 4, 'c': 3}
df = pd.DataFrame({'col': ['xx', 'aaaaaa', 'c']})
print(df.col.apply(partial(map_starts_with, mapping)))
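On the vectorization side, one option that may be worth trying is to build one boolean mask per pattern with Series.str.startswith and combine the masks with numpy.select, which takes the first condition that holds for each row. This is only a sketch: the name map_starts_with_vectorized and its default parameter are made up for illustration, and whether it actually beats apply depends on the number of patterns and rows, so it should be timed on real data.

```python
import numpy as np
import pandas as pd


def map_starts_with_vectorized(series, pat_map, default=0):
    """Map each string in `series` to the value of the first pattern it starts with."""
    if not pat_map:
        # np.select rejects an empty condition list, so handle that case separately.
        return pd.Series(default, index=series.index)
    # One boolean mask per pattern; missing values count as "no match".
    conditions = [
        series.str.startswith(pat, na=False).to_numpy(dtype=bool)
        for pat in pat_map
    ]
    choices = list(pat_map.values())
    # For each row, np.select takes the choice of the first True condition, else `default`.
    return pd.Series(np.select(conditions, choices, default=default), index=series.index)


mapping = {'aaa': 4, 'c': 3}
df = pd.DataFrame({'col': ['xx', 'aaaaaa', 'c']})
result = map_starts_with_vectorized(df.col, mapping)
print(result.tolist())
```

Like the next()-based version, this relies on the dict's iteration order to decide which pattern wins when several match. It does one pass over the column per pattern, so it is likely to suit a small set of patterns better than a large one.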
Context
StackExchange Code Review Q#155405, answer score: 5