Returning a NumPy array (or list) of strings of words repeated N times
Problem
I have a list of \$n\$ words and a corresponding \$m \times n\$ frequency matrix (as a NumPy array). I would like to return a list or array of \$m\$ strings, where the \$i\$th string consists of each word repeated according to the frequencies in the \$i\$th row of the frequency matrix. I have managed to achieve the desired result (with help from here), but the code is not particularly easy to understand at a glance. Is there a cleaner and more efficient way to perform the following operation?
I am looking for solutions compatible with Python 3.5.
import numpy as np
x = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
y = np.array([[2,1,0,0,5], [0,0,1,3,0]])
z = np.apply_along_axis(lambda b: ' '.join([ item for sublist in [[x[i]]*b[i] for i in range(len(x))] for item in sublist]),1,y)
>>> z
array(['yugoslavia yugoslavia zealand zone zone zone zone zone',
       'zimbabwe zip zip zip'],
      dtype='<U54')
Solution
It would seem you're doing it the right way. One thing though: you might want to replace the following piece of code:
[[x[i]]*b[i] for i in range(len(x))]
A few points as to how you could improve this:
- I suggest you use zip to iterate over two arrays simultaneously.
- Also, prefer using () over [], since it creates a generator expression rather than a list.
- A similar argument holds with the construct join([ ... ]). Simply use join( ... ) instead, which drops the redundant brackets. The memory saving is small in practice, since CPython's str.join materializes its argument internally anyway.
- Better variable names will also help with clarity.
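The points above can be tried out on a single row before involving NumPy at all. The following is only a sketch: the names by_index, by_zip and row_text are illustrative and do not appear in the original post, and counts stands in for one row of the frequency matrix.

```python
strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts = [2, 1, 0, 0, 5]  # one row of the frequency matrix

# Index-based version, as in the question: builds a list of lists.
by_index = [[strings[i]] * counts[i] for i in range(len(strings))]

# zip-based version: pairs each word with its count, and the parentheses
# make it a lazy generator expression instead of a list.
by_zip = ([s] * count for s, count in zip(strings, counts))

# Flatten and join without wrapping the inner expression in [].
row_text = ' '.join(item for sublist in by_zip for item in sublist)
print(row_text)
```

Both versions describe the same nested structure. The zip form simply avoids the manual indexing and names the two things being paired.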
Applied to the original comprehension, these points give:
([s] * count for s, count in zip(strings, counts))
Finally, formatting can make loads of difference:
import numpy as np
strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts_array = np.array([[2,1,0,0,5], [0,0,1,3,0]])
result = np.apply_along_axis(
    lambda counts: ' '.join(item for sublist in
                            ([s] * count for s, count in zip(strings, counts))
                            for item in sublist),
    1, counts_array)
An equally ugly alternative might involve using two join statements:
result = np.apply_along_axis(
    lambda counts: ' '.join(filter(None,
        (' '.join([s] * count) for (s, count) in zip(strings, counts)))),
    1, counts_array)
Note how I've had to use filter, as per this question, in order to remove the extra spaces emanating from the empty strings.
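To see why the filter call matters, it may help to look at one row with and without it. This is a small sketch using plain lists. The names pieces, unfiltered and filtered are made up for illustration and are not part of the answer's code.

```python
strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts = [2, 1, 0, 0, 5]  # one row of the frequency matrix

# One inner join per word; a count of 0 yields an empty string.
pieces = [' '.join([s] * count) for s, count in zip(strings, counts)]

# Joining the pieces directly leaves a run of spaces where the
# empty strings sit between their separators.
unfiltered = ' '.join(pieces)

# filter(None, ...) drops falsy items, which here means the empty strings.
filtered = ' '.join(filter(None, pieces))

print(repr(unfiltered))
print(repr(filtered))
```

With the two zero counts in this row, the unfiltered result contains three consecutive spaces between zealand and zone, while the filtered one is separated by single spaces throughout.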
Context
StackExchange Code Review Q#138069, answer score: 2