Returning a NumPy array (or list) of strings of words repeated N times


Problem

I have a list of \$n\$ words and a corresponding \$m \times n\$ frequency matrix (as a NumPy array). I would like to return a list/array of \$m\$ strings, where the \$i\$th string consists of each word repeated according to the frequencies in the \$i\$th row of the frequency matrix. I have managed to achieve the desired result (with help from here), but the code is not particularly easy to understand at a glance. Is there a cleaner and more efficient way to perform the following operation?

import numpy as np
x = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
y = np.array([[2,1,0,0,5], [0,0,1,3,0]])
z = np.apply_along_axis(lambda b: ' '.join([ item for sublist in [[x[i]]*b[i] for i in range(len(x))] for item in sublist]),1,y)

>>> z
array(['yugoslavia yugoslavia zealand zone zone zone zone zone',
   'zimbabwe zip zip zip'],
  dtype='<U54')


I am looking for solutions compatible with Python 3.5.

Solution

It would seem you're doing it the right way. One thing though: you might want to replace the following piece of code:

[[x[i]]*b[i] for i in range(len(x))]


A few points as to how you could improve this:

  • Use zip to iterate over the two sequences simultaneously, instead of indexing with range(len(x)).

  • Prefer () over [] here, since it creates a generator expression rather than a list.

  • A similar argument holds for the construct join([ ... ]). Simply use join( ... ) instead, which avoids building the list explicitly.

  • Better variable names will also help with clarity.

Putting these together:

([s] * count for s, count in zip(strings, counts))
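
As a quick check that the rewritten expression is equivalent to the original one, here is a minimal sketch for a single row of counts. The names indexed, paired and paired_as_list are only illustrative and do not appear in the original code.

```python
strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts = [2, 1, 0, 0, 5]

# Original form: index-based, builds the whole list of sublists eagerly.
indexed = [[strings[i]] * counts[i] for i in range(len(strings))]

# Suggested form: zip pairs each word with its count; the parentheses
# make this a generator expression, so sublists are produced lazily.
paired = ([s] * count for s, count in zip(strings, counts))

# A generator can only be consumed once, so materialise it to compare.
paired_as_list = list(paired)
print(paired_as_list == indexed)
```

A zero count simply yields an empty sublist, which disappears once the sublists are flattened.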


Finally, formatting can make loads of difference:

import numpy as np

strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts_array = np.array([[2,1,0,0,5], [0,0,1,3,0]])
result = np.apply_along_axis(
    lambda counts: ' '.join(item for sublist in
                                ([s] * count for s, count in zip(strings, counts))
                            for item in sublist),
    1, counts_array)
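
If a plain list of strings is acceptable, a different approach (not the one used above) might be to let np.repeat do the repetition and loop over the rows directly. This is only a sketch under that assumption. One possible reason to prefer it: as far as I can tell, np.apply_along_axis sizes its output from the result for the first row, so a later row that produces a longer string may be truncated to that width.

```python
import numpy as np

strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts_array = np.array([[2, 1, 0, 0, 5], [0, 0, 1, 3, 0]])

# np.repeat repeats each word by the matching count in the row,
# so no nested comprehension is needed to flatten the sublists.
result = [' '.join(np.repeat(strings, counts)) for counts in counts_array]

for line in result:
    print(line)
```

Wrapping the list in np.array(result) afterwards would give an array whose string width fits the longest row.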


An equally ugly alternative might involve using two join calls:

result = np.apply_along_axis(
    lambda counts: ' '.join(filter(None,
                  (' '.join([s] * count) for (s, count) in zip(strings, counts)))),
    1, counts_array)


Note how I've had to use filter, as per this question, to drop the empty strings produced by zero counts; joining them would otherwise leave extra spaces in the result.
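
To see why the filter is needed, here is a small sketch for a single row. The names chunks, unfiltered and filtered are illustrative only.

```python
strings = ['yugoslavia', 'zealand', 'zimbabwe', 'zip', 'zone']
counts = [2, 1, 0, 0, 5]

# One chunk per word; a zero count produces an empty string.
chunks = [' '.join([s] * count) for s, count in zip(strings, counts)]

# Joining the chunks directly leaves a run of spaces where the
# empty strings sit.
unfiltered = ' '.join(chunks)

# filter(None, ...) keeps only truthy items, so the empty strings
# are dropped before the outer join.
filtered = ' '.join(filter(None, chunks))

print(repr(unfiltered))
print(repr(filtered))
```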

Context

StackExchange Code Review Q#138069, answer score: 2
