Vectorized numpy version of arange with multiple start stop
Problem
Given a single start and stop, numpy.arange is a good solution for building a NumPy array of evenly spaced values. However, given an array of start values and an array of stop values, I would like to build an array of concatenated evenly spaced values, and do so at C speed (no looping). Here is my current solution, though I am wondering if there is a NumPy/SciPy function I missed that already does this.

    import numpy as np

    def vrange(starts, lengths):
        """ Create concatenated ranges of integers for multiple start/length

        Args:
            starts (numpy.array): starts for each range
            lengths (numpy.array): lengths for each range (same length as starts)

        Returns:
            numpy.array: concatenated ranges

        See the following illustrative example:

            starts = np.array([1, 3, 4, 6])
            lengths = np.array([0, 2, 3, 0])

            print vrange(starts, lengths)
            >>> [3 4 4 5 6]

        """

        # Repeat start position index length times and concatenate
        cat_start = np.repeat(starts, lengths)

        # Create group counter that resets for each start/length
        cat_counter = np.arange(lengths.sum()) - np.repeat(lengths.cumsum() - lengths, lengths)

        # Add group counter to group specific starts
        cat_range = cat_start + cat_counter

        return cat_range

If you are curious why I need this, it's for building a 1-to-many mapping of intervals to contained positions.
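As a point of comparison, the looping baseline that the question wants to avoid can be sketched as below, together with one possible way to build the interval-to-position mapping mentioned above. The names vrange_loop and interval_index are hypothetical and do not appear in the original post; this is only an illustration under those assumptions, useful mainly as a reference result to check a vectorized version against.

```python
import numpy as np


def vrange_loop(starts, lengths):
    """Reference version (hypothetical name): one np.arange per range."""
    pieces = [np.arange(s, s + n) for s, n in zip(starts, lengths)]
    if not pieces:
        # np.concatenate rejects an empty list, so handle that case here.
        return np.empty(0, dtype=int)
    return np.concatenate(pieces)


starts = np.array([1, 3, 4, 6])
lengths = np.array([0, 2, 3, 0])

# Concatenated positions covered by the ranges.
positions = vrange_loop(starts, lengths)

# For each position, the index of the range it came from
# (an assumed reading of the "1-to-many mapping" in the question).
interval_index = np.repeat(np.arange(len(starts)), lengths)

print(positions)       # [3 4 4 5 6]
print(interval_index)  # [1 1 2 2 2]
```

The loop costs one Python-level call per range, which is what the vectorized vrange avoids.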
Solution
This code looks good to me: the docstring is clear and the implementation is simple and efficient. So I have only a few minor points.
- The code doesn't fit in 80 columns, meaning that we have to scroll it horizontally to read it here on Code Review.

- The docstring contains an example. If it were formatted like this:

      >>> starts = np.array([1, 3, 4, 6])
      >>> lengths = np.array([0, 2, 3, 0])
      >>> vrange(starts, lengths)
      array([3, 4, 4, 5, 6])

  then it could be run using the doctest module, allowing you to automatically check that it is correct.

- The example in the docstring uses the print statement and so is not compatible with Python 3.

- I think it would be clearer to take a stops array instead of a lengths array. It would then have an interface that corresponds closely to range and numpy.arange. Possibly lengths is more convenient for your application, but you can easily calculate stops = starts + lengths.

- The docstring says that starts and lengths must be "numpy.array", by which I think you mean numpy.ndarray, but in fact it's OK for starts to be an array_like because the code doesn't call any methods on it.

- It would be possible to allow lengths to be array_like too, by calling numpy.asarray.

- The implementation requires starts and lengths to be 1-dimensional, so the docstring should mention this.

- The implementation carries out a sum that includes (among other terms):

      np.repeat(starts, lengths) - np.repeat(lengths.cumsum() - lengths, lengths)

  this is the same as:

      np.repeat(starts - lengths.cumsum() + lengths, lengths)

  which saves a call to numpy.repeat, and this is the same as:

      np.repeat(stops - lengths.cumsum(), lengths)

Putting all that together, I get:

    import numpy as np

    def vrange(starts, stops):
        """Create concatenated ranges of integers for multiple start/stop

        Parameters:
            starts (1-D array_like): starts for each range
            stops (1-D array_like): stops for each range (same shape as starts)

        Returns:
            numpy.ndarray: concatenated ranges

        For example:

            >>> starts = [1, 3, 4, 6]
            >>> stops  = [1, 5, 7, 6]
            >>> vrange(starts, stops)
            array([3, 4, 4, 5, 6])

        """
        stops = np.asarray(stops)
        l = stops - starts # Lengths of each range.
        return np.repeat(stops - l.cumsum(), l) + np.arange(l.sum())

This is not quite as clear as your implementation: it's hard to give a concise explanation of what stops - l.cumsum() means. So I can see an argument for preferring the more explanatory version even if it does have an extra call to numpy.repeat.
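To make the algebra in the last list item concrete, the sketch below evaluates the three np.repeat expressions on the example data and checks that they agree. It then runs the docstring example of the revised vrange through doctest. The variable names two_repeats, one_repeat and with_stops are made up for this illustration, and the explicit DocTestFinder/DocTestRunner calls are just one way to run a doctest from inside a script (running python -m doctest on the file is the more usual route).

```python
import doctest

import numpy as np


def vrange(starts, stops):
    """Concatenated integer ranges, same logic as the revised function above.

    >>> vrange([1, 3, 4, 6], [1, 5, 7, 6])
    array([3, 4, 4, 5, 6])
    """
    stops = np.asarray(stops)
    lengths = stops - starts  # Lengths of each range.
    return (np.repeat(stops - lengths.cumsum(), lengths)
            + np.arange(lengths.sum()))


starts = np.array([1, 3, 4, 6])
lengths = np.array([0, 2, 3, 0])
stops = starts + lengths

# The three forms from the review, evaluated on the same data.
two_repeats = (np.repeat(starts, lengths)
               - np.repeat(lengths.cumsum() - lengths, lengths))
one_repeat = np.repeat(starts - lengths.cumsum() + lengths, lengths)
with_stops = np.repeat(stops - lengths.cumsum(), lengths)

print(two_repeats)  # [3 3 2 2 2]
print(np.array_equal(two_repeats, one_repeat))  # True
print(np.array_equal(one_repeat, with_stops))   # True

# Run only the docstring example attached to vrange.
finder = doctest.DocTestFinder(recurse=False)
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(vrange, "vrange", module=False,
                        globs={"np": np, "vrange": vrange}):
    runner.run(test)

print(runner.failures, runner.tries)  # 0 1
```

Each value of stops - lengths.cumsum() is the offset that, added to the running counter np.arange(lengths.sum()), makes the counter land on that range's start at the range's first element.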
Context
StackExchange Code Review Q#83018, answer score: 5