
Vectorized NumPy version of arange with multiple start/stop values


Problem

Given a single start and stop, numpy.arange is a good solution for building a NumPy array of evenly spaced values. However, given an array of start values and an array of stop values, I would like to build an array of concatenated evenly spaced values, and do so at C speed (no Python-level looping). Here is my current solution, though I am wondering if there is a NumPy/SciPy function I missed that already does this.

def vrange(starts, lengths):
    """ Create concatenated ranges of integers for multiple start/length

    Args:
        starts (numpy.array): starts for each range
        lengths (numpy.array): lengths for each range (same length as starts)

    Returns:
        numpy.array: concatenated ranges

    See the following illustrative example:

        starts = np.array([1, 3, 4, 6])
        lengths = np.array([0, 2, 3, 0])

        print vrange(starts, lengths)
        >>> [3 4 4 5 6]

    """

    # Repeat start position index length times and concatenate
    cat_start = np.repeat(starts, lengths)

    # Create group counter that resets for each start/length
    cat_counter = np.arange(lengths.sum()) - np.repeat(lengths.cumsum() - lengths, lengths)

    # Add group counter to group specific starts
    cat_range = cat_start + cat_counter

    return cat_range


If you are curious why I need this, it's for building a 1-to-many mapping of intervals to contained positions.
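As a rough illustration of that use case, the sketch below pairs each generated position with the index of the interval it came from. The names interval_ids and pairs are hypothetical and not part of the original post, and vrange is restated here in condensed form so the block runs on its own.

```python
import numpy as np


def vrange(starts, lengths):
    """Condensed restatement of the function above (start/length interface)."""
    starts = np.asarray(starts)
    lengths = np.asarray(lengths)
    # Counter that restarts at 0 at the beginning of every range.
    offsets = np.arange(lengths.sum()) - np.repeat(lengths.cumsum() - lengths, lengths)
    return np.repeat(starts, lengths) + offsets


starts = np.array([1, 3, 4, 6])
lengths = np.array([0, 2, 3, 0])

# Positions covered by all intervals, concatenated.
positions = vrange(starts, lengths)

# Hypothetical helper array: index of the interval that owns each position.
interval_ids = np.repeat(np.arange(len(starts)), lengths)

# One row per (interval index, contained position) pair.
pairs = np.column_stack((interval_ids, positions))
print(pairs)
```

Intervals of length zero contribute no rows, so only intervals 1 and 2 appear in the mapping for this input.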

Solution

This code looks good to me: the docstring is clear and the implementation is simple and efficient. So I have only a few minor points.

- The code doesn't fit in 80 columns, meaning that we have to scroll it horizontally to read it here on Code Review.

- The docstring contains an example. If it were formatted like this:

>>> starts = np.array([1, 3, 4, 6])
>>> lengths = np.array([0, 2, 3, 0])
>>> vrange(starts, lengths)
array([3, 4, 4, 5, 6])


then it could be run using the doctest module, allowing you to automatically check that it is correct.

- The example in the docstring uses the print statement and so is not compatible with Python 3.

- I think it would be clearer to take a stops array instead of a lengths array. It would then have an interface that corresponds closely to range and numpy.arange. Possibly lengths is more convenient for your application, but you can easily calculate stops = starts + lengths.

- The docstring says that starts and lengths must be "numpy.array", by which I think you mean numpy.ndarray, but in fact it's OK for starts to be an array_like because the code doesn't call any methods on it.

- It would be possible to allow lengths to be array_like too, by calling numpy.asarray.

- The implementation requires starts and lengths to be 1-dimensional, so the docstring should mention this.

- The implementation carries out a sum that includes (among other terms):

np.repeat(starts, lengths) - np.repeat(lengths.cumsum() - lengths, lengths)


this is the same as:

np.repeat(starts - lengths.cumsum() + lengths, lengths)


which saves a call to numpy.repeat, and this is the same as:

np.repeat(stops - lengths.cumsum(), lengths)


Putting all that together, I get:

import numpy as np


def vrange(starts, stops):
    """Create concatenated ranges of integers for multiple start/stop

    Parameters:
        starts (1-D array_like): starts for each range
        stops (1-D array_like): stops for each range (same shape as starts)

    Returns:
        numpy.ndarray: concatenated ranges

    For example:

        >>> starts = [1, 3, 4, 6]
        >>> stops  = [1, 5, 7, 6]
        >>> vrange(starts, stops)
        array([3, 4, 4, 5, 6])

    """
    stops = np.asarray(stops)
    l = stops - starts # Lengths of each range.
    return np.repeat(stops - l.cumsum(), l) + np.arange(l.sum())


This is not quite as clear as your implementation: it's hard to give a concise explanation of what stops - l.cumsum() means. So I can see an argument for preferring the more explanatory version even if it does have an extra call to numpy.repeat.
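To show how the doctest suggestion might be used in practice, here is a minimal sketch that runs the docstring example programmatically and also compares the result against a plain Python loop. The function vrange_loop is a hypothetical reference implementation added only for comparison, and the local name lengths is used in place of l.

```python
import doctest

import numpy as np


def vrange(starts, stops):
    """Create concatenated ranges of integers for multiple start/stop

    >>> starts = [1, 3, 4, 6]
    >>> stops = [1, 5, 7, 6]
    >>> vrange(starts, stops)
    array([3, 4, 4, 5, 6])

    """
    stops = np.asarray(stops)
    lengths = stops - starts  # Lengths of each range.
    return np.repeat(stops - lengths.cumsum(), lengths) + np.arange(lengths.sum())


def vrange_loop(starts, stops):
    """Hypothetical loop-based reference: one np.arange call per range."""
    return np.concatenate([np.arange(a, b) for a, b in zip(starts, stops)])


# Collect and run the examples embedded in the docstring of vrange.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(vrange, "vrange", module=False,
                        globs={"np": np, "vrange": vrange}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)

# Cross-check the vectorized version against the loop version.
starts = [1, 3, 4, 6]
stops = [1, 5, 7, 6]
assert np.array_equal(vrange(starts, stops), vrange_loop(starts, stops))
```

In a module file, the same docstring check is usually triggered with python -m doctest followed by the file name, which avoids the explicit finder and runner.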

Context

StackExchange Code Review Q#83018, answer score: 5
