Random Word Splitter

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

I wrote a word-splitting function. It splits a word into substrings at random positions. For example, if the input is 'runtime', any of the following outputs is possible:

['runtime']

['r','untime']

['r','u','n','t','i','m','e'] ....

But its runtime is very high when I want to split 100k words. Do you have any suggestions to optimize it or write it more cleverly?

def random_multisplitter(word):
    from numpy import mod
    spw = []
    length = len(word)
    rand = random_int(word)
    if rand == length:       #probability of not splitting
        return [word]

    else:
        div = mod(rand, (length + 1))#defining division points 
        bound = length - div
        spw.append(div)
        while div != 0:
            rand = random_int(word)
            div = mod(rand,(bound+1))
            bound = bound-div
            spw.append(div)
        result = spw
    b = 0
    points =[]
    for x in range(len(result)-1): #calculating splitting points 
        b=b+result[x]
        points.append(b)
    xy=0
    t=[]
    for i in points:
        t.append(word[xy:i])
        xy=i
    if word[xy:len(word)]!='':
        t.append(word[xy:len(word)])
    if type(t)!=list:
        return [t]
    return t

Solution

Most of your variable names seem good; however, xy explains nothing.

You should normally put one space on each side of a binary operator.
b = b + result[x] is more readable than b=b+result[x],
which looks at a glance like a single variable.

Why do you need to import numpy to do a simple mod?

Python comes with mod, %.
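
As a quick illustration, the built-in % operator can replace the numpy call for the non-negative integers used here. This is a minimal sketch; the values of rand and length are made up for the example, since random_int is not shown in the question.

```python
# Illustrative values only; in the question, rand comes from the
# undefined helper random_int.
length = 7
rand = 23

# Built-in modulo: no import needed.
div = rand % (length + 1)
print(div)  # 23 % 8 -> 7
```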

You use the name result for an intermediate list,
which I would reserve for the value that the function returns.

You shouldn't import things anywhere other than at the beginning of the file.
Neither I nor anyone else wants to traverse your entire program to find that you import numpy's mod.
We want to know at the beginning that you use it.
Also, this can lead to bugs where you think you have imported numpy, but you haven't.

Overall, your program's readability is quite low.

I would recommend splitting the function into two.
A 'feeder' and a 'consumer'.

First, I love generators, and you can write one that reduces the complexity of this program.

The algorithm that I use tracks the start and stop of each split.
On each step, I replace start with stop and add a random amount to stop.
It really is that simple, and it does the majority of your program's work.

import random

def get_numbers(length):
    # Yield (start, stop) slice bounds until stop reaches or passes length.
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        yield start, stop

This is simple: it keeps advancing until stop reaches or passes the length, yielding each (start, stop) pair along the way.
You can think of it like building an array that looks like [(start, stop), (start, stop), ...].
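
To make the shape of that output concrete, here is a small sketch that collects the pairs for a word of length 7. The seed is arbitrary and is only there so that repeated runs agree; the actual pairs depend on the random draws.

```python
import random

def get_numbers(length):
    # Same generator as above: yield (start, stop) slice bounds.
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        yield start, stop

random.seed(0)  # arbitrary seed, only for repeatability
pairs = list(get_numbers(7))
print(pairs)

# Each pair starts where the previous one stopped, and the final
# stop reaches or passes the length.
```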

This is a near drop-in replacement for everything in your function before for i in points:.
However, it produces (start, stop) pairs lazily for that for loop,
whereas yours builds a one-dimensional list of indices.

If you wanted it to build an array instead, then it would look like the following block.
However, it's not advised, as it uses more memory and will take slightly longer.

def get_numbers(length):
    # List-building variant of the generator above.
    list_ = []
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        list_.append((start, stop))
    return list_

Then you can now split the string.
To do this I will loop through the above generator and yield split strings.

def random_multisplitter(word):
    for start, stop in get_numbers(len(word)):
        yield word[start:stop]

I use Python's amazing slice operation just like you did.
However, you want random_multisplitter to return a list, not a generator,
so you can change it to the following.
But a generator is a better choice for the 100k-word input,
as you will then have smaller memory consumption.

def random_multisplitter(word):
    return [word[start:stop] for start, stop in get_numbers(len(word))]

If not faster, it will at least be smaller and nicer to look at.
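
A quick sanity check, sketched under the assumption that the list version above is the one in use: the pieces should always join back into the original word, and none of them should be empty. The sample words are arbitrary.

```python
import random

def get_numbers(length):
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        yield start, stop

def random_multisplitter(word):
    return [word[start:stop] for start, stop in get_numbers(len(word))]

for word in ['runtime', 'a', 'splitter']:
    pieces = random_multisplitter(word)
    # The slices are contiguous, so joining them restores the word.
    assert ''.join(pieces) == word
    # start is always below len(word), so no slice is empty.
    assert all(pieces)
    print(pieces)
```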

I expected the program to be larger; since it is so small, a single-function version would be:

from random import randint

def random_multisplitter(word):
    length = len(word)
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + randint(1, length)
        yield word[start:stop]

# Generator
random_multisplitter('runtime')
# List
list(random_multisplitter('runtime'))

Some people may dislike that I allow the slice to go past the end of the string. To avoid any potential confusion: this is safe, though it can seem weird. Slicing past the end is fine, while indexing past the end raises an error.

>>> 'abcde'[0:20]
'abcde'
>>> 'abcde'[20]
IndexError: string index out of range

Due to this you may wish to change the assignment of stop to the following:

start, stop = stop, randint(stop + 1, length)

Or, when you are yielding the values in get_numbers, change it to:

yield start, min(stop, length)

The former will produce more splits toward the end of the string, whereas the unmodified version and the latter version will predominantly produce two or three pieces on small strings.
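
One way to see the difference is to count how many pieces each variant produces over many trials. The following is a rough sketch, not a rigorous analysis; the helper names split_overshoot, split_bounded and piece_counts are made up for this example, and the seed and trial count are arbitrary.

```python
import random
from collections import Counter

def split_overshoot(word):
    # Unmodified version: stop may run past the end of the word.
    length = len(word)
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        yield word[start:stop]

def split_bounded(word):
    # First alternative: stop is drawn only from the remaining range.
    length = len(word)
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, random.randint(stop + 1, length)
        yield word[start:stop]

def piece_counts(splitter, word, trials=10000):
    # Map "number of pieces" to how often it occurred.
    return Counter(len(list(splitter(word))) for _ in range(trials))

random.seed(1)  # arbitrary seed, only for repeatability
word = 'runtime'
overshoot = piece_counts(split_overshoot, word)
bounded = piece_counts(split_bounded, word)
print('unmodified:', sorted(overshoot.items()))
print('bounded:   ', sorted(bounded.items()))
```

The min(stop, length) variant splits the word exactly like the unmodified one, since it only clamps the reported stop value.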

As this is tagged performance, it's probably best if we have a speed test.
The code I use to test the speed is:

import time

word = ' ' * 100000

def time_it(fn, name):
    # Call fn on the test word n times and print the total elapsed seconds.
    n = 100
    t0 = time.time()
    for _ in range(n):
        fn(word)
    t1 = time.time()
    print(name, '=', t1 - t0)

This uses time.time, so take the results with a grain of salt.

Also, I can't test your original code.
random_int is not defined in the example.

And finally, I used start, stop = stop, stop + random.randint(1, 2) for the 'Slow' ones, and start, stop = stop, stop + random.randint(1, length) for the 'Fast' ones.

Other Answer = 75.2860000134
Slow Generator = 12.1919999123
Slow List = 13.236000061
Fast Generator = 0.00200009346008
Fast List = 0.0019998550415

Keep in mind that generators perform better when they aren't converted to a list straight off the bat.
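
If you want to repeat the comparison, note that calling a generator function only creates the generator object; its body runs only when something iterates over it. The sketch below uses the standard timeit module and a hypothetical consume helper (not part of the code above) to time lazy iteration against list conversion. It is an assumption about how one might measure this, not the setup behind the numbers above, and timings will vary by machine.

```python
import random
import timeit

def random_multisplitter(word):
    length = len(word)
    start, stop = 0, 0
    while stop < length:
        start, stop = stop, stop + random.randint(1, length)
        yield word[start:stop]

def consume(word):
    # Hypothetical helper: iterate without building a list, so the
    # generator body actually runs.
    count = 0
    for _ in random_multisplitter(word):
        count += 1
    return count

word = ' ' * 100000
elapsed_lazy = timeit.timeit(lambda: consume(word), number=100)
elapsed_list = timeit.timeit(lambda: list(random_multisplitter(word)), number=100)
print('lazy =', elapsed_lazy)
print('list =', elapsed_list)
```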

Context

StackExchange Code Review Q#105286, answer score: 7