Why does parallelising this simple problem slow it down compared to looping through all the data?
Problem
I've been using multiprocessing and parallelisation for the first time this week on a very large data set, with 32 CPUs. I decided to explore it on a smaller task, just on the 4 CPUs of my Mac, to see what I could learn.
I created a task to add 100 to every element in a 500,000 element list. To my surprise, batching this data and using Python's parallelising tools actually slowed it down hugely, compared to just looping through the 500,000 elements and adding 100 to each.
I'd like to understand why.
Consider the two methods for doing this task below:
```
import numpy as np
from multiprocessing import Pool, cpu_count
from gensim.corpora.wikicorpus import init_to_ignore_interrupt
from itertools import zip_longest
import timeit as t

def grouper(iterable, n, fillvalue=None):
    # collect the data into fixed-length batches;
    # the last batch is padded with fillvalue
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

class Add100ToData():
    def __init__(self):
        self.data = [np.random.randint(0, 100) for _ in range(500000)]

    def add100(self):
        for i in range(len(self.data)):
            self.data[i] = self.data[i] + 100
        return self.data

class Add100ToDataMultiprocess():
    def __init__(self):
        self.data = [np.random.randint(0, 100) for _ in range(500000)]

    def process_batch(self, batch):
        new_data = []
        for i in batch:
            if i is not None:  # skip the fill padding grouper adds to the last batch
                new_data.append(i + 100)
        return new_data

    def add100(self, batch_size):
        processes = cpu_count()
        pool = Pool(processes, init_to_ignore_interrupt)
        gr = grouper(self.data, batch_size)
        offset = 0
        for batch_result in pool.imap(self.process_batch, gr):
            for value in batch_result:
                self.data[offset] = value
                offset += 1
        pool.close()
        pool.join()
        return self.data

if __name__ == "__main__":
    add1 = Add100ToData()
    start = t.default_timer()
    final1 = add1.add100()
    end = t.default_timer()
    print("Loop:", end - start)

    add2 = Add100ToDataMultiprocess()
    start = t.default_timer()
    final2 = add2.add100(10000)  # batch size chosen arbitrarily
    end = t.default_timer()
    print("Multiprocess:", end - start)
```
Solution
Parallelism has costs. The processes have to be scheduled, communicate with each other, manage resources, etc. In return you can do multiple things at the same time.
When you have a lot of slow tasks that can be done independently, parallel processing will speed things up a lot.
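For example, here is a sketch of the kind of workload where `Pool` typically does pay off (the function and numbers are illustrative, not from the question): each task does enough CPU work that the fixed per-task overhead stops mattering.

```python
from multiprocessing import Pool

def sum_of_squares(n):
    # deliberately CPU-heavy: the work per task dwarfs the cost of
    # pickling one int argument and one int result
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    jobs = [2_000_000] * 8
    with Pool(4) as pool:
        # 8 slow, independent tasks spread over 4 workers
        results = pool.map(sum_of_squares, jobs)
    print(len(results))
```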
But when you try to parallelize an easy task it might take longer to handle the overhead than to actually do the work. That seems to be the case here.
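You can see the overhead directly with a stripped-down version of your experiment (a sketch; exact timings vary by machine): sending work through the pool means every argument is pickled to a worker process and every result pickled back, and for a single integer addition that communication costs far more than the work itself.

```python
import time
from multiprocessing import Pool

def add100(x):
    return x + 100

if __name__ == "__main__":
    data = list(range(500_000))

    start = time.perf_counter()
    serial = [x + 100 for x in data]
    print("serial:  ", time.perf_counter() - start)

    with Pool(4) as pool:
        start = time.perf_counter()
        # every element crosses a process boundary twice (argument + result)
        parallel = pool.map(add100, data)
        print("parallel:", time.perf_counter() - start)

    assert serial == parallel  # same answer, very different cost
```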
Context
StackExchange Computer Science Q#95896, answer score: 3