Why does parallelising this simple problem slow it down compared to looping through all the data?
Problem
I've been using multiprocessing and parallelisation for the first time this week on a very large data set, with 32 CPUs. I decided to explore it on a smaller task, just on the 4 CPUs of my Mac, to see what I could learn.
I created a task to add 100 to every element in a 500,000 element list. To my surprise, batching this data and using Python's parallelising tools actually slowed it down hugely, compared to just looping through the 500,000 elements and adding 100 to each.
I'd like to understand why.
Consider the two methods for doing this task below:
```
import numpy as np
from multiprocessing import Pool, cpu_count
from gensim.corpora.wikicorpus import init_to_ignore_interrupt
from itertools import zip_longest
import timeit as t

def grouper(iterable, n, fillvalue=None):
    # collect the data into fixed-length batches;
    # the last batch is padded with fillvalue
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

class Add100ToData():
    def __init__(self):
        self.data = [np.random.randint(0, 100) for _ in range(500000)]

    def add100(self):
        for i in range(len(self.data)):
            self.data[i] = self.data[i] + 100
        return self.data

class Add100ToDataMultiprocess():
    def __init__(self):
        self.data = [np.random.randint(0, 100) for _ in range(500000)]

    def process_batch(self, batch):
        new_data = []
        for i in batch:
            if i is not None:  # skip the fill padding grouper adds to the last batch
                new_data.append(i + 100)
        return new_data

    def add100(self, batch_size):
        processes = cpu_count()
        pool = Pool(processes, init_to_ignore_interrupt)
        gr = grouper(self.data, batch_size)
        offset = 0
        for batch_result in pool.imap(self.process_batch, gr):
            for value in batch_result:
                self.data[offset] = value
                offset += 1
        pool.close()
        pool.join()
        return self.data

if __name__ == "__main__":
    add1 = Add100ToData()
    start = t.default_timer()
    final1 = add1.add100()
    end = t.default_timer()
    print("Loop:", end - start)

    add2 = Add100ToDataMultiprocess()
    start = t.default_timer()
    final2 = add2.add100(10000)  # batch size chosen arbitrarily
    end = t.default_timer()
    print("Multiprocess:", end - start)
```
Solution
Parallelism has costs. The processes have to be scheduled, communicate with each other, manage resources, etc. In return you can do multiple things at the same time.
When you have a lot of slow tasks that can be done independently, parallel processing will speed things up a lot.
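For example, here is a sketch of the kind of workload where `Pool` typically does pay off (the function and numbers are illustrative, not from the question): each task does enough CPU work that the fixed per-task overhead stops mattering.

```python
from multiprocessing import Pool

def sum_of_squares(n):
    # deliberately CPU-heavy: the work per task dwarfs the cost of
    # pickling one int argument and one int result
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    jobs = [2_000_000] * 8
    with Pool(4) as pool:
        # 8 slow, independent tasks spread over 4 workers
        results = pool.map(sum_of_squares, jobs)
    print(len(results))
```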
But when you try to parallelize an easy task it might take longer to handle the overhead than to actually do the work. That seems to be the case here.
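You can see the overhead directly with a stripped-down version of your experiment (a sketch; exact timings vary by machine): sending work through the pool means every argument is pickled to a worker process and every result pickled back, and for a single integer addition that communication costs far more than the work itself.

```python
import time
from multiprocessing import Pool

def add100(x):
    return x + 100

if __name__ == "__main__":
    data = list(range(500_000))

    start = time.perf_counter()
    serial = [x + 100 for x in data]
    print("serial:  ", time.perf_counter() - start)

    with Pool(4) as pool:
        start = time.perf_counter()
        # every element crosses a process boundary twice (argument + result)
        parallel = pool.map(add100, data)
        print("parallel:", time.perf_counter() - start)

    assert serial == parallel  # same answer, very different cost
```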
Context
StackExchange Computer Science Q#95896, answer score: 3