HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

P2P streaming sim

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
streamingp2psim

Problem

I am working on a P2P streaming sim (which I think is correct), however I've discovered a huge bottleneck in my code. During the run of my sim, the function I've included below gets called about 25000 times and takes about 560 seconds from a total of about 580 seconds (I used pycallgraph).

Can anyone identify any way I could speed the execution up, by optimizing the code (I think I won't be able to reduce the times it gets executed)? The basics needed to understand what the function does, is that the incoming messages consist of [sender_sequence nr, sender_id, nr_of_known_nodes_to_sender (or -1 if the sender departs the network), time_to_live], while the mCache entries these messages update, consist of the same data (except for the third field, which cannot be -1, instead if such a message is received the whole entry will be deleted), with the addition of a fifth field containing the update time.

```
def msg_io(self): # Node messages (not datastream) control

# filter and sort by main key
valid_sorted = sorted((el for el in self.incoming if el[3] != 0), key=ig(1))
self.incoming = []
# ensure identical keys have highest first element first
valid_sorted.sort(key=ig(0), reverse=True)
# group by second element
grouped = groupby(valid_sorted, ig(1))
# take first element for each key
internal = [next(item) for group, item in grouped]
for j in internal:
found = False
if j[3] > 0 : # We are only interested in non-expired messages
for i in self.mCache[:]:
if j[1] == i[1]: # If the current mCache entry corresponds to the current incoming message
if j[2] == -1: # If the message refers to a node departure
self.mCache.remove(i) # We remove the node from the mCache
self.partnerCache = [partner for partner in self.partnerCache if no

Solution

valid_sorted is created using sorted function, and then it is sorted again. I don't see how the first sort is helping in any way. Maybe I'm wrong..

You create the iterator grouped with the function groupby, but then you turn it into list and then iterate over it again. instead - you can do this:

grouped = groupby(valid_sorted, ig(1))
for group, item in grouped:
    j = next(item)
    ...


That way - you only iterate over it once. :)

The inner loop iterating over self.mCache[:] - I don't see why you can't just iterate over self.mCache and that's it. If its members are lists - it will change weither you use [:] or not.

When you use [:] - you create a copy of the list, and that's why It can be bad for memory and speed.

The remove function can be quite slow, if you can find a better way to implement your algorithm without using this function - it can be faster.

Code Snippets

grouped = groupby(valid_sorted, ig(1))
for group, item in grouped:
    j = next(item)
    ...

Context

StackExchange Code Review Q#19922, answer score: 3

Revisions (0)

No revisions yet.