HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Custom sorting a list of objects efficiently

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sortingefficientlyobjectscustomlist

Problem

I'm trying to make sorting of objects in my program as fast as possible. I know that sorting with a cmp function is deprecated in python 3.x and that using keys is faster, but I'm not sure how to get the same functionality without using a cmp function.

Here's the definition of the class:

class Topic:
    def __init__(self, x, y, val):
        self.id = val
        self.x = x
        self.y = y


I have a dictionary full of Topic to float key, value pairs and a list of Topics to be sorted. Each topic in the list of Topics to be sorted has an entry in this dictionary. I need to sort the list of topics by the value in the dictionary. If two topics have a difference in value <= .001, the topic with higher ID should come first.

Here's the current code I'm using:

topicToDistance = {}
# ...
# topicToDistance populated
# ...
topics = topicToDistance.keys()
def firstGreaterCmp(a, b):
  if abs(topicToDistance[a]-topicToDistance[b])  topicToDistance[b]:
      return 1
  return -1
# sorting by key may be faster than using a cmp function
topics.sort(cmp = firstGreaterCmp)


Any help making this a bit more efficient would be greatly appreciated.

Solution

Note that comparison doesn't have to return 1 or -1. For example returning 123 would have the same effect as returning 1. With this in mind, the firstGreaterCmp function could be somewhat simpler:

def firstGreaterCmp(a, b):
    difference = topicToDistance[a] - topicToDistance[b]
    if abs(difference) <= .001:
        if a.id < b.id:
            return 1
    return int(difference)


On the other hand, the comparison function must return an int, and since your value is a float, I had to cast it. I don't really know if that has a non-negligible cost or not.

UPDATE

Also, as @janne-karila pointed out in a comment,
int(difference) is 0 for a range of values (due to integer truncation),
while your original code never returns 0.
It depends on your use case whether this is acceptable or not.

In the Topic class, I find it somewhat strange that the id parameter comes last in the parameter list.
Normally I would expect an id field to be the first.

Your code doesn't follow PEP8. It would be better this way:

def firstGreaterCmp(a, b):
    if abs(topicToDistance[a] - topicToDistance[b])  topicToDistance[b]:
        return 1
    return -1

# sorting by key may be faster than using a cmp function
topics.sort(cmp=firstGreaterCmp)

Code Snippets

def firstGreaterCmp(a, b):
    difference = topicToDistance[a] - topicToDistance[b]
    if abs(difference) <= .001:
        if a.id < b.id:
            return 1
    return int(difference)
def firstGreaterCmp(a, b):
    if abs(topicToDistance[a] - topicToDistance[b]) <= .001:
        if a.id < b.id:
            return 1
    if topicToDistance[a] > topicToDistance[b]:
        return 1
    return -1

# sorting by key may be faster than using a cmp function
topics.sort(cmp=firstGreaterCmp)

Context

StackExchange Code Review Q#68617, answer score: 2

Revisions (0)

No revisions yet.