Is Timsort more efficient than merge sort and why?

Submitted by: @import:stackexchange-cs

Problem

I was just wondering: I think merge sort is more efficient, but I'm not sure whether that is true. I know it has to do with the complexities, but I'm still struggling to understand.

Solution

In terms of asymptotic complexity, timsort and merge sort have the same worst-case complexity: they both make $O(n \log n)$ comparisons to sort a list of $n$ elements.

Given a particular input, a particular implementation of timsort may or may not be faster than a particular implementation of merge sort.
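One way to see how much the input matters is to count comparisons rather than wall-clock time. The sketch below assumes CPython, whose built-in `sorted` is timsort-based; the `Counted` wrapper and `count_comparisons` helper are hypothetical names introduced only for this illustration, and the exact counts depend on the interpreter version.

```python
import random


class Counted:
    """Wraps a value and counts every `<` comparison made on it."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Return how many comparisons the built-in sort makes on `values`."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
ordered = list(range(n))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the experiment is repeatable

print("ordered: ", count_comparisons(ordered))
print("shuffled:", count_comparisons(shuffled))
```

On already ordered input the built-in sort should need far fewer comparisons than on shuffled input of the same length, even though both cases share the same $O(n \log n)$ worst-case bound.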

Timsort is designed so that its overhead over merge sort is bounded, i.e. it won't be more than a certain constant factor slower than merge sort, because it is based on merge sort and only uses other techniques for small parts of the data.

Timsort uses insertion sort for very small amounts of data; this is typically more efficient than a pure merge sort because the benefits of merge sort over insertion sort are asymptotic. Merge sort is asymptotically faster than insertion sort, which means that there is a threshold $N$ such that if $n \ge N$ then sorting $n$ elements with merge sort is faster than with insertion sort. The numerical value of the threshold depends on the specific implementations, though. With typical optimized implementations, insertion sort beats merge sort for a small amount of data. Most sort routines in the real world are hybrid: they use an $O(n \log n)$ divide-and-conquer technique for large amounts of data and switch to a different technique (usually insertion sort) once they've broken the data down into small enough pieces. Thus a properly implemented timsort is faster on average than a pure merge sort.
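The hybrid idea can be sketched as a merge sort that hands small slices to insertion sort. This is not timsort itself, only a minimal illustration of the cutoff technique; the names `hybrid_merge_sort`, `insertion_sort`, `merge` and the value of `CUTOFF` are assumptions made for this example, and a good threshold has to be found by measurement for each implementation.

```python
CUTOFF = 16  # hypothetical threshold; the best value is implementation-specific


def insertion_sort(a, lo, hi):
    """Sort the slice a[lo:hi] in place."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for x.
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def merge(left, right):
    """Merge two sorted lists into a new sorted list (stable)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            # On ties, take from the left list to keep the sort stable.
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place: merge sort on top, insertion sort below CUTOFF."""
    if hi is None:
        hi = len(a)
    if hi - lo <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    a[lo:hi] = merge(a[lo:mid], a[mid:hi])
```

Calling `hybrid_merge_sort(data)` sorts the list `data` in place. Setting `CUTOFF` to 1 turns it back into a plain top-down merge sort, which makes it easy to time both variants on the same input.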

Timsort is furthermore optimized to deal well with real-world data. Real-world data is not randomly distributed: it's common to have sorted runs in the data to be sorted. Compared with a basic merge+insertion sort, timsort attempts to detect and make use of such runs. This adds a small overhead of checking whether parts of the data are already sorted, but with typical real-world data this saves some actual sorting work which more than compensates.
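Run detection can be illustrated with a short sketch. The function `find_runs` below is a hypothetical helper, not code from any timsort implementation: it scans the list once, records each maximal non-descending run, and reverses each strictly descending run in place so that it becomes ascending. A real timsort does more than this (for example, it extends short runs to a minimum length and decides carefully which runs to merge first), none of which is shown here.

```python
def find_runs(a):
    """Split `a` into maximal runs, reversing strictly descending ones in place.

    Returns a list of (start, end) half-open index pairs; after the call,
    every slice a[start:end] is in non-descending order.
    """
    runs = []
    n = len(a)
    i = 0
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending run: extend it, then reverse it in place.
            # Strictness keeps equal elements in their original order.
            while j < n and a[j] < a[j - 1]:
                j += 1
            a[i:j] = a[i:j][::-1]
        else:
            # Non-descending run: extend it as far as it goes.
            while j < n and a[j] >= a[j - 1]:
                j += 1
        runs.append((i, j))
        i = j
    return runs


data = [1, 2, 3, 9, 7, 5, 4, 6]
print(find_runs(data))  # [(0, 4), (4, 7), (7, 8)]
print(data)             # [1, 2, 3, 9, 4, 5, 7, 6]
```

An already sorted list comes back as a single run, so no merging is needed at all; the scan that discovers this is the small overhead the answer refers to.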

Context

StackExchange Computer Science Q#84168, answer score: 8
