HiveBrain v1.2.0

Is there any study or theory behind combining binary search and interpolation search?

Submitted by: @import:stackexchange-cs

Problem

I just read Can this algorithm still be considered a Binary Search algorithm? and recalled that a few years back I wrote an indexer/search tool for log files, to find log entries in large plain-text files by date/time window.

Whilst doing this, I decided to try interpolation search (I didn't know that was what it was called; I stumbled across the idea by myself). Then, for some reason, I went further and alternated interpolation steps with binary-split steps: on step 0 I would interpolate to choose the test point, on step 1 I would take the exact midpoint, and so on.
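A minimal sketch of that alternating scheme, assuming a sorted list of integer keys (e.g. timestamps); the function name and representation are illustrative, not the original indexer:

```python
def alternating_search(arr, target):
    """Search sorted arr for target, alternating interpolation steps
    (even rounds) with exact-midpoint binary steps (odd rounds).
    Returns an index of target, or -1 if not found."""
    lo, hi = 0, len(arr) - 1
    step = 0
    while lo <= hi:
        if step % 2 == 0 and arr[hi] != arr[lo]:
            # Interpolation step: estimate the position assuming the
            # keys are spread roughly linearly over [arr[lo], arr[hi]].
            mid = lo + (hi - lo) * (target - arr[lo]) // (arr[hi] - arr[lo])
            mid = max(lo, min(mid, hi))  # clamp the estimate into range
        else:
            # Binary step: exact midpoint.
            mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
        step += 1
    return -1
```

The clamp matters: when the target lies outside `[arr[lo], arr[hi]]`, the interpolation estimate can fall outside the current range.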

I then benchmarked the system using pure interpolation search, pure binary search, and my combined approach. The alternating approach was the clear winner, both in wall-clock time and in the number of tests required to find a set of randomly chosen times.

Inspired by the linked question, I did a quick search for "alternating interpolation search and binary search" and found nothing. I also tried "hedged interpolation search", as suggested in my comment on one of the answers.

Have I stumbled across a known thing? Is there any theoretical justification for it being faster for certain types of data? The log files were typically large for the time (e.g. 1-2 GB of text with maybe 10 million rows to search), and the spread of dates/times in them was complex with heavy bursts of activity, general peak times and quiet times. My benchmark tests sampled from an even distribution of target times to find.

Solution

Interleaving two algorithms to get the best of both worlds is a known technique, though it is usually stated as running them in "parallel" and returning an answer as soon as either terminates.
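The "parallel" version could be sketched like this: keep an independent `(lo, hi)` state for each search, take one step of each per round, and answer as soon as either finds the target. This is a hypothetical illustration of the general technique, not code from the question:

```python
def dovetailed_search(arr, target):
    """Run binary search and interpolation search 'in parallel' by
    alternating one step of each on its own state; whichever search
    terminates first supplies the answer."""
    b_lo, b_hi = 0, len(arr) - 1  # binary search state
    i_lo, i_hi = 0, len(arr) - 1  # interpolation search state
    while b_lo <= b_hi or i_lo <= i_hi:
        # One binary step.
        if b_lo <= b_hi:
            mid = (b_lo + b_hi) // 2
            if arr[mid] == target:
                return mid
            if arr[mid] < target:
                b_lo = mid + 1
            else:
                b_hi = mid - 1
        # One interpolation step.
        if i_lo <= i_hi:
            if arr[i_hi] == arr[i_lo]:
                mid = i_lo  # degenerate range: all keys equal
            else:
                mid = i_lo + (i_hi - i_lo) * (target - arr[i_lo]) \
                    // (arr[i_hi] - arr[i_lo])
                mid = max(i_lo, min(mid, i_hi))
            if arr[mid] == target:
                return mid
            if arr[mid] < target:
                i_lo = mid + 1
            else:
                i_hi = mid - 1
    return -1
```

The worst case is at most twice the cost of the better of the two searches on the given input, which is the usual guarantee for this kind of dovetailing.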

Though theoretically faster, interpolation search has two disadvantages compared to binary search:

- It has terrible (linear) worst-case performance.

- The overhead of computing the midpoint is rather large; a binary search iteration is hundreds of times faster than an interpolation search iteration.

I would expect that the most efficient approach is to do interpolation search while the range is large and switch to binary search once the range becomes small. It would be nice if you could try this experiment.
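The suggested experiment might look like the sketch below. The cutoff of 64 is an arbitrary placeholder to tune empirically, not a value from the answer:

```python
def hybrid_search(arr, target, cutoff=64):
    """Interpolation steps while the current range is larger than
    `cutoff`; plain binary steps once it shrinks below that.
    Returns an index of target, or -1 if not found."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        if hi - lo > cutoff and arr[hi] != arr[lo]:
            # Large range: interpolate the probe position.
            mid = lo + (hi - lo) * (target - arr[lo]) // (arr[hi] - arr[lo])
            mid = max(lo, min(mid, hi))
        else:
            # Small range: cheap exact midpoint.
            mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

Benchmarking over a sweep of cutoff values (including 0 and `len(arr)`, i.e. pure binary and pure interpolation) would show where the crossover sits for a given dataset.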

As your dataset becomes small, the difference between $\log n$ and $\log \log n$ becomes insignificant; $\log n$ is already really small, and $\log \log n$ couldn't possibly be much smaller (for example, at $n = 10^6$, $\log_2 n \approx 20$ while $\log_2 \log_2 n \approx 4.3$). At this point, the overhead of doing interpolation search is not worth the iterations you might save.

I think that your results can be explained by two phenomena:

- Combining with binary search allows you to avoid the worst-case behavior.

- Switching to binary search on a small dataset has a positive effect.

Context

StackExchange Computer Science Q#59750, answer score: 6
