patternModerate

What is meant by superlinear speedup? Is it possible to have superlinear speedup in practice?

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

multi-tasking cs parallel-computing stackoverflow

meantwhatpracticespeeduppossiblesuperlinearhave

Problem

In parallel computing, I know the speed up equation is

$$ \frac{1}{ s + \frac{1-s}{p} } $$

But what is meant by superlinear speed up? Is it something theoritical? Could you explain it with equations?

Solution

With equation: not really.

Superlinear speedup comes from exceeding naively calculated speedup even after taking into account the communication process (which is fading, but still this is the bottleneck).

For example, you have serial algorithm that takes $1t$ to execute. You have $1024$ cores, so naive speedup is $1024x$, or it takes $t/1024$, but it should be calculated like in your equation taking into account memory transfer, slight modifications to the algorithm, parallelisation time.

So speedup should be lower than 1024x, but sometimes it happens that speedup is bigger, then we call it $superlinear$.

Where it comes from?

From several places: cache usage (what fits into registers, main memory or mass storage, where very often more processing units gives overall more registers per subtask), memory hit patterns, simply better (or a slightly different) algorithm, flaws in the serial code.

For example, a random process that searches space for a result is now divided into $1024$ searchers covering more space at once so finding the solution faster is more probable.
There are byproducts (if you swap elements like in bubble sort and switch into GPU it swaps all pairs at once, while serial only up to point).

On the distributed system communication is even more costly, so programs are changed to make memory usage local (which also changes memory access, divides problem differently than in sequential application).

And the most important, the sequential program is not ideally the same as the parallel version - different technology, environment, algorithm, etc. so it is hard to compare them.

Excerpt from "Introduction to Parallel Computing" Second edition by Ananth Grama, 2003

Theoretically speedup can never exceed the number of processing elements $p$.

If the best sequential algorithm takes $T_s$ units of time to solve a given problem on a single processing element, then a speedup of $p$ can be obtained on $p$ processing elements if none of them spends more than time $T_s/p$.

A speedup greater than $p$ is possible only if each processing element spends less than time $T_s/p$ solving the problem.

In this case, a single processing element could emulate the $p$ processing elements and solve the problem in fewer than $T_s$ units of time.

This is a contradiction because speedup, by definition is computed with respect to the best sequential algorithm.

So the name "superlinear" in this context comes from the definition of speedup.

Context

StackExchange Computer Science Q#55433, answer score: 13

Revisions (0)

No revisions yet.