HiveBrain v1.2.0

Theoretical speed gain of quad core vs. single core

Submitted by: @import:stackexchange-cs
Tags: core, gain, theoretical, single, quad, speed

Problem

I first asked this question at cstheory, but they suggested I ask it here, so here it goes...

I'm working on my master's thesis and I need a theoretical value for the (average) speed gain that a quad-core processor brings compared to a single-core processor, when both run at the same frequency. So, for example, the speed gain of a 2 GHz single-core vs. a 2 GHz quad-core.

Somewhere on the internet I've read that a quad-core is 2.6 times faster than a single-core, but the author didn't mention any source, so I cannot use that in my thesis.

I've been trying to calculate some things myself, but haven't reached a conclusion. I tried the following:

threads | quad core | single core | ratio 
--------|-----------|-------------|-------
1       | 1         | 1           | 1
2       | 2         | 1/2         | 4
3       | 3         | 1/3         | 9
4       | 4         | 1/4         | 16
5       | 3+1/2     | 1/5         | 17.5
6       | 2+2(1/2)  | 1/5         | 15
7       | 1+3(1/2)  | 1/5         | 12.5
...


This table represents the timeslices available to execute a task (I've assumed a fair scheduler that splits time evenly between threads). For example, when an application uses 3 threads on a single core, each thread can work 1/3 of the time, while on a quad-core each thread can work 100% of the time, because the 3 threads can be spread across separate cores. After playing with some calculations in Excel, I could not come to any conclusion.

I'm a bit stuck and need some fresh ideas on how to get a theoretical number that represents how much faster a quad-core is (on average) than a single-core. Maybe some of you know empirical numbers with a good reference to the source? Any help is welcome; this is the last part I need to cover in my thesis.

Thanks!

Solution

There is no absolute figure for how much faster a quad-core is compared to a single-core. As the comments have said, whether it will be faster or slower, and by how much, depends very much on exactly what you are doing. Even live performance testing may not give a fair answer, since it won't model exact runtime behaviour.

In any case, if you are looking for a theoretical model, one is certainly possible in principle, though, as you will quickly see, not practical to construct.

First you need to understand the architecture of the chips involved. When you move from single-core to quad-core, not all resources are actually quadrupled. In particular, memory is still a shared resource. This memory is split between a series of caches, some of which are shared between cores and some of which are not. Each cache is connected to the others either by a dedicated bus or by a shared data bus. Each of these caches and buses has a real bandwidth and latency figure. These figures are not likely to be published, but they can be obtained with some very focused tests.

Second you need to understand how the application will be using memory: how that memory is shared between threads, and the order in which it is accessed. For a single thread you could, in theory, calculate the amount of time it spends loading data from the various cache levels. For the multi-core case, a key value is how often the threads attempt to write to the same memory. Each such write invokes the chip's cache-coherency algorithm. This algorithm is usually documented (to some degree), and the time to synchronize can be reasonably well measured.

Combine these together and you'll see that, with perfect knowledge of how the program behaves and how the processor behaves, you would be able to calculate the theoretical gain from increasing the number of cores. In practice this simply won't be possible for all but the simplest of programs (and even then, the way the chips manage memory is complex enough to make the problem too difficult).

What you should take from this description is that processor speed is not the only factor; memory is often the key bottleneck. When svick says "perfectly parallelized" he means the threads don't share memory. In that case your threads will approach the maximum 4x speed gain, because memory contention is low. However, they still share some memory for reading (the L3 cache and main memory on Intel chips) and thus could still be limited by the total memory bandwidth.

If you wish to look at a degenerate example, look up "false sharing". In such cases the total speed actually goes down as more cores are added.

Context

StackExchange Computer Science Q#2300, answer score: 6

Revisions (0)

No revisions yet.