Variable size and CPU performance
Problem
When you learn programming, you are told to choose data types that suffice for the concepts you're expressing, i.e. not too small, because then your data won't fit, and not too big, because you don't want nonsensical values stored.
Table of unsigned integers
- 8-bit: 0 to 255
- 16-bit: 0 to 65,535
- 32-bit: 0 to 4,294,967,295
- 64-bit: 0 to 18,446,744,073,709,551,615
According to the table above, an 8-bit unsigned integer can store, e.g., the number of weeks in a year (approximately 52), and a 64-bit unsigned integer can store, e.g., astronomical values.
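The ranges in the table follow from the rule that an n-bit unsigned integer covers 0 to 2^n - 1. A minimal Python sketch that reproduces them (the helper name `unsigned_max` is only illustrative):

```python
def unsigned_max(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    return (1 << bits) - 1  # 2**bits - 1


for bits in (8, 16, 32, 64):
    # Thousands separators make the output match the table's formatting.
    print(f"{bits}-bit: 0 to {unsigned_max(bits):,}")
```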
To get to the point of my question: how do the different variable sizes affect the performance of a program? I imagine that a 64-bit CPU internally handles all variables as 64-bit and simply ignores the highest bits, so an 8-bit variable, travelling across the system, would be handled as:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx yyyyyyyy
Where y is either 0/1 and x is ignored.
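To make the picture concrete, here is a small Python sketch that renders an 8-bit value inside a 64-bit word. It assumes the upper 56 bits are zero-filled (zero extension), which is an assumption made purely for illustration; the function name `as_64_bit_word` is hypothetical.

```python
def as_64_bit_word(value: int) -> str:
    """Render an 8-bit value as a 64-bit word, grouped into bytes."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    bits = format(value, "064b")  # zero-padded to 64 binary digits
    return " ".join(bits[i:i + 8] for i in range(0, 64, 8))


# 52 weeks in a year: only the last byte carries information.
print(as_64_bit_word(52))
```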
Would it make any difference performance-wise to only use the largest variable size on the platform and use business logic to enforce reasonableness in the values being processed and stored? Is there a performance hit when using variable sizes lower than the native variable size of the architecture?
Solution
Would it make any difference performance-wise to only use the largest variable size on the platform and use business logic to enforce reasonableness in the values being processed and stored?
Yes, it could make your performance worse.
Memory footprint, memory bandwidth, and cache density are the primary concerns. Larger data sizes can:
- result in increased cache and translation lookaside buffer (TLB) misses;
- require more RAM to avoid paging.
E.g. x86 CPUs always load (if not already in cache) a whole cache line from memory.
So, assuming a cache line of 64 bytes, one cache line holds 64 8-bit integers but only eight 64-bit integers. Processing a large array of 64-bit integers therefore requires eight times as many memory transfers as the same number of 8-bit integers.
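The arithmetic can be checked with a short Python sketch using the standard `array` module, whose type codes `"B"` and `"Q"` give 1-byte and (on mainstream platforms, which is assumed here) 8-byte unsigned elements. The 64-byte line size is the assumption from the text, the array is assumed to start on a line boundary, and the helper names are illustrative.

```python
from array import array

CACHE_LINE_BYTES = 64  # assumed cache line size, as in the text


def elements_per_cache_line(typecode: str) -> int:
    """How many array elements of this type fit in one cache line."""
    return CACHE_LINE_BYTES // array(typecode).itemsize


def cache_lines_needed(typecode: str, count: int) -> int:
    """Cache lines touched when scanning `count` contiguous elements."""
    total_bytes = count * array(typecode).itemsize
    return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division


n = 1_000_000
for code in ("B", "Q"):  # "B": unsigned 8-bit, "Q": unsigned 64-bit
    print(code, elements_per_cache_line(code), cache_lines_needed(code, n))
```

For the same element count, the 64-bit array touches eight times as many cache lines as the 8-bit one.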
Also consider that many compilers are smart enough to figure out how to perform automatic vectorization (see note 1).
Taking advantage of the processor's SIMD capabilities, you can pack eight 16-bit integers into the same space as two 64-bit integers and do four times as many operations at once.
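The lane counts quoted above correspond to a 128-bit vector register (the width of an SSE register, taken here as an assumption). A minimal sketch of the calculation, with an illustrative helper name:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one SIMD instruction can process at once."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


REGISTER_BITS = 128  # assumed vector width

lanes_16 = simd_lanes(REGISTER_BITS, 16)
lanes_64 = simd_lanes(REGISTER_BITS, 64)
print(lanes_16, lanes_64, lanes_16 // lanes_64)
```

Wider vector registers change the absolute lane counts but not the 4:1 ratio between 16-bit and 64-bit elements.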
Is there a performance hit when using variable sizes lower than the native variable size of the architecture?
There isn't a general rule.
x86 CPUs can operate on fractions of a register, and there will be no slowdown at all (see note 2).
Other CPUs (e.g. PowerPC) cannot process a fraction of a register, so performing multiple operations on small integers may require masking / cutting back partial results. Still, that is just a simple AND operation, and often the compiler can find a way to avoid it entirely.
As a corner case, some instructions can be slower when performed with the CPU's "native size" (e.g. multiplication / division of 64-bit quantities vs 32-bit quantities on Intel Pentium 4).
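The masking step for CPUs without sub-register operations can be modeled in a few lines. This is only a sketch: Python integers are unbounded, so the wide add stands in for a full-register add, and the names `MASK_8` and `add_u8` are illustrative.

```python
MASK_8 = 0xFF  # keeps only the low 8 bits


def add_u8(a: int, b: int) -> int:
    """Add two 8-bit values with a wide add followed by an AND mask."""
    wide_sum = a + b          # may carry into bit 8, as in a full register
    return wide_sum & MASK_8  # cut the result back to 8 bits


print(add_u8(200, 100))  # 300 does not fit in 8 bits, so it wraps
```

When several such operations are chained, a single mask on the final result is often enough, which is one way a compiler can avoid the intermediate ANDs.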
Further reading:
- What Every Programmer Should Know About Memory by Ulrich Drepper (chapter 6, about maximizing the efficiency of the data cache)
- int vs long long in 64 bit machine
- Performance 32 bit vs. 64 bit arithmetic
- The microarchitecture of Intel, AMD and VIA CPUs by Agner Fog
1) Automatic vectorization is a major research topic in computer science.
2) x86 processors can suffer a partial register stall if you, say, write to BX and then try to read from EBX (something to consider when mixing integers of different sizes).
Context
StackExchange Computer Science Q#81784, answer score: 5