patternMinor
Loops at low level
Viewed 0 times
levelloopslow
Problem
When programming with high level languages like Python or Matlab, it is always better to model the problem such as the solution is some kind of vector or matrix multiplication, to avoid loops. But this vector multiplication will itself take each elements from both arrays and multiply them so basically there must be some loops at some level. Is that the case or is there special methods at assembler level that make multiplications of memory blocks with no loops?
Solution
When you are using vectorized operations in a high-level language such as Matlab or python, you are not avoiding loops, but rather pushing them from the high-level language (Matlab or python) to a low-level language (C or fortran) which can execute these loops much faster. Loops in Matlab or python are slow for several reasons, the main ones being that these languages are interpreted, that typing is dynamic, and that even basic data types involve a lot of overhead.
Some old supercomputers had a vectorized architecture (for example Crays), but this is no longer the case. Nowadays there are small-scale vectorized instructions which operate on a small number of data points at once (as mentioned in EvilJS's answer), though their use could be rather specialized.
Loop unrolling, mentioned by EvilJS, can be used to cut down the number of conditional jumps, which slow down the execution of the code. The body of the loop is repeated a small number of times, thus reducing the frequency of conditional jumps at the expense of code size (which could be important due to the existence of code caches). Such tricks are better left to the implementors of standard libraries.
Some old supercomputers had a vectorized architecture (for example Crays), but this is no longer the case. Nowadays there are small-scale vectorized instructions which operate on a small number of data points at once (as mentioned in EvilJS's answer), though their use could be rather specialized.
Loop unrolling, mentioned by EvilJS, can be used to cut down the number of conditional jumps, which slow down the execution of the code. The body of the loop is repeated a small number of times, thus reducing the frequency of conditional jumps at the expense of code size (which could be important due to the existence of code caches). Such tricks are better left to the implementors of standard libraries.
Context
StackExchange Computer Science Q#45733, answer score: 6
Revisions (0)
No revisions yet.