patterncppModerate
Enigma simulator performance
Viewed 0 times
simulatorperformanceenigma
Problem
Here is my implementation of a simple 3 rotor Enigma machine in C++:
Benchmark ran on a 2.66ghz Core 2 Duo for 1 million decrypts:
C++ Version:
I'm not a C++ programmer, I first wrote this in Javascript and then translated it into C++.
Whilst 1 million in ~10 seconds is OK, I feel that my implementation isn't optimal. Are there any optimisations, or micro optimisations that you could recommend to my C++ code to make it faster? Maybe I should try a different language? Any suggestions are welcome.
#include
#include
using namespace std;
char alpha[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char rotors[3][27] =
{
"EKMFLGDQVZNTOWYHXUSPAIBRCJ",
"AJDKSIRUXBLHWTMCQGZNPYFVOE",
"BDFHJLCPRTXVZNYEIWGAKMUSQO"
};
char reflector[] = "YRUHQSLDPXNGOKMIEBFZCWVJAT";
char key[] = "ABC";
long mod26(long a)
{
return (a%26+26)%26;
}
int li (char l)
{
// Letter index
return l - 'A';
}
int indexof (char* array, int find)
{
return strchr(array, find) - array;
}
string crypt (const char *ct)
{
// Sets initial permutation
int L = li(key[0]);
int M = li(key[1]);
int R = li(key[2]);
string output;
for ( int x = 0; x < strlen(ct) ; x++ ) {
int ct_letter = li(ct[x]);
// Step right rotor on every iteration
R = mod26(R + 1);
// Pass through rotors
char a = rotors[2][mod26(R + ct_letter)];
char b = rotors[1][mod26(M + li(a) - R)];
char c = rotors[0][mod26(L + li(b) - M)];
// Pass through reflector
char ref = reflector[mod26(li(c) - L)];
// Inverse rotor pass
int d = mod26(indexof(rotors[0], alpha[mod26(li(ref) + L)]) - L);
int e = mod26(indexof(rotors[1], alpha[mod26(d + M)]) - M);
char f = alpha[mod26(indexof(rotors[2], alpha[mod26(e + R)]) - R)];
output += f;
}
return output;
}
int main ()
{
for ( int i = 0; i < 1000000; i++) {
crypt ("PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU");
}
return 0;
}Benchmark ran on a 2.66ghz Core 2 Duo for 1 million decrypts:
C++ Version:
11.64s user 0.02s system 97% cpu 11.963 totalI'm not a C++ programmer, I first wrote this in Javascript and then translated it into C++.
Whilst 1 million in ~10 seconds is OK, I feel that my implementation isn't optimal. Are there any optimisations, or micro optimisations that you could recommend to my C++ code to make it faster? Maybe I should try a different language? Any suggestions are welcome.
Solution
Let's profile this a little bit (this is on a dual-core i7, so a bit quicker):
Running Time Self Symbol Name
3984.0ms 100.0% 0.0 Main Thread 0x2856c
3982.0ms 99.9% 0.0 start
3982.0ms 99.9% 1.0 main
3936.0ms 98.7% 3081.0 crypt(char const*)
640.0ms 16.0% 640.0 _platform_strchr
126.0ms 3.1% 102.0 std::__1::basic_string, std::__1::allocator >::push_back(char)
89.0ms 2.2% 89.0 strlen
14.0ms 0.3% 14.0 DYLD-STUB$$strchr
14.0ms 0.3% 4.0 free
8.0ms 0.2% 1.0 szone_free_definite_size
7.0ms 0.1% 7.0 DYLD-STUB$$strlen
2.0ms 0.0% 2.0 operator delete(void*)
2.0ms 0.0% 0.0 _dyld_start
Ok, so more than 98% of the time is spent in the
Ok, so how do we cut down on these:
Firstly, we'll switch over to using
The
which is close to what it was, but with a few minor changes (no
And we now call it like so:
Let's profile this again:
Running Time Self Symbol Name
2873.0ms 100.0% 0.0 Main Thread 0x2c7ff
2872.0ms 99.9% 0.0 start
2872.0ms 99.9% 2.0 main
2870.0ms 99.8% 2870.0 std::__1::array crypt
(std::__1::array const&)
1.0ms 0.0% 0.0 _dyld_start
Ok, so everything except
Running Time Self Symbol Name
1855.0ms 100.0% 0.0 Main Thread 0x2ca5c
1854.0ms 99.9% 0.0 start
1854.0ms 99.9% 1.0 main
1853.0ms 99.8% 1853.0 std::__1::array crypt
(std::__1::array const&)
1.0ms 0.0% 0.0 _dyld_start
We have imposed a fairly large constraint however: we need to know the size of the string to be encrypted at compile-time (so no reading it from a file or
Running Time Self Symbol Name
3984.0ms 100.0% 0.0 Main Thread 0x2856c
3982.0ms 99.9% 0.0 start
3982.0ms 99.9% 1.0 main
3936.0ms 98.7% 3081.0 crypt(char const*)
640.0ms 16.0% 640.0 _platform_strchr
126.0ms 3.1% 102.0 std::__1::basic_string, std::__1::allocator >::push_back(char)
89.0ms 2.2% 89.0 strlen
14.0ms 0.3% 14.0 DYLD-STUB$$strchr
14.0ms 0.3% 4.0 free
8.0ms 0.2% 1.0 szone_free_definite_size
7.0ms 0.1% 7.0 DYLD-STUB$$strlen
2.0ms 0.0% 2.0 operator delete(void*)
2.0ms 0.0% 0.0 _dyld_start
Ok, so more than 98% of the time is spent in the
crypt function (which isn't surprising really). Almost 16% of the time is spent in strchr, with another 3.1% spent in push_back. There's also a few percent (2.2) spent doing strlen. Ok, so how do we cut down on these:
Firstly, we'll switch over to using
std::array instead of char *. These are both stack-allocated (and so we don't incur the dynamic allocation overhead of std::string or std::vector), but std::array knows its size, and will allow us to remove the strlen in the crypt function:#include
const std::array alpha = {"ABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::array, 3> rotors
{
{{"EKMFLGDQVZNTOWYHXUSPAIBRCJ"},
{"AJDKSIRUXBLHWTMCQGZNPYFVOE"},
{"BDFHJLCPRTXVZNYEIWGAKMUSQO"}}
};
const std::array reflectors = {"YRUHQSLDPXNGOKMIEBFZCWVJAT"};
const std::array key = {"ABC"};The
indexof function then needs to change as well (and I'm going to rename it to index_of, which is easier to read):template
std::size_t index_of (const std::array& str, int find)
{
for(std::size_t i = 0; i < N; ++i) {
if(str[i] == find) return i;
}
return -1;
}crypt now looks like:template
std::array crypt (const std::array& ct)
{
// Sets initial permutation
int L = li(key[0]);
int M = li(key[1]);
int R = li(key[2]);
std::array output;
for ( unsigned x = 0; x (mod26(R + 1));
// Pass through rotors
char a = rotors[2][mod26(R + ct_letter)];
char b = rotors[1][mod26(M + li(a) - R)];
char c = rotors[0][mod26(L + li(b) - M)];
// Pass through reflector
char ref = reflectors[mod26(li(c) - L)];
// Inverse rotor pass
long d = mod26(index_of(rotors[0], alpha[mod26(li(ref) + L)]) - L);
long e = mod26(index_of(rotors[1], alpha[mod26(d + M)]) - M);
char f = mod26(index_of(rotors[2], alpha[mod26(e + R)]) - R);
output[x] = alpha[f];
}
return output;
}which is close to what it was, but with a few minor changes (no
+= on a string, as we know already how large it will be and simple assign directly into that index).And we now call it like so:
int main ()
{
for ( int i = 0; i {"PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU"});
}
return 0;
}Let's profile this again:
Running Time Self Symbol Name
2873.0ms 100.0% 0.0 Main Thread 0x2c7ff
2872.0ms 99.9% 0.0 start
2872.0ms 99.9% 2.0 main
2870.0ms 99.8% 2870.0 std::__1::array crypt
(std::__1::array const&)
1.0ms 0.0% 0.0 _dyld_start
Ok, so everything except
crypt has now been inlined. If we apply the modification from @MrSmith42, we shave even more time off:Running Time Self Symbol Name
1855.0ms 100.0% 0.0 Main Thread 0x2ca5c
1854.0ms 99.9% 0.0 start
1854.0ms 99.9% 1.0 main
1853.0ms 99.8% 1853.0 std::__1::array crypt
(std::__1::array const&)
1.0ms 0.0% 0.0 _dyld_start
We have imposed a fairly large constraint however: we need to know the size of the string to be encrypted at compile-time (so no reading it from a file or
stdin or the like). If this will always be the case, then this provides a fairly significant speed-up on my hardware.Code Snippets
#include <array>
const std::array<char, 27> alpha = {"ABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::array<std::array<char, 27>, 3> rotors
{
{{"EKMFLGDQVZNTOWYHXUSPAIBRCJ"},
{"AJDKSIRUXBLHWTMCQGZNPYFVOE"},
{"BDFHJLCPRTXVZNYEIWGAKMUSQO"}}
};
const std::array<char, 27> reflectors = {"YRUHQSLDPXNGOKMIEBFZCWVJAT"};
const std::array<char, 4> key = {"ABC"};template <size_t N>
std::size_t index_of (const std::array<char, N>& str, int find)
{
for(std::size_t i = 0; i < N; ++i) {
if(str[i] == find) return i;
}
return -1;
}template <size_t N>
std::array<char, N> crypt (const std::array<char, N>& ct)
{
// Sets initial permutation
int L = li(key[0]);
int M = li(key[1]);
int R = li(key[2]);
std::array<char, N> output;
for ( unsigned x = 0; x < N ; x++ ) {
int ct_letter = li(ct[x]);
// Step right rotor on every iteration
R = static_cast<int>(mod26(R + 1));
// Pass through rotors
char a = rotors[2][mod26(R + ct_letter)];
char b = rotors[1][mod26(M + li(a) - R)];
char c = rotors[0][mod26(L + li(b) - M)];
// Pass through reflector
char ref = reflectors[mod26(li(c) - L)];
// Inverse rotor pass
long d = mod26(index_of(rotors[0], alpha[mod26(li(ref) + L)]) - L);
long e = mod26(index_of(rotors[1], alpha[mod26(d + M)]) - M);
char f = mod26(index_of(rotors[2], alpha[mod26(e + R)]) - R);
output[x] = alpha[f];
}
return output;
}int main ()
{
for ( int i = 0; i < 1000000; i++) {
crypt(std::array<char, 37>{"PZUFWDSASJGQGNRMAEODZJXQQKHSYGVUSGSU"});
}
return 0;
}Context
StackExchange Code Review Q#44196, answer score: 11
Revisions (0)
No revisions yet.