patterncppMinor
Custom Non-Crypto Hashing Algorithm
Viewed 0 times
noncustomcryptoalgorithmhashing
Problem
Rule of Thumb: Don't roll your own encryption--I'm no cryptographer obviously. Also note my apologies for my awful C-style C++.
For a fun little project I've written my own hash algorithm as a practice project. The goal was to write an algorithm that took an input
As far as I can tell I've accomplished two things:
The idea behind the algorithm is pretty simple. I stole some things from SHA-512 (SIGMA0/1, ROT macros and prime cuberoots). From there I setup the algorithm in two steps where I largely based the the hash on the data in the actual digest. All rotations and iterations in the hash are based on the data in the digest. The idea was that it'd be harder to find a collision if more of the hash is based on the actual digest.
```
/*
XOR: exclusive-or.
SUB: subtract
ADD: add
ROT: rotate
N/NOT: bit-wise NOT.
R/L: right/left.
FBA: Fast byte/bit alignment of powers of 2.
*/
#ifndef CUBE_HASH
#define CUBE_HASH
// Align some data by n (where n is some power of 2).
#define FBA(i,n) ((i + (n-1)) & ~(n-1))
// Rotate bits and also bitwise NOT rotated bits by a rotation of 0 > n) | (i > (sizeof(__uint64) - n)))
#define ROTR(i,n) ((i >> n) | (i > (sizeof(__uint64) - n)))
// MOD(i,p) returns i MOD 2^p and does not work on negative numbers.
#define MOD(i,p) (i & ((1 digest_length, Z wraps back to 0 due to modulo digest_length.
Two Steps:
XOR the current digest block with the current digets block NOTR of by N-bits which is
then SUB by the the SIGMA
For a fun little project I've written my own hash algorithm as a practice project. The goal was to write an algorithm that took an input
digest and gave me a generally unique output hash in the scope of 1024 bits per hash. I believe I've achieved something along those lines. On top of that the idea was to base the hash on as much of the digest as possible--theoretically making the hash more unique.As far as I can tell I've accomplished two things:
- The avalanche affect.
- Basic level: general uniqueness (to some degree however much or little that may be actually turn out to be...).
The idea behind the algorithm is pretty simple. I stole some things from SHA-512 (SIGMA0/1, ROT macros and prime cuberoots). From there I setup the algorithm in two steps where I largely based the the hash on the data in the actual digest. All rotations and iterations in the hash are based on the data in the digest. The idea was that it'd be harder to find a collision if more of the hash is based on the actual digest.
```
/*
XOR: exclusive-or.
SUB: subtract
ADD: add
ROT: rotate
N/NOT: bit-wise NOT.
R/L: right/left.
FBA: Fast byte/bit alignment of powers of 2.
*/
#ifndef CUBE_HASH
#define CUBE_HASH
// Align some data by n (where n is some power of 2).
#define FBA(i,n) ((i + (n-1)) & ~(n-1))
// Rotate bits and also bitwise NOT rotated bits by a rotation of 0 > n) | (i > (sizeof(__uint64) - n)))
#define ROTR(i,n) ((i >> n) | (i > (sizeof(__uint64) - n)))
// MOD(i,p) returns i MOD 2^p and does not work on negative numbers.
#define MOD(i,p) (i & ((1 digest_length, Z wraps back to 0 due to modulo digest_length.
Two Steps:
XOR the current digest block with the current digets block NOTR of by N-bits which is
then SUB by the the SIGMA
Solution
Some general comments about the code:
Is perhaps the comment / array name misleading? The fractional_cuberoots[64] array supposedly contains the fractional parts of the cuberoots of the first 64 prime numbers, and fractional_cubehash[16] contains the 16 first. But the values look totally different from those in fractional_cuberoots[64]. So how is the second array generated?
Also, in this line:
You are only copying 16 bytes, instead of 16 entries of __uint64. If your intent was to copy the entire array, you would need to use
I don't have time to delve into the actual hashing algorithm right now, but I might edit my answer to add more details when I have a look.
EDIT
Ok, there are actually a lot more problems that don't involve specifically hashing.
Your usage of the MOD function is incorrect given its definition. You often calculate things like
There might also be similar issues with your use of the other ROT macros since the shifts could well be beyond the bit sizes used. And also, for example this line:
You use sizeof(__uint64) as a bit size while this returns bytes...
Also you are allocating the digest as
This is also inconsistent with your usage of
There are lots of these types of confusions in the code, so I suggest resolving these inconsistencies before anything else since it is hard to understand what the hashing does if such details are unclear.
/*
The 16 fractional parts of the cuberoots of the first 64 prime numbers.
*/
const __uint64 fractional_cubehash[16]{Is perhaps the comment / array name misleading? The fractional_cuberoots[64] array supposedly contains the fractional parts of the cuberoots of the first 64 prime numbers, and fractional_cubehash[16] contains the 16 first. But the values look totally different from those in fractional_cuberoots[64]. So how is the second array generated?
Also, in this line:
memcpy(cube_hash, fractional_cubehash, 16);You are only copying 16 bytes, instead of 16 entries of __uint64. If your intent was to copy the entire array, you would need to use
memcpy(cube_hash, fractional_cubehash, 16 * sizeof (__uint64));I don't have time to delve into the actual hashing algorithm right now, but I might edit my answer to add more details when I have a look.
EDIT
Ok, there are actually a lot more problems that don't involve specifically hashing.
Your usage of the MOD function is incorrect given its definition. You often calculate things like
MOD(i + 1, digest_length) or MOD(digest[i], z) where the second argument will likely be larger than 64 (the digest_length is calculated as digest_length = FBA(message_length + 8, 128);, so for messages of more than 64 bytes). In this case the 1 = 64, then 1 = 32, sorry for the confusion here). There might also be similar issues with your use of the other ROT macros since the shifts could well be beyond the bit sizes used. And also, for example this line:
#define ROTL(i,n) ((i > (sizeof(__uint64) - n)))You use sizeof(__uint64) as a bit size while this returns bytes...
Also you are allocating the digest as
__uint64* digest = new __uint64[digest_length]();. Thus the digest length will in fact be 8 times larger than what you might expect from the digest_length variable (if you were intending to use that as byte length, otherwise ignore this comment).This is also inconsistent with your usage of
message_length, which contains the length of the message in bytes, while digest_length is counting in 8-byte chunks.There are lots of these types of confusions in the code, so I suggest resolving these inconsistencies before anything else since it is hard to understand what the hashing does if such details are unclear.
Code Snippets
/*
The 16 fractional parts of the cuberoots of the first 64 prime numbers.
*/
const __uint64 fractional_cubehash[16]{Context
StackExchange Code Review Q#156517, answer score: 7
Revisions (0)
No revisions yet.