patterncsharpModerate
SSE instruction to check if byte array is zeroes C#
Viewed 0 times
ssebytearrayinstructionzeroescheck
Problem
My fundamental problem is how to check whether
Project references Mono.Simd assembly. I checked the code for following issues so far:
Is this code correct?
byte[] is full of zeroes. I posted a range of implementations (with timings) and one clearly beats others. In fact, it is so fast that I am having trouble believing it even works. I suggest taking a look at SO question page first. This is the winner code:static unsafe bool IsAllZeros(byte[] data)
{
fixed (byte* bytes = data) {
int len = data.Length;
int rem = len % (16*16);
Vector16b* b = (Vector16b*)bytes;
Vector16b* e = b + (len - rem) / (16*16);
Vector16b zero = Vector16b.Zero;
while (b+15 < e) {
if (*(b)+*(b+1)+*(b+2)+*(b+3)+*(b+4)+*(b+5)+*(b+6)+*(b+7)+*(b+8)+
*(b+9)+*(b+10)+*(b+11)+*(b+12)+*(b+13)+*(b+14)+*(b+15) != zero)
return false;
b += 16;
}
for (int i = 0; i < rem; i++)
if (data [len - 1 - i] != 0)
return false;
return true;
}
}Project references Mono.Simd assembly. I checked the code for following issues so far:
- Pointer arithmetic. Adding 1 to pointer moves it 16 bytes ahead. However loop is unrolled 16 times so iteration processes 256 bytes at a time.
- Array length is in bytes while
b eare in 16 byte vectors.
- Loop condition checks if last element
b+15is within the array. End pointerepoints to just outside of array.
- Loop processes 16 vectors so it moves pointer by 16 elements.
- Tail is processed for up to 15 vectors, each 16 bytes, one byte at a time.
Is this code correct?
Solution
This code is broken
When unrolling you add all the
From github.com/mono:
So the vectors are added index to index, and then truncated back to
You need to ensure that each individual
When unrolling you add all the
Vector16b.From github.com/mono:
[Acceleration (AccelMode.SSE2)]
public static unsafe Vector16b operator + (Vector16b va, Vector16b vb)
{
Vector16b res = new Vector16b ();
byte *a = &va.v0;
byte *b = &vb.v0;
byte *c = &res.v0;
for (int i = 0; i < 16; ++i)
*c++ = (byte)(*a++ + *b++);
return res;
}So the vectors are added index to index, and then truncated back to
byte. A sequence of adds can then end up as 0, fx:16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 = 256256 in binary is 0b100000000 and when you truncate that to byte you get 0b00000000.You need to ensure that each individual
Vector16b is equal to zero or use | instead of +.Code Snippets
[Acceleration (AccelMode.SSE2)]
public static unsafe Vector16b operator + (Vector16b va, Vector16b vb)
{
Vector16b res = new Vector16b ();
byte *a = &va.v0;
byte *b = &vb.v0;
byte *c = &res.v0;
for (int i = 0; i < 16; ++i)
*c++ = (byte)(*a++ + *b++);
return res;
}16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 + 16 = 256Context
StackExchange Code Review Q#108470, answer score: 14
Revisions (0)
No revisions yet.