patterncMinor
Denoise an image under extreme time pressure
Viewed 0 times
imagedenoisepressuretimeunderextreme
Problem
I'm working on a real-time, embedded system image processing application for my group engineering capstone in undergrad. I'm receiving data at 60FPS, and have to isolate and detect the location of a flying object in each frame, if it exists, before the next frame. This gives me about 15ms to perform the entire image processing algorithm.
One important step in the process is denoising the image. The input to the denoising function is an \$MxN\$ image, obtained by background subtraction/frame differencing. Each pixel is represented by a single bit. (Basically we take the frame at time \$t+1\$, subtract it from the frame at time \$t\$, and if the absolute difference is above some threshold, we set the bit equal to 1.) This means that we store the entire image in \$\frac{M*N}{8}\$ bytes.
Owing to a quirk of how the background subtraction algorithm was implemented, the LSB of each byte contains the earliest pixel sent by the camera, and the MSB contains the latest pixel sent by the camera. So our input data is ordered something like this in our array:
My denoising function performs two operations:
The Problem
It's too slow, by an order of magnitude. This runs in about 109ms on my hardware for a given image (320x240 in my case). It should be running around 10-12ms.
Is the slowness I'm experiencing due to the nature of the job I'm trying to do, or to my implementation? How can I speed it up?
```
// bitBuffer is a full image
void Denoise(uint8_t* unsafe bitBuffer)
{
for (int i = 2; i > 1;
botBit = bot & 0x1;
bot = bot >> 1;
// Once we arrive here, the leftByte has already been flipped. So now we have this orientation of bytes left/current/right: 0 1 2 3 4 5 6 7 ||| 15 14 13 12 11
One important step in the process is denoising the image. The input to the denoising function is an \$MxN\$ image, obtained by background subtraction/frame differencing. Each pixel is represented by a single bit. (Basically we take the frame at time \$t+1\$, subtract it from the frame at time \$t\$, and if the absolute difference is above some threshold, we set the bit equal to 1.) This means that we store the entire image in \$\frac{M*N}{8}\$ bytes.
Owing to a quirk of how the background subtraction algorithm was implemented, the LSB of each byte contains the earliest pixel sent by the camera, and the MSB contains the latest pixel sent by the camera. So our input data is ordered something like this in our array:
7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 | 23 22 21 20 19 18 17 16 | 31 ...My denoising function performs two operations:
- Sets a pixel to 0 if there are less than 3 pixels next to it with a 1.
- Flips the orientation of data to simplify further processing, so that it looks like
0 1 2 3 4 5 6 7 | 8 9 ...
The Problem
It's too slow, by an order of magnitude. This runs in about 109ms on my hardware for a given image (320x240 in my case). It should be running around 10-12ms.
Is the slowness I'm experiencing due to the nature of the job I'm trying to do, or to my implementation? How can I speed it up?
```
// bitBuffer is a full image
void Denoise(uint8_t* unsafe bitBuffer)
{
for (int i = 2; i > 1;
botBit = bot & 0x1;
bot = bot >> 1;
// Once we arrive here, the leftByte has already been flipped. So now we have this orientation of bytes left/current/right: 0 1 2 3 4 5 6 7 ||| 15 14 13 12 11
Solution
One common way to speed up operations that have as much bit twiddling as you are doing is to examine more than one bit at once. This is commonly done using lookup tables. So, if you were going to say examime a nibble at a time, those 4 bits would have 16 possibilities. If you include the adjacent bits on either end it becomes 64 possibilities.
These 64 possibilities could be pre-computed as to their outcomes. You could then resolve 4 bits with one lookup and not much RAM would be required. The more RAM you have available, the bigger the table you can build, and the more bits you could resolve at one go.
These 64 possibilities could be pre-computed as to their outcomes. You could then resolve 4 bits with one lookup and not much RAM would be required. The more RAM you have available, the bigger the table you can build, and the more bits you could resolve at one go.
Context
StackExchange Code Review Q#156954, answer score: 3
Revisions (0)
No revisions yet.