patternMinor
Choosing a random bit from a bitmap
Viewed 0 times
randombitbitmapfromchoosing
Problem
Since, I don't have strong algorithmic background my question may sound a litlle odd. Please correct me, if so.
I have quite a large bitmap (~100 Million bits) (e.g.
Now, I need to choose randomly any
I have quite a large bitmap (~100 Million bits) (e.g.
100100101001010001001...010010). The bitmap is just an example, it doesn't have to start with 1 or end with 0. Now, I need to choose randomly any
0-bit from the bitmap and retrieve it's position. Memory is not an issue in my case, I can maintain a few copies of the bitmap, if necessary. But the acceptable complexity of the algorithm would be linear or O(n lnn) in the worst case. Couldn't you advice a direction to move to? I'll appreciate any advices or refernces to some related resources.Solution
There is a simple $O(n)$ algorithm using the technique of reservoir sampling. Keep a currently selected element $x$ (initially, none). Go over all bits in the file in order. When seeing the $m$th zero, put it in $x$ with probability $1/m$. You can show (exercise) that the final contents of $x$ is a uniformly random zero from the file.
If you are allowed preprocessing, you can store the locations of all zeroes in a new file, and then choose a uniformly random zero in $O(1)$ by choosing a uniformly random position in the list.
Practically speaking, we expect bitmaps to contain a decent fraction of zeroes – say at least 1%. This suggests querying uniformly random locations in the file until you reach a zero. The expected runtime of this algorithm is at most 100 random queries.
If you are allowed preprocessing, you can store the locations of all zeroes in a new file, and then choose a uniformly random zero in $O(1)$ by choosing a uniformly random position in the list.
Practically speaking, we expect bitmaps to contain a decent fraction of zeroes – say at least 1%. This suggests querying uniformly random locations in the file until you reach a zero. The expected runtime of this algorithm is at most 100 random queries.
Context
StackExchange Computer Science Q#48848, answer score: 9
Revisions (0)
No revisions yet.