Extracting Randomness from Mouse Acceleration
Problem
I'm trying to build an "entropy pool" that will be fed as input into an RNG (e.g., Fortuna). To do so, I need to take various collected data and extract as much entropy as possible, turning it into unpredictable bits. I've read many papers about extracting randomness, but I haven't understood how to apply them to real-world data sources.
For example, as one entropy source, I'd like to use the position of the mouse cursor. After some simple experimentation, it seems like using the acceleration of the cursor (resulting in two streams, X acceleration and Y acceleration) is the best representation, since it significantly reduces the sample range. Running several min-entropy estimators on the acceleration data (separately for the X and Y axes), I get estimates of at worst 0.75 bits/sample of entropy (ignoring that the two streams are correlated). Beyond that, I'm not sure how to apply any published randomness extractor to this data.
Part of my problem stems from the fact that most entropy extractors are described in terms of an entropy rate, which I'm not sure how to calculate in this case. My samples are 8 bits each (a signed byte), which would make the entropy rate 0.75/8 ≈ 0.09, far too low for most extractors. Another problem is that I have two correlated streams, and I'm not aware of any extractors that handle that kind of input.
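For concreteness, here is how a per-sample figure like the 0.75 bits/sample above can come out of a most-common-value style min-entropy estimate (one of the simpler estimators; the sample data below is made up for illustration):

```python
import math
from collections import Counter

def min_entropy_per_sample(samples):
    """Most-common-value min-entropy estimate: -log2(p_max),
    where p_max is the empirical probability of the most frequent symbol."""
    counts = Counter(samples)
    p_max = max(counts.values()) / len(samples)
    return -math.log2(p_max)

# Hypothetical signed-byte acceleration deltas
xs = [0, 1, -1, 0, 2, 0, -1, 1, 0, 0]
h = min_entropy_per_sample(xs)   # bits of min-entropy per 8-bit sample
rate = h / 8                     # entropy rate as a fraction of the sample width
```

Dividing by the 8-bit sample width is what yields the small entropy-rate figure mentioned above.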
Any guidance or references on how to extract entropy from this data using any published randomness extractor would be greatly appreciated. I'm very interested in using a "real" randomness extractor and not the common "just hash it" approach.
I have a custom design that I can detail if desired. It essentially computes symbol statistics from the entire test capture (5 million samples) and uses range encoding to encode each symbol using those statistics. It results in about 1.5 bits/sample of output for each stream, and I'm not sure how well founded the idea is.
Edit: One of my end goals is to have a very large
Solution
Start by reading "How to extract randomness from a file?".
At a broad level, it seems like you understand pretty well the techniques.
To extract randomness in practice, I recommend you use a cryptographic hash, and use techniques from the cryptographic literature for building cryptographic pseudorandom generators.
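As a minimal sketch of the hash-based approach (a toy, not the pool design a real system like Fortuna uses, which maintains multiple pools with reseed scheduling):

```python
import hashlib

class SimplePool:
    """Toy entropy pool: accumulate raw samples, condition with SHA-256.
    The output is only as good as the min-entropy actually fed in --
    a 32-byte digest is safe to treat as uniform only if at least
    ~256 bits of min-entropy went into the pool."""

    def __init__(self):
        self._h = hashlib.sha256()

    def add(self, raw: bytes):
        # Feed raw, unconditioned samples straight into the hash state.
        self._h.update(raw)

    def extract(self) -> bytes:
        # Conditioned 32-byte output.
        return self._h.digest()
```

The hash acts as the "conditioner": correlations and bias in the raw samples don't need to be removed first, as long as enough total min-entropy is collected before extracting.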
You mention "extractors". I know there's some cool theory surrounding them, but I don't recommend you use extractors. They are beautiful but not practical. Cryptographic hash functions are superior in practice.
You talked about estimating the amount of entropy in the data. Unfortunately, entropy estimation is an inexact science, and it's easy to over-estimate the amount of entropy (e.g., if there are non-trivial correlations in the data that you didn't think to check for). So a procedure cryptographers often use is to make some estimate of the entropy rate (typically through a priori calculations), then divide by a factor of 4 or so as a safety margin -- and to gather random values from many sources and feed them all into the pool. The hope is that even if one source provides low-entropy data, or one of the entropy estimates is too optimistic, some other source will have enough entropy to make up for the problem.
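That conservative accounting can be sketched as follows (names and the 256-bit threshold are my choices for illustration, not from the answer):

```python
import hashlib

class CountingPool:
    """Entropy pool that credits only 1/4 of the estimated entropy
    per sample (the safety factor mentioned above) and refuses to
    emit output until 256 conservatively-counted bits have accumulated."""

    SAFETY = 4          # divide the a-priori entropy estimate by this
    THRESHOLD = 256     # bits required before output is allowed

    def __init__(self):
        self._h = hashlib.sha256()
        self._bits = 0.0

    def add(self, raw: bytes, est_bits_per_byte: float):
        self._h.update(raw)
        self._bits += est_bits_per_byte * len(raw) / self.SAFETY

    def extract(self) -> bytes:
        if self._bits < self.THRESHOLD:
            raise RuntimeError("not enough estimated entropy yet")
        self._bits = 0.0
        return self._h.digest()
```

With the 0.75 bits/sample estimate from the question, the pool would demand roughly 256 × 4 / 0.75 ≈ 1366 byte-sized samples before releasing any output.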
Overall, I recommend you spend some quality time in the cryptographic literature reading about the techniques that have been developed there. Cryptographers have spent a lot of time worrying and thinking about this, because cryptography relies upon high-quality random numbers. And, rather than re-inventing the wheel yourself, it's probably better to re-use some existing well-tested cryptographic method.
Here are a few pointers to get you started:
- https://crypto.stackexchange.com/q/41967/351
- https://crypto.stackexchange.com/q/27028/351
- https://crypto.stackexchange.com/q/32984/351
- https://crypto.stackexchange.com/q/10517/351
- https://crypto.stackexchange.com/q/18355/351
- https://crypto.stackexchange.com/q/39186/351
You can look around Crypto.SE and the resources mentioned there, and you'll probably be able to pick up a bunch about this topic. Have fun!
Context
StackExchange Computer Science Q#67681, answer score: 3