snippetsqlMinor
How can I get satisfactory random numbers from the RAND function (or elsewhere)?
Viewed 0 times
randomcanthenumbersfunctionelsewhererandsatisfactorygethow
Problem
I'm creating a pseudo-random data set for application users to train on.
I'm surprised that if I seed the RAND() function with 1, 2, 3, etc., I get very nearly the same result from the seeded function. However, this seems to be followed by "properly random" but repeatable values when a seed is not supplied.
At first sight, it appears that I can evaluate RAND(@seed) and discard the result, then evaluate RAND() to get several really "random" numbers for my training data - so far I've planned to use four per record; I might need some more.
Is that plan going to work properly? And, what am I looking at, here? And, should it be in the documentation? I haven't found it.
Documentation says this, which may be meant as a clue:
The RAND function is a pseudorandom number generator that operates in a manner similar to the C run-time library rand function. If no seed is provided, the system generates its own variable seed numbers.
Does the rand function in C produce similar output for similar seed input?
I think the documentation could also state more clearly that RAND(@number) followed by RAND() always generates the same numbers. But that is what I wanted and what any experienced computer programmer will expect.
I suppose I could fill a table with random data keys obtained from https://www.random.org/
to use for the purpose - but that has drawbacks.
Update, provisional conclusion
I have the following conclusions about RAND() and for now I think I'll continue with it, but keeping alternatives in mind.
RAND(@int) sets the seed of the random number generator using the integer value given, and returns a float result which is not statistically independen
I'm surprised that if I seed the RAND() function with 1, 2, 3, etc., I get very nearly the same result from the seeded function. However, this seems to be followed by "properly random" but repeatable values when a seed is not supplied.
SELECT RAND(1) AS R1A, RAND() AS R1B, RAND(2) AS R2A, RAND() AS R2B,
RAND(3) AS R3A, RAND() AS R3B, RAND(4) AS R4A, RAND() AS R4B
0.713591993212924
0.472241415009636
0.713610626184182
0.217821139260039
0.71362925915544
0.963400850719992
0.713647892126698
0.708980575436056At first sight, it appears that I can evaluate RAND(@seed) and discard the result, then evaluate RAND() to get several really "random" numbers for my training data - so far I've planned to use four per record; I might need some more.
Is that plan going to work properly? And, what am I looking at, here? And, should it be in the documentation? I haven't found it.
Documentation says this, which may be meant as a clue:
The RAND function is a pseudorandom number generator that operates in a manner similar to the C run-time library rand function. If no seed is provided, the system generates its own variable seed numbers.
Does the rand function in C produce similar output for similar seed input?
I think the documentation could also state more clearly that RAND(@number) followed by RAND() always generates the same numbers. But that is what I wanted and what any experienced computer programmer will expect.
I suppose I could fill a table with random data keys obtained from https://www.random.org/
to use for the purpose - but that has drawbacks.
Update, provisional conclusion
I have the following conclusions about RAND() and for now I think I'll continue with it, but keeping alternatives in mind.
RAND(@int) sets the seed of the random number generator using the integer value given, and returns a float result which is not statistically independen
Solution
The only reason to ever provide a "seed" for
Just call it without a seed, and that should be good enough for most purposes.
If you do need a predictable sequence, and you're simply unhappy with the starting (random) value,
Finally, if you were on SQL 2008 or better, you'd also have the option of considering CRYPT_GEN_RANDOM instead (warning, it returns a varbinary, not a float, so you'd likely need to change your code a bit): see this article for some differences and discussion.
RAND() is if you want the exact same sequence of (pseudo) random values for testing purposes.Just call it without a seed, and that should be good enough for most purposes.
If you do need a predictable sequence, and you're simply unhappy with the starting (random) value,
RAND() takes seeds from -2,147,483,648 all the way up to 2,147,483,647 (full range of int), so try some seeds that differ by more than 1 or 2.Finally, if you were on SQL 2008 or better, you'd also have the option of considering CRYPT_GEN_RANDOM instead (warning, it returns a varbinary, not a float, so you'd likely need to change your code a bit): see this article for some differences and discussion.
Context
StackExchange Database Administrators Q#175911, answer score: 5
Revisions (0)
No revisions yet.