patternsqlMinor
Coin flips and Dice rolls in a stored procedure
Viewed 0 times
coinstoredrollsprocedurediceandflips
Problem
I saw a post in chat a few days ago, of what I thought was an interesting (albeit fairly simple) statistical challenge:
By @skiwi:
If I shoot once with 50% chance of target A and 50% of target B, and if I hit target A I have 95% chance to shoot again, then how many of target A will I hit on average?
So I broke out the SQL machine and decided to write something like this, but more generic. So for the 50/50 chance I used a coin-flip-like logic, then for the subsequent rolls, I used the analogy of a dice roll, as I feel these two are clear and should be familiar to anyone.
I wrapped this into a stored procedure
I did as much as I could to make it set-based, but some parts, especially the repeated dice rolls, I couldn't think of a good way to do it without a loop.
Execution is still quite fast, 1-2 seconds for 10000 rows and about 5 seconds for 10000 rows. What can I improve on? Do my statistics make sense?
```
use PhrancisLocal;
go
if object_id('dbo.usp_CoinFlipAndDiceRolls') is not null
drop procedure dbo.usp_CoinFlipAndDiceRolls;
go
set ansi_nulls on;
go
set quoted_identifier on;
go
create procedure dbo.usp_CoinFlipAndDiceRolls
@numberOfRuns int = 1000,
@chanceToWinDiceRoll float = null,
@includeStatistics bit = 0
as
begin
set nocount on;
/ Default dice roll to 1 chance in 6 if not passed in to the procedure as a parameter: /
if @chanceToWinDiceRoll is null
set @chanceToWinDiceRoll = (1.0/6);
/ CONSTANTS /
declare
@CHANCE_TO_WIN_COIN_FLIP float = 0.5,
@WIN bit = 1,
@LOSS bit = 0;
/* Temporary table which will be used to hold the results of the operations
and aggregate statistics from the results to return to the caller. */
if object_id('tempdb..#CoinFlipAndDiceRolls') is not null
drop table #CoinFlipAndDiceRolls;
create table #CoinFlipAndD
By @skiwi:
If I shoot once with 50% chance of target A and 50% of target B, and if I hit target A I have 95% chance to shoot again, then how many of target A will I hit on average?
So I broke out the SQL machine and decided to write something like this, but more generic. So for the 50/50 chance I used a coin-flip-like logic, then for the subsequent rolls, I used the analogy of a dice roll, as I feel these two are clear and should be familiar to anyone.
I wrapped this into a stored procedure
usp_CoinFlipAndDiceRolls.sql (the usp_ or sp_ prefix is very commonly used in databases when naming procedures). I did as much as I could to make it set-based, but some parts, especially the repeated dice rolls, I couldn't think of a good way to do it without a loop.
Execution is still quite fast, 1-2 seconds for 10000 rows and about 5 seconds for 10000 rows. What can I improve on? Do my statistics make sense?
```
use PhrancisLocal;
go
if object_id('dbo.usp_CoinFlipAndDiceRolls') is not null
drop procedure dbo.usp_CoinFlipAndDiceRolls;
go
set ansi_nulls on;
go
set quoted_identifier on;
go
create procedure dbo.usp_CoinFlipAndDiceRolls
@numberOfRuns int = 1000,
@chanceToWinDiceRoll float = null,
@includeStatistics bit = 0
as
begin
set nocount on;
/ Default dice roll to 1 chance in 6 if not passed in to the procedure as a parameter: /
if @chanceToWinDiceRoll is null
set @chanceToWinDiceRoll = (1.0/6);
/ CONSTANTS /
declare
@CHANCE_TO_WIN_COIN_FLIP float = 0.5,
@WIN bit = 1,
@LOSS bit = 0;
/* Temporary table which will be used to hold the results of the operations
and aggregate statistics from the results to return to the caller. */
if object_id('tempdb..#CoinFlipAndDiceRolls') is not null
drop table #CoinFlipAndDiceRolls;
create table #CoinFlipAndD
Solution
Overall, I think your procedure is very good. It's consistent in layout, capitalisation and style, and well commented. I had no problems reading through it and understanding what it was doing (and how) without actually running it.
I really like the way you distinguish global constants with
Performance-related improvements
It obviously depends on the spec of the SQL box, but 5 seconds for 10,000 runs seems entirely reasonable to me, and I don't see a need to improve on this for the sake of it.
In the context of this problem, iterative looping is a perfectly reasonable approach; this doesn't lend itself to set-based processing and after a brief think, I can't see how I'd do it without loops. I'm sure it's possible but does it merit the thought-effort required? Not for me.
One way you might get better performance is by replacing the temporary table with a table variable. This is highly debateable and would depend on a lot of factors (which version of SQL Server, how many rows are a realistic load, amount of memory, etc etc). Rather than debate it academically, in this situation I'd create a copy of the proc using an
Temporary table scope
Another minor performance improvement (not necessarily measurable) could be to remove the statements dropping
Syntactic short-cuts / improvements
-
I quite like the
-
When you insert into
-
Where you declare
-
For that and some other variable assignments from a
which is a little terser than your subquery-style approach.
Clarity of loop control
In your dice-roll block, I think it'd be clearer to maintain
This way, you wouldn't need to initially assign it at all outside the loop.
I really like the way you distinguish global constants with
@ALL_CAPS, from regular "true" @variables. I'd not seen this before in T-SQL, but know it as a general programming style, and will be copying in my own SQL code from now on.Performance-related improvements
It obviously depends on the spec of the SQL box, but 5 seconds for 10,000 runs seems entirely reasonable to me, and I don't see a need to improve on this for the sake of it.
In the context of this problem, iterative looping is a perfectly reasonable approach; this doesn't lend itself to set-based processing and after a brief think, I can't see how I'd do it without loops. I'm sure it's possible but does it merit the thought-effort required? Not for me.
One way you might get better performance is by replacing the temporary table with a table variable. This is highly debateable and would depend on a lot of factors (which version of SQL Server, how many rows are a realistic load, amount of memory, etc etc). Rather than debate it academically, in this situation I'd create a copy of the proc using an
@table and compare their performance side-by-side. Temporary table scope
Another minor performance improvement (not necessarily measurable) could be to remove the statements dropping
#CoinFlipAndDiceRolls before creating it, and at the end. Temporary tables are scoped to the instance of the stored procedure anyway, so it can't already exist at the start of the proc (unless a parent callee stored procedure has created it - which you'd know about). Similarly, you don't have to clean it up at the end, SQL will do that for you.Syntactic short-cuts / improvements
-
I quite like the
+= operator (official documentation here), allowing set @row_num += 1 instead of set @row_num = @row_num + 1.-
When you insert into
#CoinFlipAndDiceRolls, in your particular case I would normally use values (val1,val2,val3) syntax rather than selecting the values, because these are a single row of values with no from clause.-
Where you declare
@currentRow - personally if I'm setting to a value dynamically (rather than just hardcoded) I don't like to do this in the declare. Rather, I declare it then assign in a subsequent statement.-
For that and some other variable assignments from a
select query, having declared the variable you can do this: select @currentRow = min(row_id)
from #CoinFlipAndDiceRolls
where NumberOfDiceRolls = 0
and CoinFlipWon = @WINwhich is a little terser than your subquery-style approach.
Clarity of loop control
In your dice-roll block, I think it'd be clearer to maintain
@currentRow at the start of the loop not the end. begin
set @diceRollSeed = rand();
select @currentRow=min(row_id)
from #CoinFlipAndDiceRolls
where NumberOfDiceRolls = 0
and CoinFlipWon = @WIN;
while @diceRollSeed <= @chanceToWinDiceRoll
begin
update #CoinFlipAndDiceRolls
set NumberOfDiceRolls = NumberOfDiceRolls + 1
where row_id = @currentRow;
/* Roll a new dice value: */
set @diceRollSeed = rand();
end
);
end;This way, you wouldn't need to initially assign it at all outside the loop.
Code Snippets
select @currentRow = min(row_id)
from #CoinFlipAndDiceRolls
where NumberOfDiceRolls = 0
and CoinFlipWon = @WINbegin
set @diceRollSeed = rand();
select @currentRow=min(row_id)
from #CoinFlipAndDiceRolls
where NumberOfDiceRolls = 0
and CoinFlipWon = @WIN;
while @diceRollSeed <= @chanceToWinDiceRoll
begin
update #CoinFlipAndDiceRolls
set NumberOfDiceRolls = NumberOfDiceRolls + 1
where row_id = @currentRow;
/* Roll a new dice value: */
set @diceRollSeed = rand();
end
);
end;Context
StackExchange Code Review Q#113988, answer score: 6
Revisions (0)
No revisions yet.