patternsqlCritical
Why are numbers tables "invaluable"?
Viewed 0 times
whytablesarenumbersinvaluable
Problem
Our resident database expert is telling us that numbers tables are invaluable. I don't quite understand why. Here's a numbers table:
Per the blog post, the rationale given is
Numbers tables are truly invaluable. I use them all of the time for string manipulation, simulating window functions, populating test tables with lots of data, eliminating cursor logic, and many other tasks that would be incredibly difficult without them.
But I don't understand what those uses are, exactly -- can you provide some compelling, specific examples of where a "numbers table" saves you a ton of work in SQL Server -- and why we should have them?
USE Model
GO
CREATE TABLE Numbers
(
Number INT NOT NULL,
CONSTRAINT PK_Numbers
PRIMARY KEY CLUSTERED (Number)
WITH FILLFACTOR = 100
)
INSERT INTO Numbers
SELECT
(a.Number * 256) + b.Number AS Number
FROM
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) a (Number),
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) b (Number)
GOPer the blog post, the rationale given is
Numbers tables are truly invaluable. I use them all of the time for string manipulation, simulating window functions, populating test tables with lots of data, eliminating cursor logic, and many other tasks that would be incredibly difficult without them.
But I don't understand what those uses are, exactly -- can you provide some compelling, specific examples of where a "numbers table" saves you a ton of work in SQL Server -- and why we should have them?
Solution
I've seen many uses when you need to project 'missing data'. Eg. you have a time series (an access log for instance) and you want to show the number of hits per day for past 30 days (think analytics dashboard). If you do a
Other examples I've seen used similar time series tricks (date/time +/- number) to do all sort of window calculations. In general, whenever in an imperative language you would use a for loop with a well known number of iterations, the declarative and set nature of SQL can use a trick based on a numbers table.
BTW, I feel the need to call out the fact that even though using a numbers table it feels like imperative procedural execution, don't fall into the fallacy of assuming it is imperative. Let me give an example:
This program will output 999999, that is pretty much guaranteed.
Lets try the same in SQL Server, using a number table. First create a table of 1,000,000 numbers:
Now lets do the 'for loop':
The result is:
If you're now having a WTF moment (after all
There are many trolls under this bridge and I mentioned some in On SQL Server boolean operator short-circuit and T-SQL functions do no imply a certain order of execution
select count(...) from ... group by day you will get the count for every day, but the result will only have a row for each day you actually had at least one access. On the other hand if you first project a table of days from your numbers table (select dateadd(day, -number, today) as day from numbers) and then you left join with the counts (or outer apply, whatever you fancy) then you will get a result that has 0 for count for the days you had no access. This is just one example. Of course, one may argue that the presentation layer of your dashboard could handle the missing days and just show a 0 instead, but some tools (eg. SSRS) will simply not be able to handle this.Other examples I've seen used similar time series tricks (date/time +/- number) to do all sort of window calculations. In general, whenever in an imperative language you would use a for loop with a well known number of iterations, the declarative and set nature of SQL can use a trick based on a numbers table.
BTW, I feel the need to call out the fact that even though using a numbers table it feels like imperative procedural execution, don't fall into the fallacy of assuming it is imperative. Let me give an example:
int x;
for (int i=0;i<1000000;++i)
x = i;
printf("%d",x);This program will output 999999, that is pretty much guaranteed.
Lets try the same in SQL Server, using a number table. First create a table of 1,000,000 numbers:
create table numbers (number int not null primary key);
go
declare @i int = 0
, @j int = 0;
set nocount on;
begin transaction
while @i < 1000
begin
set @j = 0;
while @j < 1000
begin
insert into numbers (number)
values (@j*1000+@i);
set @j += 1;
end
commit;
raiserror (N'Inserted %d*1000', 0, 0, @i)
begin transaction;
set @i += 1;
end
commit
goNow lets do the 'for loop':
declare @x int;
select @x = number
from numbers with(nolock);
select @x as [@x];The result is:
@x
-----------
88698If you're now having a WTF moment (after all
number is the clustered primary key!), the trick is called allocation order scan and I did not insert @j*1000+@i by accident... You could had also venture a guess and say the result is because parallelism and that sometimes may be the correct answer.There are many trolls under this bridge and I mentioned some in On SQL Server boolean operator short-circuit and T-SQL functions do no imply a certain order of execution
Code Snippets
int x;
for (int i=0;i<1000000;++i)
x = i;
printf("%d",x);create table numbers (number int not null primary key);
go
declare @i int = 0
, @j int = 0;
set nocount on;
begin transaction
while @i < 1000
begin
set @j = 0;
while @j < 1000
begin
insert into numbers (number)
values (@j*1000+@i);
set @j += 1;
end
commit;
raiserror (N'Inserted %d*1000', 0, 0, @i)
begin transaction;
set @i += 1;
end
commit
godeclare @x int;
select @x = number
from numbers with(nolock);
select @x as [@x];@x
-----------
88698Context
StackExchange Database Administrators Q#11506, answer score: 89
Revisions (0)
No revisions yet.