HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlCritical

Why are numbers tables "invaluable"?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whytablesarenumbersinvaluable

Problem

Our resident database expert is telling us that numbers tables are invaluable. I don't quite understand why. Here's a numbers table:

USE Model
GO

CREATE TABLE Numbers
(
    Number INT NOT NULL,
    CONSTRAINT PK_Numbers 
        PRIMARY KEY CLUSTERED (Number)
        WITH FILLFACTOR = 100
)

INSERT INTO Numbers
SELECT
    (a.Number * 256) + b.Number AS Number
FROM 
    (
        SELECT number
        FROM master..spt_values
        WHERE 
            type = 'P'
            AND number <= 255
    ) a (Number),
    (
        SELECT number
        FROM master..spt_values
        WHERE 
            type = 'P'
            AND number <= 255
    ) b (Number)
GO


Per the blog post, the rationale given is


Numbers tables are truly invaluable. I use them all of the time for string manipulation, simulating window functions, populating test tables with lots of data, eliminating cursor logic, and many other tasks that would be incredibly difficult without them.

But I don't understand what those uses are, exactly -- can you provide some compelling, specific examples of where a "numbers table" saves you a ton of work in SQL Server -- and why we should have them?

Solution

I've seen many uses when you need to project 'missing data'. Eg. you have a time series (an access log for instance) and you want to show the number of hits per day for past 30 days (think analytics dashboard). If you do a select count(...) from ... group by day you will get the count for every day, but the result will only have a row for each day you actually had at least one access. On the other hand if you first project a table of days from your numbers table (select dateadd(day, -number, today) as day from numbers) and then you left join with the counts (or outer apply, whatever you fancy) then you will get a result that has 0 for count for the days you had no access. This is just one example. Of course, one may argue that the presentation layer of your dashboard could handle the missing days and just show a 0 instead, but some tools (eg. SSRS) will simply not be able to handle this.

Other examples I've seen used similar time series tricks (date/time +/- number) to do all sort of window calculations. In general, whenever in an imperative language you would use a for loop with a well known number of iterations, the declarative and set nature of SQL can use a trick based on a numbers table.

BTW, I feel the need to call out the fact that even though using a numbers table it feels like imperative procedural execution, don't fall into the fallacy of assuming it is imperative. Let me give an example:

int x;
for (int i=0;i<1000000;++i)
  x = i;
printf("%d",x);


This program will output 999999, that is pretty much guaranteed.

Lets try the same in SQL Server, using a number table. First create a table of 1,000,000 numbers:

create table numbers (number int not null primary key);
go

declare @i int = 0
    , @j int = 0;

set nocount on;
begin transaction
while @i < 1000
begin
    set @j = 0;
    while @j < 1000
    begin
        insert into numbers (number) 
            values (@j*1000+@i);
        set @j += 1;
    end
    commit;
    raiserror (N'Inserted %d*1000', 0, 0, @i)
    begin transaction;
    set @i += 1;
end
commit
go


Now lets do the 'for loop':

declare @x int;
select @x = number 
from numbers with(nolock);
select @x as [@x];


The result is:

@x
-----------
88698


If you're now having a WTF moment (after all number is the clustered primary key!), the trick is called allocation order scan and I did not insert @j*1000+@i by accident... You could had also venture a guess and say the result is because parallelism and that sometimes may be the correct answer.

There are many trolls under this bridge and I mentioned some in On SQL Server boolean operator short-circuit and T-SQL functions do no imply a certain order of execution

Code Snippets

int x;
for (int i=0;i<1000000;++i)
  x = i;
printf("%d",x);
create table numbers (number int not null primary key);
go

declare @i int = 0
    , @j int = 0;

set nocount on;
begin transaction
while @i < 1000
begin
    set @j = 0;
    while @j < 1000
    begin
        insert into numbers (number) 
            values (@j*1000+@i);
        set @j += 1;
    end
    commit;
    raiserror (N'Inserted %d*1000', 0, 0, @i)
    begin transaction;
    set @i += 1;
end
commit
go
declare @x int;
select @x = number 
from numbers with(nolock);
select @x as [@x];
@x
-----------
88698

Context

StackExchange Database Administrators Q#11506, answer score: 89

Revisions (0)

No revisions yet.