patternsqlMinor
Expanding a dataset based on a weight field
Viewed 0 times
fieldexpandingweightdatasetbased
Problem
I have a table that contains records in an un-expanded form. Each record has an associated integer weight that essentially informs us how many times the record should be replicated in order to get the true population.
Say, I have 3 records in my table.
Once expanded, the dataset will be like this (note - I removed the
I have tried to do this using cursors but it is taking a really long time to execute. Is there a clever way to do this really fast? Any predefined T-SQL stored procedure that achieves this?
Update
All,
Thanks for the answers! Really great learning experience! Performed the expansion operation on my dataset. Paul's auxiliary table of numbers had the best execution time.
Say, I have 3 records in my table.
sampn ids a unique record and weight is the frequency weight. The un-expanded dataset looks like this:sampn weight attrib1 attrib2 attrib3
1 2 23 32 65
2 1 32 56 75
3 3 54 25 87Once expanded, the dataset will be like this (note - I removed the
weight field - but this is not essential):sampn attrib1 attrib2 attrib3
1 23 32 65
1 23 32 65
2 32 56 75
3 54 25 87
3 54 25 87
3 54 25 87I have tried to do this using cursors but it is taking a really long time to execute. Is there a clever way to do this really fast? Any predefined T-SQL stored procedure that achieves this?
Update
All,
Thanks for the answers! Really great learning experience! Performed the expansion operation on my dataset. Paul's auxiliary table of numbers had the best execution time.
Solution
An efficient way to perform this task uses an auxiliary table of numbers. This is simply a table with integers from 1 to n, where 'n' is perhaps a million or so. A numbers table comes in handy for all sorts of regular tasks.
Execution plan:
Output:
SQLfiddle here.
CREATE TABLE dbo.Demo
(
SampleNumber integer IDENTITY NOT NULL,
SampleWeight integer NOT NULL,
Attribute1 integer NOT NULL,
Attribute2 integer NOT NULL,
Attribute3 integer NOT NULL,
CONSTRAINT [PK dbo.Demo SampleNumber]
PRIMARY KEY (SampleNumber),
CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
CHECK (SampleWeight BETWEEN 1 AND 50)
);
INSERT INTO dbo.Demo
(
SampleWeight,
Attribute1,
Attribute2,
Attribute3
)
VALUES
(2, 23, 32, 65),
(1, 32, 56, 75),
(3, 54, 25, 87);
SELECT
D.SampleNumber,
D.Attribute1,
D.Attribute2,
D.Attribute3
FROM dbo.Demo AS D
JOIN dbo.Numbers AS N
ON N.n BETWEEN 1 AND D.SampleWeight;
DROP TABLE dbo.Demo;Execution plan:
Output:
╔══════════════╦════════════╦════════════╦════════════╗
║ SampleNumber ║ Attribute1 ║ Attribute2 ║ Attribute3 ║
╠══════════════╬════════════╬════════════╬════════════╣
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 2 ║ 32 ║ 56 ║ 75 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
╚══════════════╩════════════╩════════════╩════════════╝SQLfiddle here.
Code Snippets
CREATE TABLE dbo.Demo
(
SampleNumber integer IDENTITY NOT NULL,
SampleWeight integer NOT NULL,
Attribute1 integer NOT NULL,
Attribute2 integer NOT NULL,
Attribute3 integer NOT NULL,
CONSTRAINT [PK dbo.Demo SampleNumber]
PRIMARY KEY (SampleNumber),
CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
CHECK (SampleWeight BETWEEN 1 AND 50)
);
INSERT INTO dbo.Demo
(
SampleWeight,
Attribute1,
Attribute2,
Attribute3
)
VALUES
(2, 23, 32, 65),
(1, 32, 56, 75),
(3, 54, 25, 87);
SELECT
D.SampleNumber,
D.Attribute1,
D.Attribute2,
D.Attribute3
FROM dbo.Demo AS D
JOIN dbo.Numbers AS N
ON N.n BETWEEN 1 AND D.SampleWeight;
DROP TABLE dbo.Demo;╔══════════════╦════════════╦════════════╦════════════╗
║ SampleNumber ║ Attribute1 ║ Attribute2 ║ Attribute3 ║
╠══════════════╬════════════╬════════════╬════════════╣
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 1 ║ 23 ║ 32 ║ 65 ║
║ 2 ║ 32 ║ 56 ║ 75 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
║ 3 ║ 54 ║ 25 ║ 87 ║
╚══════════════╩════════════╩════════════╩════════════╝Context
StackExchange Database Administrators Q#46957, answer score: 9
Revisions (0)
No revisions yet.