HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Expanding a dataset based on a weight field

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fieldexpandingweightdatasetbased

Problem

I have a table that contains records in an un-expanded form. Each record has an associated integer weight that essentially informs us how many times the record should be replicated in order to get the true population.

Say, I have 3 records in my table. sampn ids a unique record and weight is the frequency weight. The un-expanded dataset looks like this:

sampn   weight  attrib1 attrib2 attrib3
    1       2       23      32      65
    2       1       32      56      75
    3       3       54      25      87


Once expanded, the dataset will be like this (note - I removed the weight field - but this is not essential):

sampn   attrib1 attrib2 attrib3
    1       23      32      65
    1       23      32      65
    2       32      56      75
    3       54      25      87
    3       54      25      87
    3       54      25      87


I have tried to do this using cursors but it is taking a really long time to execute. Is there a clever way to do this really fast? Any predefined T-SQL stored procedure that achieves this?

Update

All,
Thanks for the answers! Really great learning experience! Performed the expansion operation on my dataset. Paul's auxiliary table of numbers had the best execution time.

Solution

An efficient way to perform this task uses an auxiliary table of numbers. This is simply a table with integers from 1 to n, where 'n' is perhaps a million or so. A numbers table comes in handy for all sorts of regular tasks.

CREATE TABLE dbo.Demo
(
    SampleNumber    integer IDENTITY NOT NULL,
    SampleWeight    integer NOT NULL,
    Attribute1      integer NOT NULL,
    Attribute2      integer NOT NULL,
    Attribute3      integer NOT NULL,

    CONSTRAINT [PK dbo.Demo SampleNumber]
        PRIMARY KEY (SampleNumber),

    CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
        CHECK (SampleWeight BETWEEN 1 AND 50)
);

INSERT INTO dbo.Demo
    (
    SampleWeight,
    Attribute1, 
    Attribute2, 
    Attribute3
    )
VALUES
    (2, 23, 32, 65),
    (1, 32, 56, 75),
    (3, 54, 25, 87);

SELECT
    D.SampleNumber,
    D.Attribute1,
    D.Attribute2,
    D.Attribute3
FROM dbo.Demo AS D
JOIN dbo.Numbers AS N
    ON N.n BETWEEN 1 AND D.SampleWeight;

DROP TABLE dbo.Demo;


Execution plan:

Output:

╔══════════════╦════════════╦════════════╦════════════╗
║ SampleNumber ║ Attribute1 ║ Attribute2 ║ Attribute3 ║
╠══════════════╬════════════╬════════════╬════════════╣
║            1 ║         23 ║         32 ║         65 ║
║            1 ║         23 ║         32 ║         65 ║
║            2 ║         32 ║         56 ║         75 ║
║            3 ║         54 ║         25 ║         87 ║
║            3 ║         54 ║         25 ║         87 ║
║            3 ║         54 ║         25 ║         87 ║
╚══════════════╩════════════╩════════════╩════════════╝


SQLfiddle here.

Code Snippets

CREATE TABLE dbo.Demo
(
    SampleNumber    integer IDENTITY NOT NULL,
    SampleWeight    integer NOT NULL,
    Attribute1      integer NOT NULL,
    Attribute2      integer NOT NULL,
    Attribute3      integer NOT NULL,

    CONSTRAINT [PK dbo.Demo SampleNumber]
        PRIMARY KEY (SampleNumber),

    CONSTRAINT [CK dbo.Demo SampleWeight 1-50]
        CHECK (SampleWeight BETWEEN 1 AND 50)
);

INSERT INTO dbo.Demo
    (
    SampleWeight,
    Attribute1, 
    Attribute2, 
    Attribute3
    )
VALUES
    (2, 23, 32, 65),
    (1, 32, 56, 75),
    (3, 54, 25, 87);

SELECT
    D.SampleNumber,
    D.Attribute1,
    D.Attribute2,
    D.Attribute3
FROM dbo.Demo AS D
JOIN dbo.Numbers AS N
    ON N.n BETWEEN 1 AND D.SampleWeight;

DROP TABLE dbo.Demo;
╔══════════════╦════════════╦════════════╦════════════╗
║ SampleNumber ║ Attribute1 ║ Attribute2 ║ Attribute3 ║
╠══════════════╬════════════╬════════════╬════════════╣
║            1 ║         23 ║         32 ║         65 ║
║            1 ║         23 ║         32 ║         65 ║
║            2 ║         32 ║         56 ║         75 ║
║            3 ║         54 ║         25 ║         87 ║
║            3 ║         54 ║         25 ║         87 ║
║            3 ║         54 ║         25 ║         87 ║
╚══════════════╩════════════╩════════════╩════════════╝

Context

StackExchange Database Administrators Q#46957, answer score: 9

Revisions (0)

No revisions yet.