HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Frequency distributions segmented by a field

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
frequencyfielddistributionssegmented

Problem

Some background:
I have a sample population data file. Each record in the data file has a frequency weight (FIELD NAME: wgt) indicating how many times the records needs to be replicated to get the true population. The data is set up in Microsoft SQL 2008 R2. It is usually a good practice to generate frequency distribution of the weights to see if one particular type of record is being over-represented in any way - thus help identify trends/ outliers. This task is straightforward in SQL:

SELECT wgt, COUNT(*) FROM tablename 
GROUP BY wgt


Challenge:
I want to further segregate these frequencies based on other values. Say a frequency distribution of weights for different household sizes. One way to achieve this is to use different where conditions in the statement above:

SELECT wgt, COUNT(*) FROM tablename 
WHERE household_size=x --x being the desired segment
GROUP BY wgt


But is there any way to create a single table with all of the different segments? Something like this:

WGT |   SIZE1       SIZE2       SIZE3       SIZE4 
--------------------------------------------------
1   |    2,034      1,025       502         234 
2   |    215        253         142         23 
3   |    31         25          21          34 
4   |    7          1           3           7 
5   |    5          NULL        2           5 
6   |    1          1           NULL        NULL 
7   |    NULL       1           NULL        NULL


A fancier solution I am looking for: A stored procedure where I can just specify the variable (say household size or household income) and the code should be able to generate separate frequency distributions for all distinct values for that variable as a view/table.

Any thoughts folks?

Solution

For the first challenge, are you looking for something like this?

SELECT wgt, SUM(Case when household_size=1 then 1 else 0 end) AS SIZE1
    , SUM(Case when household_size=2 then 1 else 0 end) AS SIZE2
    , SUM(Case when household_size=3 then 1 else 0 end) AS SIZE3
    , SUM(Case when household_size=4 then 1 else 0 end) AS SIZE4
FROM tablename
GROUP BY wgt


Now for the fancier challenge, where @column will be your sp parameter:

DECLARE @column VARCHAR(100);
SET @column = 'household_size';
DECLARE @main_SQL VARCHAR(4000);

SET @main_SQL =
    'DECLARE dynamic_cursor CURSOR FOR 
    SELECT DISTINCT ' + @column + '
    FROM tablename 
    ORDER BY ' + @column + ' 

    --Now we will be using a cursor to iterate over 
    DECLARE @value INT; 
    OPEN dynamic_cursor 
    FETCH NEXT FROM dynamic_cursor INTO @value 

    DECLARE @sql VARCHAR(2000); 
    SET @sql = ''SELECT wgt''; 
    WHILE @@FETCH_STATUS = 0 
    BEGIN 
        SET @sql = @sql + '', SUM(Case when ' + @column + '='' + CAST(@value AS VARCHAR) + '' then 1 else 0 end) AS SIZE'' + CAST(@value AS VARCHAR) 
        FETCH NEXT FROM dynamic_cursor INTO @value 
    END 
    SET @sql = @sql + '' FROM tablename GROUP BY wgt'' 

    close dynamic_cursor 
    DEALLOCATE dynamic_cursor 

    EXEC(@sql) '

EXEC(@main_SQL)


This part selectes the distinct values of the tablename's column received into a cursor

DECLARE dynamic_cursor CURSOR FOR 
SELECT DISTINCT ' + @column + '
FROM tablename 
ORDER BY ' + @column + '


Next we will iterate over the distinct values to dynamically build up a query that will look just like the one at the top of my post, so here is the basis for it:

start the select:

SET @sql = ''SELECT wgt''


the iterative part which dynamically adds the successive , SUM(Case when household_size=n then 1 else 0 end) AS SIZEn to the select (note that @value will be the distinct values that we got selected into the cursor):

SET @sql = @sql + '', SUM(Case when ' + @column + '='' + CAST(@value AS VARCHAR) + '' then 1 else 0 end) AS SIZE'' + CAST(@value AS VARCHAR)


finish the select:

SET @sql = @sql + '' FROM tablename GROUP BY wgt''

Code Snippets

SELECT wgt, SUM(Case when household_size=1 then 1 else 0 end) AS SIZE1
    , SUM(Case when household_size=2 then 1 else 0 end) AS SIZE2
    , SUM(Case when household_size=3 then 1 else 0 end) AS SIZE3
    , SUM(Case when household_size=4 then 1 else 0 end) AS SIZE4
FROM tablename
GROUP BY wgt
DECLARE @column VARCHAR(100);
SET @column = 'household_size';
DECLARE @main_SQL VARCHAR(4000);

SET @main_SQL =
    'DECLARE dynamic_cursor CURSOR FOR 
    SELECT DISTINCT ' + @column + '
    FROM tablename 
    ORDER BY ' + @column + ' 

    --Now we will be using a cursor to iterate over 
    DECLARE @value INT; 
    OPEN dynamic_cursor 
    FETCH NEXT FROM dynamic_cursor INTO @value 

    DECLARE @sql VARCHAR(2000); 
    SET @sql = ''SELECT wgt''; 
    WHILE @@FETCH_STATUS = 0 
    BEGIN 
        SET @sql = @sql + '', SUM(Case when ' + @column + '='' + CAST(@value AS VARCHAR) + '' then 1 else 0 end) AS SIZE'' + CAST(@value AS VARCHAR) 
        FETCH NEXT FROM dynamic_cursor INTO @value 
    END 
    SET @sql = @sql + '' FROM tablename GROUP BY wgt'' 

    close dynamic_cursor 
    DEALLOCATE dynamic_cursor 

    EXEC(@sql) '

EXEC(@main_SQL)
DECLARE dynamic_cursor CURSOR FOR 
SELECT DISTINCT ' + @column + '
FROM tablename 
ORDER BY ' + @column + '
SET @sql = ''SELECT wgt''
SET @sql = @sql + '', SUM(Case when ' + @column + '='' + CAST(@value AS VARCHAR) + '' then 1 else 0 end) AS SIZE'' + CAST(@value AS VARCHAR)

Context

StackExchange Database Administrators Q#44346, answer score: 3

Revisions (0)

No revisions yet.