HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Concatenating rows into a single string query running for 5 hrs and counting

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowscountingintoconcatenatingquerysinglerunningforhrsand

Problem

I have a table with 2.6m records. It looks like this:

email                           prject_name
rafael.nadal@xyz.com              lab1
rafael.nadal@xyz.com              lab2
rafael.nadal@xyz.com              lab3
TEST@TEST.COM                     shift1
TEST@TEST.COM                     shift2


But I want my table to look like this:

email                     project_name
rafael.nadal@xyz.com     lab1, lab2, lab3
TEST@TEST.COM            shift1, shift2, shift3


I have used this query

select distinct email ,
STUFF((Select ','+project_name
from dbo.[UMG sent 2016] as  T1
where T1.email=T2.email
FOR XML PATH('')),1,1,'') from dbo.[UMG sent 2016] as T2;


It has been running for 5 hours already.

How do I speed up the process?

Solution

As you don't care about the order of the concatenated items it would be quite easy to knock up a custom CLR aggregate to do this and it will likely out perform the XML method, there is an example of one in this article.

There is a quick and easy change you can make to your existing code though.

Instead of

SELECT DISTINCT email,
                STUFF((SELECT ',' + project_name
                       FROM   dbo.[UMG sent 2016] AS T1
                       WHERE  T1.email = T2.email
                       FOR XML PATH('')), 1, 1, '')
FROM   dbo.[UMG sent 2016] AS T2;


You could use

SELECT email,
       STUFF((SELECT ',' + project_name
              FROM   dbo.[UMG sent 2016] AS T1
              WHERE  T1.email = T2.email
              FOR XML PATH('')), 1, 1, '')
FROM   dbo.[UMG sent 2016] AS T2
GROUP  BY email;


The difference being that the first one calculates concatenated strings for all rows in [UMG sent 2016] and then removes duplicates for email,string. The second one finds distinct email first and then just performs the string concatenation work on those distinct values. So in your example data instead of performing the work 5 times (twice for test and 3 times for Nadal) then throwing away three of them it will just perform the work 2 times, once for each.

Code Snippets

SELECT DISTINCT email,
                STUFF((SELECT ',' + project_name
                       FROM   dbo.[UMG sent 2016] AS T1
                       WHERE  T1.email = T2.email
                       FOR XML PATH('')), 1, 1, '')
FROM   dbo.[UMG sent 2016] AS T2;
SELECT email,
       STUFF((SELECT ',' + project_name
              FROM   dbo.[UMG sent 2016] AS T1
              WHERE  T1.email = T2.email
              FOR XML PATH('')), 1, 1, '')
FROM   dbo.[UMG sent 2016] AS T2
GROUP  BY email;

Context

StackExchange Database Administrators Q#140623, answer score: 9

Revisions (0)

No revisions yet.