HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Join is causing other column data to change?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columnjoincausingotherdatachange

Problem

I'll lead off with saying that I am relatively new to databases and SQL and I am trying to make an addition to a View in SQL Server that a previous co-worker created.

I am trying to join the Customer_Snacks table so that I can get the Snack Quantity for each customer's order. The issue I'm having is when I do a LEFT OUTER JOIN which, as far as I can tell, should work, some of the other columns data is drastically changed.

I managed to get it to work the way I would like it to by using a subquery but it is DREADFULLY slow. The query takes over a minute to run and in return, the application that is making the call for that view times out.

Here is what I currently have that works (I know, it's messy):

```
SELECT
cl.CustomerID,
cl.FirstName + ' ' + c.LastName as CustomerDisplay,
cl.InvoiceID,
cl.MealPlanID,
mp.DeliveryDate1 as DeliveryDate,
SUM(CASE WHEN cll.Name = 'Meal' THEN cll.Quantity ELSE 0 END) as RegularQuantity,
SUM(CASE WHEN cll.Name = 'Meal' THEN cll.Quantity * Amount ELSE 0 END) as RegularTotal,
SUM(CASE WHEN cll.Name = 'EXTRAPROTEINMEAL' THEN cll.Quantity ELSE 0 END) as ProteinQuantity,
SUM(CASE WHEN cll.Name = 'EXTRAPROTEINMEAL' THEN cll.Quantity * Amount ELSE 0 END) as ProteinTotal,
SUM(CASE WHEN cll.Name = 'TAX' THEN Amount ELSE 0 END) as TotalTax,
SUM(CASE WHEN cll.Name = 'SNACKS' THEN Amount ELSE 0 END) AS SnackTotal,
-- Here is my subquery that I would like to integrate into this view with a join so that it's much faster
(SELECT SUM(cs.Quantity) FROM Fustomer_Snacks as cs WHERE cs.CustomerID = cl.CustomerID and cs.DeliveryDate = mp.DeliveryDate1) as SnackQuantity,
------------------------------------------------------------------
CAST(SUM((CASE WHEN cll.Name = 'home' or cll.Name = 'work' THEN 1 ELSE 0 END)) as bit) as IsDelivery,
SUM(CASE WHEN (cll.Name = 'home' or cll.Name = 'work' or cll.Name = 'pickup') and cll.Quantity > 0 THEN cll.Quantity * Amount else 0 END) as Delive

Solution

The odds are that your problem is that, when you add in the snacks, you have more than one table that returns more than one row for the columns in your GROUP BY statement - probably Customer_Snacks and Customer_Ledger_LineItems.

When the DB engine performs a join, it matches however many rows in the first table with value ABC with however many rows in the second table have the same value, ABC (outer joins working a little differently, but let's ignore that for the moment).

I'll guess that for a given Customer_Ledger row, you tie to one Customer row, one MealPlan row, one (or zero) PromoCode row, and multipleCustomer_Ledger_LineItems rows.

I'm also guessing for one
Customer_Ledger row and its associated MealPlan row, you can bring back multiple Customer_Snacks rows.

So, when we put all the rows together, let's say we have:

  • 1 Customer row



  • 1 Customer_Ledger row



  • 1 MealPlan row



  • 1 PromoCode row



  • 3 Customer_Ledger_LineItems rows



Match all those together, you get
1 1 1 1 3 = 3 rows.

Now, add in 2
Customer_Snacks rows. those are matched against all the rows we already have, so we get 1 1 1 1 3 * 2 = 6 rows.

When we go to take our sums, we'll get double the amount for sums from
Customer_Ledger_LineItems rows, because each row now shows up twice. And, we'll get triple the amount for the sum from Customer_Snacks, because those rows each now show up three times. Basically, you're getting a CROSS JOIN between the relevant Customer_Ledger_LineItems rows, and the Customer_Snacks rows.

There are two ways to avoid this. You already employing the first option - use a subquery to get the sums from one of the tables returning multiple rows to our grouped values.

The other option is very similar: build a temporary table (or a CTE) with all relevant values that you need, so that you'll just have one row with the sums you need when you go to join the data in. In your case, this would look like:

;WITH SnackSums AS
      (SELECT cs.CustomerID
             ,cs.DeliveryDate
             ,SUM(cs.Quantity) as SnackQuantity
         FROM Customer_Snacks cs
        GROUP BY cs.CustomerID, cs.DeliveryDate
      )


and would go right before the
SELECT for your query. Then, you just put ss.SnackQuantity where the subquery was, and add:

LEFT OUTER JOIN
  SnackSums ss
    ON (cl.CustomerId = ss.CustomerId AND ss.DeliveryDate = mp.DeliveryDate1)


right before your
GROUP BY. Finally, you'll also need to add ss.SnackQuantity to the GROUP BY`, since in the context of your main query this is a stand-alone, non-aggregated value.

Now, I make no promises that this will be faster than the subquery. However, it should be accurate - and it doesn't matter is something is blazing fast, if it isn't accurate.

Code Snippets

;WITH SnackSums AS
      (SELECT cs.CustomerID
             ,cs.DeliveryDate
             ,SUM(cs.Quantity) as SnackQuantity
         FROM Customer_Snacks cs
        GROUP BY cs.CustomerID, cs.DeliveryDate
      )
LEFT OUTER JOIN
  SnackSums ss
    ON (cl.CustomerId = ss.CustomerId AND ss.DeliveryDate = mp.DeliveryDate1)

Context

StackExchange Database Administrators Q#177050, answer score: 8

Revisions (0)

No revisions yet.