patternMinor
correlate subquery in hive
Viewed 0 times
correlatesubqueryhive
Problem
SELECT ndo.sku ParentSKU, visitsWhenSKUWasOnline.s
FROM
Temp.NumberOfDaysOnline ndo
JOIN
(
SELECT SUM(gasessiondata.sessions) as s
FROM gasessiondata
WHERE
gasessiondata.date >= ndo.FromDate
AND
gasessiondata.date <= ndo.ToDate
) as visitsWhenSKUWasOnline
ON 1=1I can't use
ndo.fromDate and ndo.ToDate inside of the subquery since hive does not support correlated sub queries. What could be an alternative for something like this?Edit:
down vote
favorite
I have a table like this:
date val
2016-1-1 8
2016-2-1 10
2016-1-2 30
2016-1-3 30
Now, I have two dates coming in from a different table, let's call them `fromdate` and todateIf fromdate was 1,
2016-1-1 and Todate was 2016-1-2, I need 8+30 = 38 as the final valueI need to check
SUM(val) of all those values dates corresponding to which are between the fromdate and twodate.Solution
The syntax of you query is not often called "correlated subquery". A correlated subquery would be like this:
Your query is similar to a
It's easy to rewrite though so the join is done first and then the
Seems that Hive supports only equi-joins and cross joins. Moving the condition to the
SELECT
ndo.sku AS ParentSKU,
(
SELECT SUM(g.sessions)
FROM gasessiondata AS g
WHERE g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
) AS s
FROM
Temp.NumberOfDaysOnline AS ndo ;Your query is similar to a
LATERAL join (CROSS / OUTER APPLY in SQL Server). This type of correlated joins has not been implemented in many DBMS.It's easy to rewrite though so the join is done first and then the
GROUP BY. This simple query should be supported by most, if not all DBMS:SELECT
ndo.sku AS ParentSKU,
SUM(g.sessions) AS s
FROM
Temp.NumberOfDaysOnline AS ndo
JOIN
gasessiondata AS g
ON g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
GROUP BY
ndo.PK, -- the PRIMARY KEY of the table
ndo.sku ; -- and any column used in the SELECT listSeems that Hive supports only equi-joins and cross joins. Moving the condition to the
WHERE clause might solve this:SELECT
ndo.sku AS ParentSKU,
SUM(g.sessions) AS s
FROM
Temp.NumberOfDaysOnline AS ndo
CROSS JOIN
gasessiondata AS g
WHERE
g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
GROUP BY
ndo.PK, -- the PRIMARY KEY of the table
ndo.sku ; -- and any column used in the SELECT listCode Snippets
SELECT
ndo.sku AS ParentSKU,
(
SELECT SUM(g.sessions)
FROM gasessiondata AS g
WHERE g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
) AS s
FROM
Temp.NumberOfDaysOnline AS ndo ;SELECT
ndo.sku AS ParentSKU,
SUM(g.sessions) AS s
FROM
Temp.NumberOfDaysOnline AS ndo
JOIN
gasessiondata AS g
ON g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
GROUP BY
ndo.PK, -- the PRIMARY KEY of the table
ndo.sku ; -- and any column used in the SELECT listSELECT
ndo.sku AS ParentSKU,
SUM(g.sessions) AS s
FROM
Temp.NumberOfDaysOnline AS ndo
CROSS JOIN
gasessiondata AS g
WHERE
g.date >= ndo.FromDate
AND g.date <= ndo.ToDate
GROUP BY
ndo.PK, -- the PRIMARY KEY of the table
ndo.sku ; -- and any column used in the SELECT listContext
StackExchange Database Administrators Q#145224, answer score: 2
Revisions (0)
No revisions yet.