patternsqlMinor
Change Data Capture using HIVE
Viewed 0 times
hivecaptureusingdatachange
Problem
Could you please review my code and let me know if there is something wrong? I'm getting the expected results.
I'm trying to do change data capture using hive. We already have an existing CDC in DataStage, but we are trying to implement in HIVE now. Here is what I have done, please correct me if I'm doing something wrong.
Here is the source data:
Table name: month1
Table name2: month2
Here, the key field is the
I'm trying to do change data capture using hive. We already have an existing CDC in DataStage, but we are trying to implement in HIVE now. Here is what I have done, please correct me if I'm doing something wrong.
Here is the source data:
Table name: month1
e_id, e_name, e_ad
11, john, singapore
22, jack, Melbourne
33, jessy, sydney
44, lampard, canada
55, pomer, perth
Table name2: month2
e_id, e_name, e_add
11, john, odisha
22, jack, Melbourne
44, lampard, canada
55, pomer, perth
66, rahul, mumbai
Here, the key field is the
e_id, and value field is e_name and e_add:select
case when cdc_codes ='Updates' Then month2s
when cdc_codes = 'NoChange' then month1s
when cdc_codes = 'New' then month2s
when cdc_codes = 'Deletes' then month1s
else 'Error' end as fin_cols
from (select
case when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) = concat(m2.e_name,m2.e_add) then 'NoChange'
when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) <> concat(m2.e_name,m2.e_add) then 'Update'
when m1.e_id is null then 'New'
when m2.e_id is null then 'Deletes'
else 'Error' end as cdc_codes,
concat(m1.e_id,',',m1.e_name,',',m1.e_add) as month1s,
concat(m2.e_id,',',m2.e_name,',',m2.e_add) as month2s
from month1 as m1
full outer join
month2 as m2
on m1.e_id = m2.e_id) as b1
Solution
Nitpicks
There are a few inconsistencies in your capitalization of SQL keywords. It's pretty minor. I see
Naming
Concatenation can make things confusing in SQL, in my opinion. I would encourage to see if there is a more natural way to compare data in your
Why not try:
This one is especially odd looking and would be easy to make a mistake:
CTE
I think using a common table expression instead of a subquery would make your query more elegant and easier to maintain. I also think a little more vertical space in your
There are a few inconsistencies in your capitalization of SQL keywords. It's pretty minor. I see
then and Then in some places, etc. Naming
m1 and m2 don't make for very good aliases. Likewise e_id, e_name, e_add are not very good column names, though I realize those are probably example tables and your real schema is probably quite a bit different. concatConcatenation can make things confusing in SQL, in my opinion. I would encourage to see if there is a more natural way to compare data in your
case statements. For example, instead of this:case when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) = concat(m2.e_name,m2.e_add)Why not try:
case when m1.e_id = m2.e_id and m1.e_name = m2.e_name and m1.e_add = m2.e_addThis one is especially odd looking and would be easy to make a mistake:
concat(m1.e_id,',',m1.e_name,',',m1.e_add) as month1s,CTE
I think using a common table expression instead of a subquery would make your query more elegant and easier to maintain. I also think a little more vertical space in your
case statements to separate the conditions from the results helps the code read better:with cte_cdc_codes as (
select
case when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) = concat(m2.e_name,m2.e_add)
then 'NoChange'
when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) <> concat(m2.e_name,m2.e_add)
then 'Update'
when m1.e_id is null
then 'New'
when m2.e_id is null
then 'Deletes'
else 'Error'
end as cdc_codes,
concat(m1.e_id,',',m1.e_name,',',m1.e_add) as month1s,
concat(m2.e_id,',',m2.e_name,',',m2.e_add) as month2s
from month1 as m1
full outer join
month2 as m2
on m1.e_id = m2.e_id
)
select
case when cdc_codes ='Updates'
Then month2s
when cdc_codes = 'NoChange' then month1s
when cdc_codes = 'New' then month2s
when cdc_codes = 'Deletes' then month1s
else 'Error' end as fin_cols
from cte_cdc_codes;Code Snippets
case when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) = concat(m2.e_name,m2.e_add)case when m1.e_id = m2.e_id and m1.e_name = m2.e_name and m1.e_add = m2.e_addconcat(m1.e_id,',',m1.e_name,',',m1.e_add) as month1s,with cte_cdc_codes as (
select
case when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) = concat(m2.e_name,m2.e_add)
then 'NoChange'
when m1.e_id = m2.e_id and concat(m1.e_name,m1.e_add) <> concat(m2.e_name,m2.e_add)
then 'Update'
when m1.e_id is null
then 'New'
when m2.e_id is null
then 'Deletes'
else 'Error'
end as cdc_codes,
concat(m1.e_id,',',m1.e_name,',',m1.e_add) as month1s,
concat(m2.e_id,',',m2.e_name,',',m2.e_add) as month2s
from month1 as m1
full outer join
month2 as m2
on m1.e_id = m2.e_id
)
select
case when cdc_codes ='Updates'
Then month2s
when cdc_codes = 'NoChange' then month1s
when cdc_codes = 'New' then month2s
when cdc_codes = 'Deletes' then month1s
else 'Error' end as fin_cols
from cte_cdc_codes;Context
StackExchange Code Review Q#67575, answer score: 2
Revisions (0)
No revisions yet.