HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

SQL Data aggregation

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqldataaggregation

Problem

I have a problem with sql data aggregation problem.here is the sample sql.I am using oracle database.

select pm.city cityid, to_char(pm.sampledate, 'yyyy-MM') timepoint, count(distinct   me.id) mecount, count(distinct me.hotpointid) hpcount
from upme me, appmday pm
where pm.apid = me.id
          and pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd') and
          to_date('2012-1-31', 'yyyy-MM-dd')
group by pm.city, to_char(pm.sampledate, 'yyyy-MM')


above i get the count of my data.but i also i want to get the overall result like

select   to_char(pm.sampledate, 'yyyy-MM'), count(distinct me.id), count(distinct me.hotpointid)
from upme me, appmday pm
        where pm.apid = me.id
              and pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd') and
              to_date('2012-1-31', 'yyyy-MM-dd')
              group by to_char(pm.sampledate, 'yyyy-MM')


basically i want get some result from sql ,in the same sql i want to get the count from the previous result,they both show in the result in one sql. i did not want to union these two sql ,because each sql is very slow.

did someone have a idea? or it cannot be done by one sql without union?

thank you for the reply.

Solution

You can use analytic functions to achieve this:

select distinct pm.city cityid,
    to_char(trunc(pm.sampledate, 'MM'), 'YYYY-MM') as timepoint,
    count(distinct me.id)
        over (partition by pm.city, trunc(pm.sampledate, 'MM')) as mecount,
    count(distinct me.hotpointid)
        over (partition by pm.city, trunc(pm.sampledate, 'MM')) as hpcount,
    count(distinct me.id)
        over (partition by trunc(pm.sampledate, 'MM')) as totmecount,
    count(distinct me.hotpointid)
        over (partition by trunc(sampledate, 'MM')) as tothpcount
from upme me
join appmday pm on pm.apid = me.id
where pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd')
    and to_date('2012-1-31', 'yyyy-MM-dd');


Or instead of a distinct, use a subquery and a row_number() calculation to remove the duplicates:

select cityid, to_char(timepoint, 'YYYY-MM') as timepoint,
    mecount, hpcount, totmecount, tothpcount
from (
    select pm.city cityid,
        row_number() over (partition by pm.city, trunc(pm.sampledate, 'MM')
            order by pm.city, trunc(pm.sampledate, 'MM')) as rn,
        trunc(pm.sampledate, 'MM') as timepoint,
        count(distinct me.id)
            over (partition by pm.city, trunc(pm.sampledate, 'MM')) as mecount,
        count(distinct me.hotpointid)
            over (partition by pm.city, trunc(pm.sampledate, 'MM')) as hpcount,
        count(distinct me.id)
            over (partition by trunc(pm.sampledate, 'MM')) as totmecount,
        count(distinct me.hotpointid)
            over (partition by trunc(sampledate, 'MM')) as tothpcount
    from upme me
    join appmday pm on pm.apid = me.id
    where pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd')
        and to_date('2012-1-31', 'yyyy-MM-dd')
)
where rn = 1;


If your existing query is slow you may see a performance difference between the two options, otherwise it's a matter or preference.

(Also you may be aware of this, but if your sampledate has a time component, having your where clause end on '2012-1-31' means you'll only get data up to the start of that day, and not for example anything at 2012-01-31 01:00.)

Code Snippets

select distinct pm.city cityid,
    to_char(trunc(pm.sampledate, 'MM'), 'YYYY-MM') as timepoint,
    count(distinct me.id)
        over (partition by pm.city, trunc(pm.sampledate, 'MM')) as mecount,
    count(distinct me.hotpointid)
        over (partition by pm.city, trunc(pm.sampledate, 'MM')) as hpcount,
    count(distinct me.id)
        over (partition by trunc(pm.sampledate, 'MM')) as totmecount,
    count(distinct me.hotpointid)
        over (partition by trunc(sampledate, 'MM')) as tothpcount
from upme me
join appmday pm on pm.apid = me.id
where pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd')
    and to_date('2012-1-31', 'yyyy-MM-dd');
select cityid, to_char(timepoint, 'YYYY-MM') as timepoint,
    mecount, hpcount, totmecount, tothpcount
from (
    select pm.city cityid,
        row_number() over (partition by pm.city, trunc(pm.sampledate, 'MM')
            order by pm.city, trunc(pm.sampledate, 'MM')) as rn,
        trunc(pm.sampledate, 'MM') as timepoint,
        count(distinct me.id)
            over (partition by pm.city, trunc(pm.sampledate, 'MM')) as mecount,
        count(distinct me.hotpointid)
            over (partition by pm.city, trunc(pm.sampledate, 'MM')) as hpcount,
        count(distinct me.id)
            over (partition by trunc(pm.sampledate, 'MM')) as totmecount,
        count(distinct me.hotpointid)
            over (partition by trunc(sampledate, 'MM')) as tothpcount
    from upme me
    join appmday pm on pm.apid = me.id
    where pm.sampledate between to_date('2011-11-01', 'yyyy-MM-dd')
        and to_date('2012-1-31', 'yyyy-MM-dd')
)
where rn = 1;

Context

StackExchange Database Administrators Q#13454, answer score: 4

Revisions (0)

No revisions yet.