HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

Data Warehouse: How can i query daily snapshots?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
canwarehousequerysnapshotshowdatadaily

Problem

I have some snapshots of a database that are not timeseries. For example:

-
Snapshot day 1:

+----+---------------+------------+------------+        
| ID |     Title     |  Category  |    Date    |
+----+---------------+------------+------------+
| 1  | My First Post | helloworld | 2015-01-01 |
+----+---------------+------------+------------+


-
Snapshot day 2 (A new post is added today):

+----+----------------+------------+------------+        
| ID |      Title     |  Category  |    Date    |
+----+----------------+------------+------------+
| 1  | My first post  | helloworld | 2015-01-01 |
| 2  | My second post | other      | 2015-01-02 |
+----+----------------+------------+------------+


-
Snapshot day 3 (Post 2 is removed today):

+----+---------------+------------+------------+        
| ID |     Title     |  Category  |    Date    |
+----+---------------+------------+------------+
| 1  | My First Post | helloworld | 2015-01-01 |
+----+---------------+------------+------------+


So between days, a row of the table may or not may be constant. Now, I need to be able to use a query like this:

SELECT category, COUNT(*) from day1.My_table group by category


This is for one table of one day. If we want to count the daily average of posts by category in a month we should do something like:

SELECT category, SUM(cnt) / 30 
from ( 
    SELECT category, COUNT(*) as cnt 
    from day1.My_table 
    group by category 
        UNION ALL SELECT category, COUNT(*) as cnt 
                  from day2.My_table 
                  group by category 
        UNION ALL ... 
        UNION ALL SELECT category, COUNT(*) as cnt 
                  from day30.My_table 
                  group by category
) group by category


Another example, the number of post published in a month:

``
SELECT COUNT(distinct id)
from (
SELECT id
from day1.My_table
UNION ALL ...
UNION ALL SELECT id
from day30.My_table
)
`

Solution

Let's think out of the box. Instead of having a "snapshot", let's have a "log". What you currently have is "current" state of things; adding a "log" would provide the "history", from which could be derived the 'lost' info.

One way to implement the log is to have a TRIGGER on INSERT or UPDATE of the table, and have the trigger write to the log file. This log will not be pleasant for the ad hoc queries, so have a nightly job (or maybe hourly) that summarizes the changes for the day -- net gain (or loss) of number of posts, etc. The "day2" info and the "last month" info can then be derived from this summary table quite quickly. Or perhaps a second level of summarization that declares what the state was for each day. I doubt if UNION would be needed. The "snapshot" would not be involved.

Context

StackExchange Database Administrators Q#127609, answer score: 2

Revisions (0)

No revisions yet.