snippetsqlMinor
Data Warehouse: How can i query daily snapshots?
Viewed 0 times
canwarehousequerysnapshotshowdatadaily
Problem
I have some snapshots of a database that are not timeseries. For example:
-
Snapshot day 1:
-
Snapshot day 2 (A new post is added today):
-
Snapshot day 3 (Post 2 is removed today):
So between days, a row of the table may or not may be constant. Now, I need to be able to use a query like this:
This is for one table of one day. If we want to count the daily average of posts by category in a month we should do something like:
Another example, the number of post published in a month:
``
-
Snapshot day 1:
+----+---------------+------------+------------+
| ID | Title | Category | Date |
+----+---------------+------------+------------+
| 1 | My First Post | helloworld | 2015-01-01 |
+----+---------------+------------+------------+-
Snapshot day 2 (A new post is added today):
+----+----------------+------------+------------+
| ID | Title | Category | Date |
+----+----------------+------------+------------+
| 1 | My first post | helloworld | 2015-01-01 |
| 2 | My second post | other | 2015-01-02 |
+----+----------------+------------+------------+-
Snapshot day 3 (Post 2 is removed today):
+----+---------------+------------+------------+
| ID | Title | Category | Date |
+----+---------------+------------+------------+
| 1 | My First Post | helloworld | 2015-01-01 |
+----+---------------+------------+------------+So between days, a row of the table may or not may be constant. Now, I need to be able to use a query like this:
SELECT category, COUNT(*) from day1.My_table group by categoryThis is for one table of one day. If we want to count the daily average of posts by category in a month we should do something like:
SELECT category, SUM(cnt) / 30
from (
SELECT category, COUNT(*) as cnt
from day1.My_table
group by category
UNION ALL SELECT category, COUNT(*) as cnt
from day2.My_table
group by category
UNION ALL ...
UNION ALL SELECT category, COUNT(*) as cnt
from day30.My_table
group by category
) group by categoryAnother example, the number of post published in a month:
``
SELECT COUNT(distinct id)
from (
SELECT id
from day1.My_table
UNION ALL ...
UNION ALL SELECT id
from day30.My_table
)
`Solution
Let's think out of the box. Instead of having a "snapshot", let's have a "log". What you currently have is "current" state of things; adding a "log" would provide the "history", from which could be derived the 'lost' info.
One way to implement the log is to have a
One way to implement the log is to have a
TRIGGER on INSERT or UPDATE of the table, and have the trigger write to the log file. This log will not be pleasant for the ad hoc queries, so have a nightly job (or maybe hourly) that summarizes the changes for the day -- net gain (or loss) of number of posts, etc. The "day2" info and the "last month" info can then be derived from this summary table quite quickly. Or perhaps a second level of summarization that declares what the state was for each day. I doubt if UNION would be needed. The "snapshot" would not be involved.Context
StackExchange Database Administrators Q#127609, answer score: 2
Revisions (0)
No revisions yet.