HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to store many smallish time-series in a relational DB?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
smallishhowtimestorerelationalmanyseries

Problem

Input Data

Multiple Tests Beds generate measurement data of various complexity.

In it's most basic form, not considering any meta-data, one measurement on a Test Bed will be a small (1 - a few thousand samples) time-series with a couple of dozen channels/signals/attributes per sample.

Measurements across time and Test Beds will have a similar set of signals, but not always the same as sensors are added and removed for the test setups.

Data volume

Currently we estimate our data rate at 6 testbeds x 4 test per hour x 12 hours a day x 4000 samples per test == 1,152,000 samples per day x 365 == 420,480,000 samples per year

_ x 48 columns per sample (currently 32 bit floats, mostly) ~~~ 75 GB per year

(columns in this case refers to channel/signal)

If/When more testbeds are added the data volume might increase accordingly.

Data Input

The test beds generate the data locally and the data is then imported asynchronously into the db. (A few thousand samples might be generated in a the time of one second, then reviewed locally and then either scratched or imported.)

Queries

We expect queries to be mostly on aggregates of the single measurements. I.e., you like to find all measurements (each having 4k samples) where e.g. the mean of channel_output_voltage is within a certain range.

Database layout?

What is a good way to set up tables for this? What factors have to be taken into account?

Theoretically I could go with one table per measurement generating 100,000 tables per year, but it doesn't strike me as a good idea.

Or I could stick everything into one big table (with hundreds of columns) that has room for all channels and channels get added as needed: One row per sample. Unused channels remain NULL.

MEASUREMENTS
------------
measurement_id } PK
time_stamp     }
channel_1 (may be NULL for a certain measurement_id ...)
channel_2
...
channel_n(+1)


Or I could go with an approach of having one table for the samples (timestamps) and one table contai

Solution

Are your queries supposed to collect data for each month/year?

You can use partitions to store your information in different physical files. Partitions can increase the speed of SELECT statements when you only need information about a specific period. http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx

When creating a partitioned table in Microsoft SQL Server you can also create different file groups on different physical locations and back those up separately.

With regards to your question about the database design, you may want to read about normalization here: http://en.wikipedia.org/wiki/Database_normalization

Context

StackExchange Database Administrators Q#17700, answer score: 3

Revisions (0)

No revisions yet.