Database redesign opportunity: What table design to use for this sensor data collection?
Problem
Background
I have a network of approximately 2000 sensors, each of which has about 100 data points that we collect on 10 minute intervals. These data points are typically int values, but some are strings and floats. This data should be stored for 90 days, more if possible and still efficient.
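For scale, a quick back-of-envelope (runnable in any MySQL client) comparing a wide layout (one row per sensor per interval, with ~100 value columns) against a narrow layout (one row per individual data point):

```sql
-- 2000 sensors, readings every 10 minutes (144 intervals/day), kept 90 days.
SELECT
    2000 * 144 * 90        AS wide_rows,    -- 25,920,000 rows
    2000 * 100 * 144 * 90  AS narrow_rows;  -- 2,592,000,000 rows
```

Either way it is a lot of rows, which is why the indexing and archiving questions below matter.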
Database Design
When originally tasked with this project, I wrote a C# app that wrote comma-separated files for each sensor. At the time there were not as many sensors; when someone wanted to look at trends, we would open the CSV in Excel and graph it as needed.
Things grew and we switched to a MySQL database. I created a table for each sensor (yes I know, lots of tables!); it has been working well, but it has some limitations. With so many tables, it is obviously impossible to write a query that will find data among all the sensors when looking for a particular value.
For the next version, I switched to Microsoft SQL Server Express, and put all sensor data into one large table. This also works, and lets us do queries to find values among all sensors that are of interest. However, I ran into the 10GB limit for the Express version, and have decided to switch back to MySQL rather than invest in SQL Server Standard.
The Question
I am happy with MySQL performance and scalability, but am uncertain if sticking to the all-data-in-one-table approach is best. 10GB in a single table seems to be asking for a different design. I should mention that the need to query data for graphing is still there, and I'm concerned that there will be performance issues for a query that graphs, for example, temperature data for one sensor over the full 90 days. (In other words the graph should be something that is quick to produce, without waiting for SQL to sort through piles of data just to isolate the sensor of interest.)
Should I split this table up in some way to increase performance? Or is it not unusual to have such a large table?
I have indexes on the Sensor ID and Timestamp columns.
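For the graphing query described above, a single composite index usually serves better than two separate single-column indexes, because the whole WHERE clause collapses to one range scan. A sketch, assuming a narrow one-row-per-data-point table (the table, column names, and the sensor id 42 are illustrative, not your actual schema):

```sql
-- Assumed narrow-table schema; adjust names and types to match the real table.
CREATE TABLE sensor_data (
    sensor_id INT         NOT NULL,
    metric    VARCHAR(32) NOT NULL,   -- e.g. 'temperature'
    ts        DATETIME    NOT NULL,
    value     DOUBLE      NULL,
    PRIMARY KEY (sensor_id, metric, ts)  -- InnoDB clusters rows on this key,
                                         -- so the 90-day scan reads sequentially
);

-- "Temperature for one sensor over the full 90 days" becomes a single
-- range scan on the clustered primary key:
SELECT ts, value
FROM   sensor_data
WHERE  sensor_id = 42
  AND  metric    = 'temperature'
  AND  ts       >= NOW() - INTERVAL 90 DAY
ORDER BY ts;
```

With separate indexes on Sensor ID and Timestamp alone, MySQL can use only one of them and must filter the rest row by row.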
Solution
Interesting... If all sensors produce the same kind of data, it does make sense to put them all in the same table, but with that amount of data, I can see why you'd be worried about performance.
Is 90 days the usual amount of time that you produce a graph for? If so, you could have two tables: the main sensor data table that stores data from 90 days ago (or a little more, if you want some slack) up to today, and everything older than that goes in the archive table. That would reduce the size of the table that reports are being generated from, and hopefully the majority of your 10 GB of data will end up in the archive table rather than the main table. The archiving job can be scheduled to run nightly.
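The nightly move might look something like this sketch (table and column names are illustrative, and it assumes the archive table has the same columns as the live one):

```sql
-- Move rows older than 90 days from the live table to the archive.
-- Wrapped in a transaction so a failure doesn't lose or duplicate rows.
START TRANSACTION;

INSERT INTO sensor_data_archive
SELECT *
FROM   sensor_data
WHERE  ts < NOW() - INTERVAL 90 DAY;

DELETE FROM sensor_data
WHERE  ts < NOW() - INTERVAL 90 DAY;

COMMIT;
```

In practice you may want to move the rows in smaller batches (e.g. one day at a time) so the DELETE does not hold locks on the live table for long.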
Maybe also consider building a separate reporting database that stores the data in a structure that is better for generating reports from (tables designed to more closely match what you are querying, and maybe pre-calculate and aggregate values that would otherwise take a long time to generate, if possible), and re-populate it from the main database on a regular (such as nightly) basis. Of course, if you need the reports generated from up-to-the-minute data, this might not work so well.
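One possible shape for that pre-aggregated reporting data — the names and the choice of daily granularity here are assumptions, pick whatever granularity your graphs actually use:

```sql
-- Daily roll-up per sensor and metric, repopulated by the nightly job.
CREATE TABLE daily_sensor_summary (
    sensor_id INT         NOT NULL,
    metric    VARCHAR(32) NOT NULL,
    day       DATE        NOT NULL,
    min_value DOUBLE,
    max_value DOUBLE,
    avg_value DOUBLE,
    PRIMARY KEY (sensor_id, metric, day)
);

-- Upsert so re-running the job refreshes existing days instead of failing.
INSERT INTO daily_sensor_summary
SELECT sensor_id, metric, DATE(ts), MIN(value), MAX(value), AVG(value)
FROM   sensor_data
GROUP BY sensor_id, metric, DATE(ts)
ON DUPLICATE KEY UPDATE
    min_value = VALUES(min_value),
    max_value = VALUES(max_value),
    avg_value = VALUES(avg_value);
```

A 90-day graph then reads at most 90 rows per sensor/metric instead of ~13,000 raw readings.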
Context
StackExchange Database Administrators Q#13882, answer score: 6