HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Redesign the storage of large amounts of sensor data

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thestoragesensorredesignlargeamountsdata

Problem

I have been tasked to implement/redesign a solution that will store weather data from a sensor array. The array will consist of ~40 towers, each with about ~10 sensors each that'll sample atmospheric conditions at 10 second intervals for an undetermined amount of time (years).
Some of the applications and requirements for this task are as follows:

  • Manage and retrieve tower/sensor configurations to make sense of data analysis.



  • Data visualization by sensor or time intervals for meteorological observations.



  • Provide customers with reliable and persistent data resources/data sets to compare model and sensor performance (may require some post-processing to deliver in the format required?).



Note: The current solution (implemented as a proof of concept, with 5 towers) stores data as flat files (one file per hour).

I was originally unsure if this would constitute a big data problem in the future, so I researched on a couple of solutions both for relational and NoSQL databases, but I feel I need a little more guidance, as I am no expert in data management.

One of the solutions I thought was to store data in a relational database indexed by tower, sensor, and timestamps and partitioning the table by date.

Another, based on future scaling, was to store it in a document-type NoSQL database, like MongoDB, and mimic the structure of the current solution.

Are any of these good approaches? If not, what would be a better/recommended solution? Also, would it even be necessary to redesign the current solution? I was told that the rationale for using flat files is that they believed a relational database would take too much overhead. Is there a way to avoid this if it was the case?

Solution

Since (a) the information you are working with appears to be, in a of itself, a very valuable organizational resource, and (b) the volume of data will be considerable, I would decidedly (c) build a relational database on one of the major SQL platforms.

That of course —from a very general perspective— requires three essential factors:

-
A clearly defined conceptual schema, in which one has to identify and mark out with precision the prototypes of things, i.e., the entity types (including their properties and interrelationships) of relevance in the business environment you are working with (e.g., the Towers and Sensors you mention).

As you know, this point entails establishing continuous and productive communication with business experts.

-
A logical layout that reflects the conceptual level with accuracy, by means of tables (i.e., mathematical relations) holding well-delimited columns with appropriate column names and types (i.e., relation attributes) and all the corresponding constraints to ensure that the data complies with all the rules determined at the previous tier.

Therefore, it is here where the vast power of the relational model comes into play (though its advantages have positive repercussions at other levels of abstraction).

-
A physical arrangement that, e.g., boosts the execution speed of the —dynamic— logical data manipulation operations and guarantees the logical constraints.

Since the relational model offers physical data independence, a database management system (DBMS for brevity) can provide any kind of structure at this level, not exclusively indexes, to support the logical definitions. In the case of the leading SQL platforms, yes, this commonly implies, precisely, setting up an indexing strategy based on the database-specific query tendencies, and you brought up very interesting considerations with respect to some possible configurations but, without knowing the particular informational necessities with exactitude, offering specific advices in this regard would not be suitable.

Other elements that deserve evaluation are, e.g., upgrading network infrastructure to increase bandwidth, enabling proper server configurations (hardware- and software-wise), etc. And, if, and only if, a practitioner is qualified enough, he or she could even modify the source code of the DBMS of choice (more feasible in open source environments, naturally).

In this way, the following aspects that you highlight



  • Manage and retrieve tower/sensor configurations to make sense of data analysis.



  • Data visualization by sensor or time intervals for meteorological observations.



  • Provide customers with reliable and persistent data resources/data sets to compare model and sensor performance (may require some post-processing to deliver in the format required?).




would be well-addressed, because you would easily be able to declare queries to, e.g., obtain information in very meaningful forms. For instance, you can get data associated with

  • the Sensor identified by SensorNumber 1750, installed at the Tower identified by TowerNumber 31, between the Date 1 June 2017 and the Date 27 June 2017.



Furthermore, since (1) the data in a relational database is logically managed in terms of sets with the aid of operations based on the relational algebra, and (2) the different SQL engines are physically optimized (some more than the others) for set processing, you can, e.g.,

  • compare set a with set b;



  • join set c with set d;



  • obtain subset f through a restriction on set e;



  • produce n subsets from n set intersections;



  • project n attributes from set f



  • retrieve information from set z that is the result of a union of set x with set y;



  • and so on.



The data manipulation possibilities are in fact huge —demonstrating the unmatched versatility of the relational paradigm— since you can work not only with base tables (the ones declared with CREATE TABLE … ( … ); statements) but also with derived ones (the ones expressed via SELECT …; operations, sometimes fixed as VIEWs). In other words, you can (i) express new data structures based on (ii) prior ones operating on (iii) the single underlying relational construct, i.e., the mathematical relation.

Evidently, the arrangement of the base tables and columns of a relational database can evolve, and (a) new base tables or columns may be incorporated into it when (b) keeping track of new entity types or entity type properties is deemed worthwhile in the pertinent business context. In other words, neither the initial structure nor the opening constraints of a relational database are expected to be static or immutable. Besides, a database that is appropriately organized from the very beginning tends to be much easier to modify when new informational requirements arise.

In agreement with the considerations above, the logical format of the applicable sets must be produced declaratively, at the database logical level. The graphical or present

Context

StackExchange Database Administrators Q#177332, answer score: 11

Revisions (0)

No revisions yet.