HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to manage 3.1 billion rows of data?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsbillionmanagehowdata

Problem

I am currently tasked with implementing a storage schema for a relatively large amount of data. The data will primarily be accessed to determine a current data point value, but I am also required to track the past six months of history for data trending/analytics.

A recent requirement was added to track the min / max / sum value for the past hour.

NOTE: Ideally, I would like to consider a MongoDB option, but I need to demonstrate that I have exhausted the SQL-Server options first.

The Data

The following table represents the primary data source (queried most frequently). The table will have approximately five million rows. The data changes will predominantly be UPDATE statements with very occasional INSERT statements after the initial data load. I have opted to cluster the data by dataPointId as you will always be selecting all values for a given data point.

// Simplified Table
CREATE TABLE [dbo].[DataPointValue](
    [dataPointId]  [int] NOT NULL,
    [valueId]      [int] NOT NULL,
    [timestamp]    [datetime] NOT NULL,
    [minimum]      [decimal](18, 0) NOT NULL,
    [hourMinimum]  [decimal](18, 0) NOT NULL,
    [current]      [decimal](18, 0) NOT NULL,
    [currentTrend] [decimal](18, 0) NOT NULL,
    [hourMaximum]  [decimal](18, 0) NOT NULL,
    [maximum]      [decimal](18, 0) NOT NULL

    CONSTRAINT [PK_MeterDataPointValue] PRIMARY KEY CLUSTERED ([dataPointId],[valueId])
)


The second table is notably larger at approximately 3.1 billion rows (representing the past six months of data). Data older than six months will be purged; otherwise strictly data INSERT statements (~200 rows / sec, 720,000 rows / hour, 17 million rows / week).

```
// Simplified Table
CREATE TABLE [dbo].DataPointValueHistory NOT NULL,
[delta] decimal NOT NULL

CO

Solution

I found this analysis very useful when I was researching on building an analytics solution which would have billions of rows in one table.

http://leiliweb.wordpress.com/2012/12/11/partitioned-table-and-index-strategies-using-sql-server-2008/

Context

StackExchange Database Administrators Q#31314, answer score: 4

Revisions (0)

No revisions yet.