snippetsqlMinor
How to manage 3.1 billion rows of data?
Viewed 0 times
rowsbillionmanagehowdata
Problem
I am currently tasked with implementing a storage schema for a relatively large amount of data. The data will primarily be accessed to determine a current
A recent requirement was added to track the
NOTE: Ideally, I would like to consider a MongoDB option, but I need to demonstrate that I have exhausted the SQL-Server options first.
The Data
The following table represents the primary data source (queried most frequently). The table will have approximately five million rows. The data changes will predominantly be
The second table is notably larger at approximately 3.1 billion rows (representing the past six months of data). Data older than six months will be purged; otherwise strictly data
```
// Simplified Table
CREATE TABLE [dbo].DataPointValueHistory NOT NULL,
[delta] decimal NOT NULL
CO
data point value, but I am also required to track the past six months of history for data trending/analytics. A recent requirement was added to track the
min / max / sum value for the past hour. NOTE: Ideally, I would like to consider a MongoDB option, but I need to demonstrate that I have exhausted the SQL-Server options first.
The Data
The following table represents the primary data source (queried most frequently). The table will have approximately five million rows. The data changes will predominantly be
UPDATE statements with very occasional INSERT statements after the initial data load. I have opted to cluster the data by dataPointId as you will always be selecting all values for a given data point. // Simplified Table
CREATE TABLE [dbo].[DataPointValue](
[dataPointId] [int] NOT NULL,
[valueId] [int] NOT NULL,
[timestamp] [datetime] NOT NULL,
[minimum] [decimal](18, 0) NOT NULL,
[hourMinimum] [decimal](18, 0) NOT NULL,
[current] [decimal](18, 0) NOT NULL,
[currentTrend] [decimal](18, 0) NOT NULL,
[hourMaximum] [decimal](18, 0) NOT NULL,
[maximum] [decimal](18, 0) NOT NULL
CONSTRAINT [PK_MeterDataPointValue] PRIMARY KEY CLUSTERED ([dataPointId],[valueId])
)The second table is notably larger at approximately 3.1 billion rows (representing the past six months of data). Data older than six months will be purged; otherwise strictly data
INSERT statements (~200 rows / sec, 720,000 rows / hour, 17 million rows / week).```
// Simplified Table
CREATE TABLE [dbo].DataPointValueHistory NOT NULL,
[delta] decimal NOT NULL
CO
Solution
I found this analysis very useful when I was researching on building an analytics solution which would have billions of rows in one table.
http://leiliweb.wordpress.com/2012/12/11/partitioned-table-and-index-strategies-using-sql-server-2008/
http://leiliweb.wordpress.com/2012/12/11/partitioned-table-and-index-strategies-using-sql-server-2008/
Context
StackExchange Database Administrators Q#31314, answer score: 4
Revisions (0)
No revisions yet.