HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Splitting a large table to improve performance

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
splittingimprovelargeperformancetable

Problem

This is a follow-up to an earlier question. I have a SQL Server 2008 R2 Standard server, that holds a single database, which itself has almost nothing except a large table.

The table is 100+ million rows (35 columns) and growing at around 250,000 rows per day. We need all the data to be "online", and most of the columns need to be searchable in some fashion. The vast majority of activity on the table is reading; apart from the new data being INSERTed during the day, there's no need to change anything.

Users perform a range of queries on the table, ranging from simple look-up-a-record requests to pulling tens of thousands of rows based on a range of criteria. We only have limited control on the queries that are run, and performance is starting to suffer, even with indexing.

A big part of the problem is disk I/O, which we're addressing by retrofitting a SSD-based array. As all database files will be on this new array, the consensus is that having multiple database files won't make any difference, but that splitting the table up into separate tables might be the way to go.

I'm now puzzling over what would be the best approach to this. Two ideas that I'm debating with myself:

-
Split the table into "tiers"

  • A table containing the last week's data, which is the one being



INSERTed into each day

  • Next table containing from last week


back to 3 months previous

  • Next table containing from 3 months to 6 months



  • Next table containing anything older than 6 months



I'd then "shuffle" the data down the tiers overnight (the database
is only accessed 8am-10pm, so I have a window overnight to process
data).

-
Create tables for data ranges

  • Create a table for a data range - say per quarter. I'd then have


the data INSERTing into the table 2Q2013, and then trip over to
3Q2013, 4Q2014 etc ...

I could use filegroups to make older tables "read only" if this
would improve performance.

Option 1 is the easiest for me to implement, but I'm not sure if this is

Solution

My first question to your situation would be if the Standard Edition is a given or if you had the possibility to upgrade to Enterprise.

Within the Enterprise Edition of SQL Server 2008 R2 you would have the possibility to use Partitioned Tables.

Essentially, they do what you described as a manual process in the background.

Quote from 1:


Partitioning makes large tables or indexes more manageable, because partitioning enables you to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. By using partitioning, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes in earlier versions of SQL Server. Maintenance operations that are performed on subsets of data are also performed more efficiently because these operations target only the data that is required, instead of the whole table.

EDIT as op stated that Enterprise is not an option:

When using Standard Edition, you can fall back to using Partitioned Views.

The main difference from the usage side is that you have to maintain the partitioned tables for yourself and you also have to make sure that every new member tables gets added into the view.

The idea builds on your scenario 2 but you don't have to care about where to insert the data, as this is determined by the Partitioned View based on CHECK-Constraints in the member tables.

For a good example, see 2

As for all partitioning scenarios you have to make sure that the partitioning columns you choose fit your query-and-insert behaviour.

Context

StackExchange Database Administrators Q#48176, answer score: 5

Revisions (0)

No revisions yet.