HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Help with indexing narrow table with many millions of rows

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsmillionswithindexinghelpmanynarrowtable

Problem

I have a good amount of experience with SQL Server, but not much with indexes and I fear I'm a little bit out of my depth in trying to figure out how to correctly structure a table that has 100+ million rows. I know this gets asked a lot, but I haven't found an answer that matches exactly what I'm trying to do.

The table

CREATE TABLE [dbo].[Trades]
(
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [TimeStamp] [datetime] NOT NULL,
    [Type] [int] NOT NULL,
    [Amount] [decimal](18, 2) NOT NULL,
    [Price] [decimal](18, 2) NOT NULL,
    [Exchange_Id] [int] NOT NULL
)


As you can see, this is a table that stores commodity trades. Because of that, there are currently 100+ million rows in the table, with about 10,000 rows being added daily (at the end of the day).

I'm at a bit of a loss as to how to index this table to maximize performance. I don't so much care about insert performance since the table is updated once a day and it doesn't need to be a fast operation. Rows are also never updated once inserted. The big problem is query speed. This table is queried relatively often.

By far the most common queries are on the [TimeStamp] column alone (eg, TimeStamp between two dates), or on the [TimeStamp] column together with the [Exchange_Id] column (eg, trades on a certain Exchange between two [TimeStamp]s).

Currently, there is a unique clustered index on the [Id] column, but I'm not sure if that's the right thing to do based on the circumstances. I've considered making the clustered index on the [TimeStamp] column, but it would have to be non-unique which I've read is generally bad. As for non-clustered indexes, I'm not quite sure how to arrange them or on what columns.

Any help is greatly appreciated.

Solution

Based on your question, I would index the Timestamp column with the clustered index. And to make the index unique, just make sure to include the identity column in the index definition:

... PRIMARY KEY CLUSTERED ([Timestamp], [Id])


If query performance for queries on Exchange_Id is still an issue after that, you can also add a non-clustered index that looks something like this:

UNIQUE INDEX ([Exchange_Id], [Timestamp], [Id])


.. but if you do, consider including any columns that the query may need, in order to create what's known as a covering index.

UNIQUE INDEX ([Exchange_Id], [Timestamp], [Id]) INCLUDE ([Type], [Amount], [Price])


Remember that there's a disk space issue involved as well, as you stated that your table contains a large number of rows. The clustered index will not change the amount of disk space your table consumes either way, but adding a non-clustered index will allocate extra space. If you INCLUDE all the columns from your table, like I did in the example, the non-clustered index will roughly take up as much space as the rest of the table does.

Code Snippets

... PRIMARY KEY CLUSTERED ([Timestamp], [Id])
UNIQUE INDEX ([Exchange_Id], [Timestamp], [Id])
UNIQUE INDEX ([Exchange_Id], [Timestamp], [Id]) INCLUDE ([Type], [Amount], [Price])

Context

StackExchange Database Administrators Q#83738, answer score: 4

Revisions (0)

No revisions yet.