HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

Best solution to fixing database design with GUID as primary key

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
guidprimarywithdesigndatabasefixingsolutionkeybest

Problem

I am after some confirmation of this idea to fix a badly performing database or a better suggestion if any one has one. Always open to better suggestions.

I have a very large database (20+ million records growing by about 1/2 million per day) which are using GUID as PK.

An oversight on my part but the PK is clustered on SQL server and is causing performance issues.

The reason for a guid - this database is partially synchronised with 150 other databases so the PK needed to be unique. The synchronisation is not managed by SQL Server, rather there is a custom process built which keeps the data in sync for the requirements of the system - all based on that GUID.

Each of the 150 remote databases don't store the full data as stored in the central SQL Database. they only store a subset of the data they actually require, and the data the require is not unique to them (10 out of the 150 database may have some of the same records from other sites databases for example - they share). Also - the data is actually generated at the remote sites - not at the central point - hence the need for the GUIDs.

The central database is used not only for keeping everything in sync, but queries from 3000+ users will be executed against that very large fragmented database. Already this is a big problem in initial testing.

Fortunately we are not live yet - so I can make changes and take things offline if required which is at least something.

The performance of the remote databases is not a problem - the data subsets are pretty small and the database usually never gets above 1GB in size in total. The records are fed back to the main system quite regularly and removed from the smaller BD's when no longer required.

The performance of the central DB which is the keeper of all records is woeful - due to a clustered GUID as a primary key for that many records. The index fragmentation is off the charts.

So - my thoughts to fix the performance issue is to Create a new column - Unsigned BIGINT

Solution

You certainly need NOT to cluster on the GUID. If you have something which would allow you to uniquely identify records other than that GUID, I'd suggest you look at building an unique index on that other field & making that index clustered. If not, you are free to cluster on other fields, even using nonunique indexes. The approach there would be to cluster however best facilitates splitting your data up and querying - so, if you've got a "region" field, or something, that might be a candidate for your clustering scheme.

The problem with changing to a BIGINT would be additions to the data from other databases & integrating their database into the central store. If this is not a consideration - and will never be a consideration - then, yes, the BIGINT would solve the index rebalancing issue nicely.

Behind the scenes, if you don't specify a clustered index, SQL Server does much the same thing: it creates a row ID field & maps all other indexes into that. So, by doing it yourself, you're solving it just as SQL would solve it.

Context

StackExchange Database Administrators Q#37277, answer score: 8

Revisions (0)

No revisions yet.