HiveBrain v1.2.0
pattern · sql · Major

Composite Primary Key in multi-tenant SQL Server database

Submitted by: @import:stackexchange-dba
Tags: multi-tenant, primary-key, composite-key, sql-server, database

Problem

I'm building a multi-tenant app (single database, single schema) using ASP.NET Web API, Entity Framework, and SQL Server / Azure SQL Database. The app will be used by 1,000-5,000 customers. All the tables will have a TenantId (Guid / UNIQUEIDENTIFIER) field. Right now I use a single-field primary key, Id (Guid). But with just the Id field, I have to check whether the data supplied by the user is from / for the right tenant. For example, I have a SalesOrder table with a CustomerId field. Every time users post or update a sales order, I have to check whether the CustomerId belongs to the same tenant. It gets worse because each tenant might have several outlets; then I have to check both TenantId and OutletId. It's really a maintenance nightmare and bad for performance.

I'm thinking of adding TenantId to the primary key along with Id, and possibly OutletId too, so the primary key in the SalesOrder table would be (Id, TenantId, OutletId). What are the downsides of this approach? Would performance suffer badly with a composite key? Does the composite key order matter? Are there better solutions for my problem?

Solution

Having worked on a large-scale, multi-tenant system (a federated approach with customers spread across 18+ servers, each server having an identical schema, just different customers, and thousands of transactions per second on each server), I can say:

1. There are some folks (a few, at least) who will agree with your choice of GUIDs as the IDs for both TenantID and the entity ID. But no, it is not a good choice. All other considerations aside, that choice alone will hurt in a few ways: fragmentation to start with; vast amounts of wasted space (don't say disk is cheap when thinking about enterprise storage, i.e. a SAN, or about queries taking longer because each data page holds fewer rows than it would with INT or even BIGINT); more difficult support and maintenance; etc. GUIDs are great for portability: is the data generated in one system and then transferred to another? If not, switch to a more compact data type (e.g. TINYINT, SMALLINT, INT, or even BIGINT) and increment sequentially via IDENTITY or SEQUENCE.
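To make item 1 concrete, here is a minimal sketch of swapping GUID keys for compact sequential ones (table and column names here are illustrative, not from the original question):

```sql
-- Sketch: compact surrogate keys instead of GUIDs (names are illustrative).
-- INT is 4 bytes vs 16 for UNIQUEIDENTIFIER, so each data page holds more rows.
CREATE TABLE dbo.Tenant
(
    TenantID   INT NOT NULL IDENTITY(1, 1)
        CONSTRAINT PK_Tenant PRIMARY KEY,
    TenantName NVARCHAR(100) NOT NULL
);

-- Alternatively, a SEQUENCE, which can be shared across tables or callers:
CREATE SEQUENCE dbo.SalesOrderIDs AS INT START WITH 1 INCREMENT BY 1;
-- Use NEXT VALUE FOR dbo.SalesOrderIDs as a column DEFAULT or in the INSERT.
```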

2. With item 1 out of the way, you really do need to have the TenantID field in EVERY table that holds user data. That way you can filter on anything without needing an extra JOIN. This also means that ALL queries against client-data tables must have TenantID in the JOIN condition and/or the WHERE clause. It also helps guarantee that you don't accidentally mix data from different customers, or show Tenant A's data to Tenant B.
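As a sketch of what item 2 implies for everyday queries (table names are illustrative):

```sql
-- Sketch: TenantID in both the JOIN condition and the WHERE clause,
-- so no extra lookup is needed to verify the customer belongs to the tenant.
SELECT so.OrderID, so.OrderDate, c.CustomerName
FROM   dbo.SalesOrder AS so
INNER JOIN dbo.Customer AS c
       ON  c.TenantID   = so.TenantID   -- tenant must match in the JOIN
       AND c.CustomerID = so.CustomerID
WHERE  so.TenantID = @TenantID;         -- and every query filters by tenant
```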

3. Regarding:

   "I'm thinking to add TenantId as primary key along with Id. And possibly add OutletId too. So primary key(s) in sales order table will be Id, TenantId, OutletId."

Yes, you should make the clustered indexes on the client-data tables composite keys, inclusive of TenantID and ID. This also ensures that TenantID is in every nonclustered index (since those include the clustered index key(s)), which you would need anyway, since 98.45% of queries against client-data tables will need the TenantID (the main exception being garbage collection of old data based on CreatedDate, where TenantID doesn't matter).

No, you would not add FKs such as OutletID to the PK. The PK needs to uniquely identify the row, and adding FKs would not help with that. In fact, it would increase the chances of duplicated data, assuming that OrderID is unique for each TenantID rather than unique per OutletID within each TenantID.

Also, it is not necessary to add OutletID to the PK in order to ensure that Outlets from Tenant A don't get mixed up with Tenant B. Since all user-data tables will have TenantID in the PK, that means TenantID will also be in the FKs. For example, the Outlet table has a PK of (TenantID, OutletID), and the Order table has a PK of (TenantID, OrderID) and an FK of (TenantID, OutletID) which references the PK on the Outlet table. Properly defined FKs will prevent Tenant data from getting intermixed.
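The PK/FK layout described above can be sketched as DDL (a sketch only; table and constraint names are illustrative):

```sql
-- Sketch of the composite PK/FK structure (names are illustrative).
CREATE TABLE dbo.Outlet
(
    TenantID INT NOT NULL,
    OutletID INT NOT NULL,
    CONSTRAINT PK_Outlet PRIMARY KEY (TenantID, OutletID)
);

CREATE TABLE dbo.SalesOrder
(
    TenantID INT NOT NULL,
    OrderID  INT NOT NULL,
    OutletID INT NOT NULL,
    CONSTRAINT PK_SalesOrder PRIMARY KEY (TenantID, OrderID),
    -- The composite FK is what prevents cross-tenant mixing: the
    -- referenced Outlet row must carry the very same TenantID.
    CONSTRAINT FK_SalesOrder_Outlet FOREIGN KEY (TenantID, OutletID)
        REFERENCES dbo.Outlet (TenantID, OutletID)
);
```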

4. Regarding:

   "Does the composite key order matter?"

Well, here is where it gets fun. There is some debate as to which field should come first. The "typical" rule for designing good indexes is to pick the most selective field as the leading field. TenantID, by its very nature, will not be the most selective field; the ID field will be. Here are some thoughts:

- ID first: This is the most selective (i.e. most unique) field. But being an auto-increment field (or random, if still using GUIDs), it spreads each customer's data throughout the table. This means there are times when a customer needs 100 rows, and that requires nearly 100 data pages to be read from disk (not fast) into the buffer pool (taking up more space than 10 data pages would). It also increases contention on the data pages, since multiple customers will more frequently need to update the same data page.

However, you typically do not run into as many parameter sniffing / bad-cached-plan issues, since the statistics across the different ID values are fairly consistent. You might not get the most optimal plans, but you will be less likely to get horrible ones. This method essentially sacrifices a little performance across all customers to gain the benefit of less frequent problems.

- TenantID first: This is very much not selective at all. There might be very little variation across 1 million rows if you only have 100 TenantIDs. But the statistics for these queries are more accurate, since SQL Server will know that a query for Tenant A will pull back 500,000 rows while the same query for Tenant B returns only 50 rows. This is where the main pain point is. This method greatly increases the chances of parameter sniffing issues, where the first run of a stored procedure is for Tenant A, and the Query Optimizer, seeing those statistics, builds a plan that is efficient for getting 500k rows. But when Tenant B, with only 50 rows, runs it, that execution plan is no longer appropriate, and in fact
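The two key orders weighed above can be sketched as alternative clustered index definitions (illustrative only; you would pick one per table):

```sql
-- Option A: ID leading. Most selective, but each tenant's rows scatter
-- across the table, so per-tenant reads touch many pages.
CREATE UNIQUE CLUSTERED INDEX CIX_SalesOrder_A
    ON dbo.SalesOrder (OrderID, TenantID);

-- Option B: TenantID leading. Each tenant's rows cluster together, so
-- per-tenant reads are cheap, but skewed per-tenant row counts make
-- cached-plan (parameter sniffing) problems a recurring risk.
CREATE UNIQUE CLUSTERED INDEX CIX_SalesOrder_B
    ON dbo.SalesOrder (TenantID, OrderID);
```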

Context

StackExchange Database Administrators Q#98118, answer score: 49
