HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlModerate

Mysql int vs varchar as primary key (InnoDB Storage Engine?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
engineprimaryinnodbstoragevarcharmysqlintkey

Problem

I am build a web application (project management system) and I have been wondering about this when it come to performance.

I have an Issues table an inside it there are 12 foreign keys linking to various other tables. of those, 8 of them I would need to join to get the title field from the other tables in order for the record to make any sense in a web application but then means doing 8 joins which seems really excessive especially since I am only pulling in 1 field for each of those joins.

Now I have also been told to use a auto incrementing primary key (unless sharding is a concerns in which case I should use a GUID) for permanence reasons but how bad is it to use a varchar (max length 32) performance wise? I mean most of these table are probably not going to have at many records (most of them should be under 20). Also if I use the title as the primary key, I won't have to do joins 95% of the time so for 95% of the sql, I would even occur any performance hit (I think). The only downside I can think of is that I have is I will have higher disk space usage (but down a day is that really a big deal).

The reason I am use lookup tables for a lot of this stuff instead of enums is because I need all of these values to be configurable by the end user through the application itself.

What are the downsides of using a varchar as the primary key for a table not excepted to have many records?

UPDATE - Some Tests

So I decided to do some basic tests on this stuff. I have 100000 records and these are the base queries:

Base VARCHAR FK Query

SELECT i.id, i.key, i.title, i.reporterUserUsername, i.assignedUserUsername, i.projectTitle, 
i.ProjectComponentTitle, i.affectedProjectVersionTitle, i.originalFixedProjectVersionTitle, 
i.fixedProjectVersionTitle, i.durationEstimate, i.storyPoints, i.dueDate, 
i.issueSecurityLevelId, i.creatorUserUsername, i.createdTimestamp, 
i.updatedTimestamp, i.issueTypeId, i.issueStatusId
FROM ProjectManagement.Issues i


Base INT FK

Solution

I follow the following rules for primary keys:

a) Should not have any business meaning - they should be totally independent of the application you are developing, therefore I go for numeric auto generated integers. However if you need additional columns to be unique then create unique indexes to support that

b) Should perform in joins - joining to varchars vs integers is about 2x to 3x slower as the length of the primary key grows, so you want to have your keys as integers. Since all computer systems are binary, I suspect its coz the string is changed to binary then compared with the others which is very slow

c) Use the smallest data type possible - if you expect your table to have very few columns say 52 US states, then use the smallest type possible maybe a CHAR(2) for the 2 digit code, but I would still go for a tinyint (128) for the column vs a big int which can go up to 2billion

Also you will have a challenge with cascading your changes from the primary keys to the other tables if for example the project name changes (which is not uncommon)

Go for sequential auto incrementing integers for your primary keys and gain the inbuilt efficiencies that database systems provide with support for changes in the future

Context

StackExchange Database Administrators Q#15897, answer score: 14

Revisions (0)

No revisions yet.