HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Is using multiple foreign keys separated by commas wrong, and if so, why?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyforeignseparatedkeysusingmultiplewrongandcommas

Problem

There are two tables: Deal and DealCategories. One deal can have many deal categories.

So the proper way should be to make a table called DealCategories with the following structure:

DealCategoryId (PK)
DealId (FK)
DealCategoryId (FK)


However, our outsource team stored the multiple categories in the Deal table this way:

DealId (PK)
DealCategory -- In here they store multiple deal ids separated by commas like this: 18,25,32.


I feel that what they did is wrong, but I don't know how to clearly explain why this is not right.

How should I explain to them that this is wrong? Or maybe I'm the one who's wrong and this is acceptable?

Solution

Yes it's a terrible idea.

Instead of going:

SELECT Deal.Name, DealCategory.Name
FROM Deal
  INNER JOIN
     DealCategories ON Deal.DealID = DealCategories.DealID
  INNER JOIN
     DealCategory ON DealCategories.DealCategoryID = DealCategory.DealCategoryID
WHERE Deal.DealID = 1234


You now have to go:

SELECT Deal.ID, Deal.Name, DealCategories
FROM Deal
WHERE Deal.DealID = 1234


Then you need to do stuff in your application code to split that comma list into individual numbers, then query the database seperately:

SELECT DealCategory.Name
FROM DealCategory
WHERE DealCategory.DealCategoryID IN (>)


This design antipattern stems from either a complete misunderstanding of relational modelling (You don't have to be scared of tables. Tables are your friends. Use them), or a bizarrely misguided belief it's faster to take a comma-separated list and split it in application code than it is to add a link table (it never is). The third option is that they're not confident/competent enough with SQL to be able to set up foreign keys, but if that's the case they shouldn't have anything to do with the design of a relational model.

SQL Antipatterns (Karwin, 2010) devotes an entire chapter to this antipattern (which he calls 'Jaywalking'), pages 15-23. Also, the author has posted on a similar question over at SO. Key points he notes (as applied to this example) are:

  • Querying for all deals in a specific category is rather complicated (the easiest way to solve that problem is a regular expression, but a regular expression is a problem in and of itself).



  • You can't enforce referential integrity without foreign key relationships. If you delete DealCategory nr. #26, you then, in your application code, have to go through each deal looking for references to category #26 and delete them. This is something that should be handled at the data layer, and having to handle it in your application is a very bad thing.



  • Aggregate queries (COUNT, SUM etc), again, vary from 'complicated' to 'almost impossible'. Ask your developers how they'd get you a list of all categories with a count of the number of deals in that category. With a proper design, that's four lines of SQL.



  • Updates become much more difficult (i.e. you have a deal that's in five categories, but you want to remove two and add three other ones). That's three lines of SQL with a proper design.



  • Eventually you'll run into VARCHAR list length limitations. Although if you have a comma-seperated list that's over 4000 characters, chances are parsing that monster is going to be slow as hell anyway.



  • Pulling a list out of the database, splitting it up, and then going back to the database for another query is intrinsically slower than one query.



TLDR: It's a fundamentally flawed design, it won't scale well, it introduces additional complexity to even the simplest queries, and right out-of-the-box it slows your application down.

Code Snippets

SELECT Deal.Name, DealCategory.Name
FROM Deal
  INNER JOIN
     DealCategories ON Deal.DealID = DealCategories.DealID
  INNER JOIN
     DealCategory ON DealCategories.DealCategoryID = DealCategory.DealCategoryID
WHERE Deal.DealID = 1234
SELECT Deal.ID, Deal.Name, DealCategories
FROM Deal
WHERE Deal.DealID = 1234
SELECT DealCategory.Name
FROM DealCategory
WHERE DealCategory.DealCategoryID IN (<<that list from before>>)

Context

StackExchange Database Administrators Q#17808, answer score: 49

Revisions (0)

No revisions yet.