patternMajor
Is using multiple foreign keys separated by commas wrong, and if so, why?
Viewed 0 times
whyforeignseparatedkeysusingmultiplewrongandcommas
Problem
There are two tables:
So the proper way should be to make a table called
However, our outsource team stored the multiple categories in the
I feel that what they did is wrong, but I don't know how to clearly explain why this is not right.
How should I explain to them that this is wrong? Or maybe I'm the one who's wrong and this is acceptable?
Deal and DealCategories. One deal can have many deal categories.So the proper way should be to make a table called
DealCategories with the following structure:DealCategoryId (PK)
DealId (FK)
DealCategoryId (FK)However, our outsource team stored the multiple categories in the
Deal table this way:DealId (PK)
DealCategory -- In here they store multiple deal ids separated by commas like this: 18,25,32.I feel that what they did is wrong, but I don't know how to clearly explain why this is not right.
How should I explain to them that this is wrong? Or maybe I'm the one who's wrong and this is acceptable?
Solution
Yes it's a terrible idea.
Instead of going:
You now have to go:
Then you need to do stuff in your application code to split that comma list into individual numbers, then query the database seperately:
This design antipattern stems from either a complete misunderstanding of relational modelling (You don't have to be scared of tables. Tables are your friends. Use them), or a bizarrely misguided belief it's faster to take a comma-separated list and split it in application code than it is to add a link table (it never is). The third option is that they're not confident/competent enough with SQL to be able to set up foreign keys, but if that's the case they shouldn't have anything to do with the design of a relational model.
SQL Antipatterns (Karwin, 2010) devotes an entire chapter to this antipattern (which he calls 'Jaywalking'), pages 15-23. Also, the author has posted on a similar question over at SO. Key points he notes (as applied to this example) are:
TLDR: It's a fundamentally flawed design, it won't scale well, it introduces additional complexity to even the simplest queries, and right out-of-the-box it slows your application down.
Instead of going:
SELECT Deal.Name, DealCategory.Name
FROM Deal
INNER JOIN
DealCategories ON Deal.DealID = DealCategories.DealID
INNER JOIN
DealCategory ON DealCategories.DealCategoryID = DealCategory.DealCategoryID
WHERE Deal.DealID = 1234You now have to go:
SELECT Deal.ID, Deal.Name, DealCategories
FROM Deal
WHERE Deal.DealID = 1234Then you need to do stuff in your application code to split that comma list into individual numbers, then query the database seperately:
SELECT DealCategory.Name
FROM DealCategory
WHERE DealCategory.DealCategoryID IN (>)This design antipattern stems from either a complete misunderstanding of relational modelling (You don't have to be scared of tables. Tables are your friends. Use them), or a bizarrely misguided belief it's faster to take a comma-separated list and split it in application code than it is to add a link table (it never is). The third option is that they're not confident/competent enough with SQL to be able to set up foreign keys, but if that's the case they shouldn't have anything to do with the design of a relational model.
SQL Antipatterns (Karwin, 2010) devotes an entire chapter to this antipattern (which he calls 'Jaywalking'), pages 15-23. Also, the author has posted on a similar question over at SO. Key points he notes (as applied to this example) are:
- Querying for all deals in a specific category is rather complicated (the easiest way to solve that problem is a regular expression, but a regular expression is a problem in and of itself).
- You can't enforce referential integrity without foreign key relationships. If you delete DealCategory nr. #26, you then, in your application code, have to go through each deal looking for references to category #26 and delete them. This is something that should be handled at the data layer, and having to handle it in your application is a very bad thing.
- Aggregate queries (
COUNT,SUMetc), again, vary from 'complicated' to 'almost impossible'. Ask your developers how they'd get you a list of all categories with a count of the number of deals in that category. With a proper design, that's four lines of SQL.
- Updates become much more difficult (i.e. you have a deal that's in five categories, but you want to remove two and add three other ones). That's three lines of SQL with a proper design.
- Eventually you'll run into
VARCHARlist length limitations. Although if you have a comma-seperated list that's over 4000 characters, chances are parsing that monster is going to be slow as hell anyway.
- Pulling a list out of the database, splitting it up, and then going back to the database for another query is intrinsically slower than one query.
TLDR: It's a fundamentally flawed design, it won't scale well, it introduces additional complexity to even the simplest queries, and right out-of-the-box it slows your application down.
Code Snippets
SELECT Deal.Name, DealCategory.Name
FROM Deal
INNER JOIN
DealCategories ON Deal.DealID = DealCategories.DealID
INNER JOIN
DealCategory ON DealCategories.DealCategoryID = DealCategory.DealCategoryID
WHERE Deal.DealID = 1234SELECT Deal.ID, Deal.Name, DealCategories
FROM Deal
WHERE Deal.DealID = 1234SELECT DealCategory.Name
FROM DealCategory
WHERE DealCategory.DealCategoryID IN (<<that list from before>>)Context
StackExchange Database Administrators Q#17808, answer score: 49
Revisions (0)
No revisions yet.