What's the best way to diagram this "relation" for performance and/or disk space savings?
Problem
I am working on a DB diagram and I have a table called `cm_identifier_type`. Each record of this table may or may not have a downloadable file (saved as a file path). I have identified two possible ways to achieve this:
- Place a column `download_file` as `VARCHAR(250)` and set it to `NULL` by default (because not all of the records will have a related download)
- Create a relation between `cm_identifier_type` and a new table `cm_download_file`
Now, regarding performance, disk space savings, query savings and so on, how would you do that? What is your recommendation on this edge case?
Note: for the moment I am using MariaDB 10.1.x, but this will run on a MySQL instance, probably 5.x; I am not sure, since I haven't got those details yet.
Solution
If you have a lot of specific statistics about how often your optional columns will be null and how often the non-null columns need to be read (as opposed to your core, mandatory columns) then you could calculate space savings (or not) and at the very least you could derive some performance experiments to test each approach.
There is no rule of thumb for whether it is "better" to segregate optional columns into a separate table.
"Better" is a subjective term. What is of value? Disk space, CPU cycles, query response times, simplicity of code? You can't consider the relative merits of one approach over another without first considering what it is that you're trying to optimize.
There are a lot of reasons why you might want to move optional columns to a separate 1:1 related sub-table. See my answer to this question for more discussion of these reasons.
In your case, since you're concerned about space, you need to keep in mind a few things about how data is physically stored:
- Variable size strings don't take up much space if they are empty.
- Every nullable column needs a bit to indicate whether it is null or not.
- A separate table with optional columns needs to store the primary key of the main table again (as a foreign key) and there may be a separate index on this column as well.
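As a rough illustration of how these three facts trade off, the sketch below estimates the extra bytes each design spends on the optional file path. Every constant in it (`len_prefix`, `key_bytes`, `row_overhead`) is a placeholder assumption, not a measured InnoDB figure, and the model ignores page fill, character sets and any extra index on the sub-table, so treat it as a template for plugging in your own numbers.

```python
import math


def inline_column_bytes(n_rows, n_with_file, avg_path_len, len_prefix=1):
    """Option 1: nullable download_file column on the main table.

    Every row pays one null bit; only rows with a file pay for the string.
    len_prefix is an assumed per-value length header.
    """
    null_bits = math.ceil(n_rows / 8)
    payload = n_with_file * (len_prefix + avg_path_len)
    return null_bits + payload


def separate_table_bytes(n_with_file, avg_path_len, len_prefix=1,
                         key_bytes=4, row_overhead=20):
    """Option 2: one row in a sub-table per record that has a file.

    Each sub-table row repeats the parent key (key_bytes) and carries its
    own per-row bookkeeping (row_overhead). Both values are assumptions.
    """
    per_row = row_overhead + key_bytes + len_prefix + avg_path_len
    return n_with_file * per_row


# Hypothetical workload: one million records, 60-character paths.
common = inline_column_bytes(1_000_000, 500_000, 60), separate_table_bytes(500_000, 60)
rare = inline_column_bytes(1_000_000, 1_000, 60), separate_table_bytes(1_000, 60)
```

Under these placeholder numbers the separate table only comes out smaller when very few records have a file; measuring real tables on the target server is the only way to confirm that for an actual workload.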
There are a lot of competing influences on the matter of which takes less space and which performs better. You need to consider:
- How often is the optional data going to be missing? If it's really rare then maybe you will save space by segregating it out.
- How often do you need to read the optional data? If you read it every time (whether it's there or not) then maybe joining for it all the time is less efficient.
- On the other hand, if it isn't there very often and if you hardly ever read it, even when it is, then maybe it's more efficient to read shorter records containing only the mandatory columns?
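To make the read-pattern question concrete, here is a minimal sketch of both designs and the query each one needs when the optional data is read every time. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; MySQL/MariaDB DDL and storage behaviour differ. Apart from `cm_identifier_type`, `cm_download_file` and `download_file`, every name here (`id`, `name`, `identifier_type_id`, `file_path`, the `_inline` suffix, the sample rows) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: nullable column on the main table.
    CREATE TABLE cm_identifier_type_inline (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        download_file VARCHAR(250) DEFAULT NULL
    );

    -- Option 2: mandatory columns only, plus a 1:1 optional sub-table.
    CREATE TABLE cm_identifier_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE cm_download_file (
        -- The parent key doubles as this table's primary key.
        identifier_type_id INTEGER PRIMARY KEY
            REFERENCES cm_identifier_type (id),
        file_path          VARCHAR(250) NOT NULL
    );

    INSERT INTO cm_identifier_type_inline VALUES
        (1, 'passport', '/files/passport.pdf'),
        (2, 'tax id', NULL);
    INSERT INTO cm_identifier_type VALUES (1, 'passport'), (2, 'tax id');
    INSERT INTO cm_download_file VALUES (1, '/files/passport.pdf');
""")

# Option 1: the optional value comes back with the row, no join needed.
inline_rows = conn.execute(
    "SELECT id, name, download_file FROM cm_identifier_type_inline ORDER BY id"
).fetchall()

# Option 2: a LEFT JOIN is required whenever the optional value is wanted.
joined_rows = conn.execute("""
    SELECT t.id, t.name, d.file_path
    FROM cm_identifier_type AS t
    LEFT JOIN cm_download_file AS d ON d.identifier_type_id = t.id
    ORDER BY t.id
""").fetchall()
```

Both queries return the same rows, with a null path for the record that has no file. Making the foreign key the sub-table's primary key is one way to avoid a second index on that column; queries that only need the mandatory columns can skip the join entirely in the second design.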
Another thing to consider, perhaps even the most important thing to consider, is whether you are overthinking the issue by trying to pre-optimize. Disk is pretty cheap. CPU is pretty cheap. Programmers are expensive. Unless you need to be concerned about massive scale, maybe the best answer is the one that leaves you with the simplest (least buggy, easiest to maintain) code.
Context
StackExchange Database Administrators Q#124936, answer score: 5