snippetModerate
How to handle table design with variable columns
Viewed 0 times
columnshandlewithdesignhowvariabletable
Problem
I have a table design scenario and as a non-DBA type, would like opinions on which is more scaleable.
Say you are asked to record information on houses for a metro area, starting with a small neighborhood (200 houses) but eventually growing to 5000000+ houses.
You are required to store base information: ID# (A unique lot # we can use as a unique index), Addr, City, State, Zip. Fine, simple table will handle it.
But each year, you are going to be asked to record extra information about all of the houses - and WHAT information will change each year. So, for example, first year, you are asked to record the owners last name and square footage. The second year, you are asked to keep the last name, but dump the square footage and instead begin collecting the owners first names.
Lastly - each year the # of extra columns will change. Might start with 2 extra columns, then go to 6 next year, then back down to 2.
So one table approach is to try to add the custom information as columns in the house tables so there is only one table.
But I have a situation where someone laid out the tables for this as:
"House Table" columns: ID, Addr, City, State, Zip - with one row per house
"Custom Info Table" columns: ID, Name, Value - with table looking like:
So there are multiple of rows for each individual house record. Each year when the optional information required changes, this table is literally rebuilt, so next year it might look like:
```
1 Last Name Smith
2 Last Name Harrison
3 Last Name Markey
1 First Name John
Say you are asked to record information on houses for a metro area, starting with a small neighborhood (200 houses) but eventually growing to 5000000+ houses.
You are required to store base information: ID# (A unique lot # we can use as a unique index), Addr, City, State, Zip. Fine, simple table will handle it.
But each year, you are going to be asked to record extra information about all of the houses - and WHAT information will change each year. So, for example, first year, you are asked to record the owners last name and square footage. The second year, you are asked to keep the last name, but dump the square footage and instead begin collecting the owners first names.
Lastly - each year the # of extra columns will change. Might start with 2 extra columns, then go to 6 next year, then back down to 2.
So one table approach is to try to add the custom information as columns in the house tables so there is only one table.
But I have a situation where someone laid out the tables for this as:
"House Table" columns: ID, Addr, City, State, Zip - with one row per house
ID Addr City State Zip
-------------------------------------------
1 10 Maple Street Boston MA 11203
2 144 South Street Chelmsford MA 11304
3 1 Main Avenue Lowell MA 11280"Custom Info Table" columns: ID, Name, Value - with table looking like:
ID Name Value
1 Last Name Smith
2 Last Name Harrison
3 Last Name Markey
1 Square Footage 1200
2 Square Footage 1930
3 Square FootageSo there are multiple of rows for each individual house record. Each year when the optional information required changes, this table is literally rebuilt, so next year it might look like:
```
1 Last Name Smith
2 Last Name Harrison
3 Last Name Markey
1 First Name John
Solution
You have pretty much 4 choices:
NoSQL - definition Every record is stored as a set of Key/Value pairs. It is very flexible and fast. Not all of the report writers out there support this style of storage. There are many example database implementations of NoSQL. The one that seems to be most popular right now, is MongoDB.
EAV - definition This is where you turn either the whole table or a portion (in another table) on its side. This is a good choice if you already have a relational database in-house that you can't move away from easily. The custom info table example you gave is a good example of an EAV table.
Standard tables with XML columns - Think of this as NoSQL meets relational tables. The data stored in an XML column can be any format that XML supports, including multiple correlated sub-data. For the columns that you know are going to be "regular" columns, they can be built as the appropriate type of column to store the data (LastName, Address, City, State, etc.).
Standard tables with lots of extra columns - You have a relational database, you can't use either XML or EAV, and NoSQL is not an option. Add lots of extra columns of each type. I would guess 30 or more varchar, 30 or more integer, 15 or more numerics. And once you use a column for a value, don't re-use it. And don't delete the column either.
Out of all of these solutions, my own opinion is that you will find either the NoSQL or the EAV approach to be the most successful with the least amount of refactoring your code and your schema.
You will have a situation where you collect data one year, not the next, and then collect it again afterward. Trying to get the older data updated with the correct information is problematic and expensive. Storage is neither.
NoSQL - definition Every record is stored as a set of Key/Value pairs. It is very flexible and fast. Not all of the report writers out there support this style of storage. There are many example database implementations of NoSQL. The one that seems to be most popular right now, is MongoDB.
EAV - definition This is where you turn either the whole table or a portion (in another table) on its side. This is a good choice if you already have a relational database in-house that you can't move away from easily. The custom info table example you gave is a good example of an EAV table.
Standard tables with XML columns - Think of this as NoSQL meets relational tables. The data stored in an XML column can be any format that XML supports, including multiple correlated sub-data. For the columns that you know are going to be "regular" columns, they can be built as the appropriate type of column to store the data (LastName, Address, City, State, etc.).
Standard tables with lots of extra columns - You have a relational database, you can't use either XML or EAV, and NoSQL is not an option. Add lots of extra columns of each type. I would guess 30 or more varchar, 30 or more integer, 15 or more numerics. And once you use a column for a value, don't re-use it. And don't delete the column either.
Out of all of these solutions, my own opinion is that you will find either the NoSQL or the EAV approach to be the most successful with the least amount of refactoring your code and your schema.
You will have a situation where you collect data one year, not the next, and then collect it again afterward. Trying to get the older data updated with the correct information is problematic and expensive. Storage is neither.
Context
StackExchange Database Administrators Q#58036, answer score: 18
Revisions (0)
No revisions yet.