HiveBrain v1.2.0
Get Started
← Back to all entries
snippetModerate

How to handle table design with variable columns

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columnshandlewithdesignhowvariabletable

Problem

I have a table design scenario and as a non-DBA type, would like opinions on which is more scaleable.

Say you are asked to record information on houses for a metro area, starting with a small neighborhood (200 houses) but eventually growing to 5000000+ houses.

You are required to store base information: ID# (A unique lot # we can use as a unique index), Addr, City, State, Zip. Fine, simple table will handle it.

But each year, you are going to be asked to record extra information about all of the houses - and WHAT information will change each year. So, for example, first year, you are asked to record the owners last name and square footage. The second year, you are asked to keep the last name, but dump the square footage and instead begin collecting the owners first names.

Lastly - each year the # of extra columns will change. Might start with 2 extra columns, then go to 6 next year, then back down to 2.

So one table approach is to try to add the custom information as columns in the house tables so there is only one table.

But I have a situation where someone laid out the tables for this as:

"House Table" columns: ID, Addr, City, State, Zip - with one row per house

ID   Addr              City     State  Zip 
-------------------------------------------
1    10 Maple Street   Boston      MA  11203

2    144 South Street  Chelmsford  MA  11304

3    1 Main Avenue     Lowell      MA  11280


"Custom Info Table" columns: ID, Name, Value - with table looking like:

ID   Name             Value

1    Last Name        Smith

2    Last Name        Harrison

3    Last Name        Markey

1    Square Footage   1200

2    Square Footage   1930

3    Square Footage


So there are multiple of rows for each individual house record. Each year when the optional information required changes, this table is literally rebuilt, so next year it might look like:

```
1 Last Name Smith

2 Last Name Harrison

3 Last Name Markey

1 First Name John

Solution

You have pretty much 4 choices:

NoSQL - definition Every record is stored as a set of Key/Value pairs. It is very flexible and fast. Not all of the report writers out there support this style of storage. There are many example database implementations of NoSQL. The one that seems to be most popular right now, is MongoDB.

EAV - definition This is where you turn either the whole table or a portion (in another table) on its side. This is a good choice if you already have a relational database in-house that you can't move away from easily. The custom info table example you gave is a good example of an EAV table.

Standard tables with XML columns - Think of this as NoSQL meets relational tables. The data stored in an XML column can be any format that XML supports, including multiple correlated sub-data. For the columns that you know are going to be "regular" columns, they can be built as the appropriate type of column to store the data (LastName, Address, City, State, etc.).

Standard tables with lots of extra columns - You have a relational database, you can't use either XML or EAV, and NoSQL is not an option. Add lots of extra columns of each type. I would guess 30 or more varchar, 30 or more integer, 15 or more numerics. And once you use a column for a value, don't re-use it. And don't delete the column either.

Out of all of these solutions, my own opinion is that you will find either the NoSQL or the EAV approach to be the most successful with the least amount of refactoring your code and your schema.

You will have a situation where you collect data one year, not the next, and then collect it again afterward. Trying to get the older data updated with the correct information is problematic and expensive. Storage is neither.

Context

StackExchange Database Administrators Q#58036, answer score: 18

Revisions (0)

No revisions yet.