HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Does 'Cities' deserve a separate table?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
separatecitiesdeservedoestable

Problem

I have the following Customers table:

customer_id - int 
company_name - nvarchar
street - nvarchar
city - nvarchar
comments - nvarchar


The app will only be used in part of one small country (something like 30 cities). A friend told me I should seperate 'city' into a different table 'Cities' and use only city_id in customers table.

Personally I didn't see much of a benefit from it (except for saving some space on Customers table which seems insignificant to me in this case for the cost of creating another table).

He also mentioned that because I have duplicate columns - city: foo, city: bar, city: foo. (few customers in the same city) this is not considered normalized, is this true?

Who's right? Any enlightenment on the issue?

Solution

When you get into it -- really get into it -- storing componentized address data is an extremely complicated problem because of all the disparate and varied systems in use globally.

I think whatever you develop needs to be balanced between flexibility, and storing only what your business needs to store.

The biggest piece of the puzzle here is to move all address-related fields out of the Customers table -- addresses are entities unto themselves.

The space cost may not be relevant in a very small system (maybe), but this is more about a technical debt issue. If you need to start adding more address-related fields, you would have to keep adding more and more of them to the Customers table. Sooner or later, you'll realize that this is an inflexible design -- if you need to use multiple addresses for a given customer (billing & shipping addresses being the classic example), now you're in a world of hurt without normalizing, as you can't reuse the existing structure to store the required data.

At an absolute minimum, create a new table Addresses, and then reference address_id from Customers. If you want to go the multiple address route eventually, even doing just this step (as opposed to sticking with the current design) will save a massive headache later.

The address line could go in either the Addresses table directly for simplicity, or in a separate Address_Lines table to handle multiple lines. (The latter is usually preferred.)

After that, a general minimum for being able to slice and dice your data in a meaningful way is to construct normalized Countries, Regions (aka provinces/territories/etc.), and Cities tables, with only the latter appearing as a field in the Addresses table. This lets you ask business questions like "how many products did we sell in city X?" and "how many products did we sell in region Y?". (Note: depending on where you operate, what data you have, and how the data will be sliced, you may require a 4th table in there between Regions and Cities.)

If you need to get more granular ("how many products did we sell to customers on street X?"), then you'll have to start componentizing the address lines themselves, which is the really difficult part. Usually, though, a business won't ask this kind of question. Given that I don't even see a postal code field, I'm guessing this is not something you care about.

Context

StackExchange Database Administrators Q#22726, answer score: 6

Revisions (0)

No revisions yet.