Is it ok to have repeated values in some of the columns and similar columns in two separate tables?
Problem
I'm trying to create a database in which I can store thousands of job postings. I put three sample records of these job postings here so that you can better understand what everything looks like:
Every row is a separate job posting.
Now, based on the above data, I've created a simple relational database schema filled with this data (a rough sketch of such a schema follows the questions below):
I have two questions here:
- The Job posting table contains a column called experience, and the Skills table contains a column called level. These two columns have different names but actually contain the same repeated values: trainee, junior, mid, senior, expert. Is it OK to have virtually the same column in two different tables?
- Repeated values in cells occur in almost every column. For example, if you look at the title column in the Job posting table, you'll see two cells with the value Python Developer. The same thing happens in the skill name column in the Skills table, where you can find two Python values. I have thousands of job postings, which means there will be thousands of repetitions in these columns. Should I do something about this?
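For concreteness, here is a minimal sketch of the kind of schema being described. The table and column names (job_posting, skill, and so on) are assumptions based on the question text, since the original sample data and schema diagram aren't reproduced here:

```sql
-- Hypothetical reconstruction of the schema described in the question;
-- names and types are assumptions, not the poster's actual DDL.
CREATE TABLE job_posting (
    id         INTEGER PRIMARY KEY,
    title      VARCHAR(100),  -- repeated values, e.g. 'Python Developer'
    experience VARCHAR(20)    -- trainee / junior / mid / senior / expert
);

CREATE TABLE skill (
    id             INTEGER PRIMARY KEY,
    job_posting_id INTEGER REFERENCES job_posting (id),
    skill_name     VARCHAR(100), -- repeated values, e.g. 'Python'
    level          VARCHAR(20)   -- same value set as job_posting.experience
);
```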
Solution
"What is the point of creating separate tables for Job title and Skills?"

"To be honest, I expected that you might propose this kind of solution, i.e. creating a junction table, and I'm not happy with it, mostly because I don't see what I would gain from it."
Without repeating the same answer a third time, I figured I'd address the main question here that, for some reason, no one else wanted to go into detail about.
What the other answers suggested, creating additional tables to hold the unique list of repeated entity values and referencing those values by their immutable keys in the other tables (such as Skill Names), is called normalization. It's like a type of refactoring for your data. The reason you would do it is to eliminate the repeated values in your data, which are liable to change.
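As a rough sketch of what that could look like for the schema above (again with assumed, hypothetical names): the repeated skill names and experience/level values each move into their own lookup table, and a junction table links postings to skills by key:

```sql
-- Each distinct value is stored exactly once; other tables reference it by key.
CREATE TABLE experience_level (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(20) UNIQUE       -- trainee, junior, mid, senior, expert
);

CREATE TABLE skill (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(100) UNIQUE      -- 'Python', 'Rest API', ... stored once
);

CREATE TABLE job_posting (
    id                  INTEGER PRIMARY KEY,
    title               VARCHAR(100),
    experience_level_id INTEGER REFERENCES experience_level (id)
);

-- Junction table: which posting requires which skill, at which level.
CREATE TABLE job_posting_skill (
    job_posting_id      INTEGER REFERENCES job_posting (id),
    skill_id            INTEGER REFERENCES skill (id),
    experience_level_id INTEGER REFERENCES experience_level (id),
    PRIMARY KEY (job_posting_id, skill_id)
);
```

Note that in this sketch the virtually identical experience and level columns from the first question collapse into references to a single experience_level lookup table.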
An integral benefit of data normalization is improved data integrity. Let's say one day the Skill Name needs to be updated from Rest API to RESTful API. With your current design you would need to update multiple rows to make that change correctly. Either user error or programmatic error (if, for example, the change was made in batches in a non-transactional loop) could leave your data in an inconsistent intermediate state.

In a normalized design, where the value Rest API lives in only a single row of a unique table, you would need to update just that one row, guaranteeing the change is made correctly. Changing a single row is implicitly atomic, meaning all or nothing; it's not possible to end up with a partially updated state of the data. It is an improved design for data integrity.
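A short illustration of that difference, using the hypothetical tables sketched above:

```sql
-- Original (denormalized) Skills table: the value is repeated across many
-- rows, so the rename must touch all of them, ideally in one transaction.
UPDATE skill
SET    skill_name = 'RESTful API'
WHERE  skill_name = 'Rest API';   -- potentially thousands of rows

-- Normalized skill lookup table: the value lives in exactly one row,
-- so the rename is a single, implicitly atomic update.
UPDATE skill
SET    name = 'RESTful API'
WHERE  name = 'Rest API';         -- exactly one row
```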
Another, secondary benefit of data normalization is improved performance for such data manipulation changes, since you're only changing a single row as opposed to many rows. And that single row lives in a potentially less transactional table, eliminating what would otherwise have caused blocking on the other table.
Context
StackExchange Database Administrators Q#320500, answer score: 3