patternMinor
Optimal design for a large database
Viewed 0 times
optimaldesigndatabaselargefor
Problem
Brand new to databases, using postgresql on linux. I've done a decent amount of reading on the issue and looked through similar questions on here, but just want to confirm the most efficient way of organizing everything.
So say for example we have entries for a million people, who each have a name, phone number, address, and between 20 to 50 "skills". Would the correct way be to have one table with the basic person properties along with a unique identifier, and then create a second table with two columns containing the person's ID and skill name?
Seems a little redundant to me, but the other option of having a variable number of skill columns in the person table seems even worse.
Any tips to maximize speed and minimize memory usage?
So say for example we have entries for a million people, who each have a name, phone number, address, and between 20 to 50 "skills". Would the correct way be to have one table with the basic person properties along with a unique identifier, and then create a second table with two columns containing the person's ID and skill name?
Seems a little redundant to me, but the other option of having a variable number of skill columns in the person table seems even worse.
Any tips to maximize speed and minimize memory usage?
Solution
This is a classic situation where you use a "joining table".
You will need three tables.
(note that table names are singular - this is a preference of
mine, not a rule).
This way you don't store the skill name, description &c. for every person who has that skill, but just once and then refer to it in the
That is the way I would do this.
You will need three tables.
Person
Skill
Person_Skill(note that table names are singular - this is a preference of
mine, not a rule).
Person
ID integer PRIMARY KEY
KEY name (and possibly phone?)
Skill
ID integer PRIMARY KEY
KEY name (maybe not even necessary if there are only 50 skills?)
Person_Skill
Person_ID integer FOREIGN KEY REFERENCES Person(ID)
Skill_ID integer FOREIGN KEY REFERENCES Skill(ID)
Unique Index on (Person_ID, Skill_ID) (to avoid duplicates)
KEY Skill_IDThis way you don't store the skill name, description &c. for every person who has that skill, but just once and then refer to it in the
Person_Skill table by its ID - this reduces space and memory requirements and is less error prone (you only store the Skill datum once - one of the core features of an RDBMS under Codd's rules).That is the way I would do this.
Code Snippets
Person
Skill
Person_SkillPerson
ID integer PRIMARY KEY
<other person details - name, address, phone &c.>
KEY name (and possibly phone?)
Skill
ID integer PRIMARY KEY
<other skill details - name, description, necessary qualification &c.>
KEY name (maybe not even necessary if there are only 50 skills?)
Person_Skill
Person_ID integer FOREIGN KEY REFERENCES Person(ID)
Skill_ID integer FOREIGN KEY REFERENCES Skill(ID)
Unique Index on (Person_ID, Skill_ID) (to avoid duplicates)
KEY Skill_IDContext
StackExchange Database Administrators Q#130481, answer score: 7
Revisions (0)
No revisions yet.