patternsqlModerate
SQL database for a social network
Viewed 0 times
socialsqldatabasefornetwork
Problem
I'm trying to create a mini social network mobile application kind of like Instagram and I'm not sure about my database organisation, it feels like something is missing, especially with the
```
CREATE TABLE Users(
'user_id' int(20) NOT NULL AUTO_INCREMENT,
'email' VARCHAR(255) NOT NULL,
'username' VARCHAR(25) NOT NULL,
'password' VARCHAR(255) NOT NULL,
'first_name' VARCHAR(255) NOT NULL,
'last_name' VARCHAR(255) NOT NULL,
'gender' ENUM('M', 'F'),
'city' VARCHAR(255),
'state' VARCHAR(255),
'country' VARCHAR(255),
'profile_picture_url' VARCHAR(255),
'birth_date' VARCHAR(255),
'date_created' DATE NOT NULL,
'date_updated' DATE,
'active' ENUM('yes','no') NOT NULL DEFAULT 'no',
'activation_key' VARCHAR(255) NOT NULL,
PRIMARY KEY ('user_id'),
UNIQUE KEY 'email' ('email'),
UNIQUE KEY 'username' ('username')
);
CREATE TABLE Posts(
'post_id' int(20) NOT NULL AUTO_INCREMENT,
'user_id' int(20) NOT NULL
'caption' VARCHAR(255),
'latitude' FLOAT NOT NULL,
'longitude' FLOAT NOT NULL,
'type' ENUM('image', 'video'),
'post_url' VARCHAR(255) NOT NULL,
'date_created' DATE NOT NULL,
'date_updated' DATE,
PRIMARY KEY ('post_id'),
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
);
CREATE TABLE Followings(
'user_id' INT NOT NULL,
'following_id' INT NOT NULL,
'date_created' DATE NOT NULL,
PRIMARY KEY ('user_id', 'following_id'),
UNIQUE INDEX ('following_id', 'user_id')
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
FOREIGN KEY ('following_id') REFERENCES Users('user_id')
);
CREATE TABLE Comments(
'comment_id' INT(20) NOT NULL AUTO_INCREMENT,
'post_id' INT(20) NOT NULL,
'user_id' INT(20) NOT NULL,
'content' TEXT NOT NULL,
'date_created' DATE NOT NULL,
'date_updated' DATE,
PRIMARY KEY ('comment_id'),
FOREIGN KEY ('post_id') REFERENCES Posts('post_id')
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
);
CREATE TABLE Messages(
'message_id' INT(20) NOT NULL AUTO_INCREMENT,
'user_id_from' INT(20) NOT NULL,
'user_id_to' INT(20) NOT NULL,
'content' text NOT NULL,
'd
Posts table.```
CREATE TABLE Users(
'user_id' int(20) NOT NULL AUTO_INCREMENT,
'email' VARCHAR(255) NOT NULL,
'username' VARCHAR(25) NOT NULL,
'password' VARCHAR(255) NOT NULL,
'first_name' VARCHAR(255) NOT NULL,
'last_name' VARCHAR(255) NOT NULL,
'gender' ENUM('M', 'F'),
'city' VARCHAR(255),
'state' VARCHAR(255),
'country' VARCHAR(255),
'profile_picture_url' VARCHAR(255),
'birth_date' VARCHAR(255),
'date_created' DATE NOT NULL,
'date_updated' DATE,
'active' ENUM('yes','no') NOT NULL DEFAULT 'no',
'activation_key' VARCHAR(255) NOT NULL,
PRIMARY KEY ('user_id'),
UNIQUE KEY 'email' ('email'),
UNIQUE KEY 'username' ('username')
);
CREATE TABLE Posts(
'post_id' int(20) NOT NULL AUTO_INCREMENT,
'user_id' int(20) NOT NULL
'caption' VARCHAR(255),
'latitude' FLOAT NOT NULL,
'longitude' FLOAT NOT NULL,
'type' ENUM('image', 'video'),
'post_url' VARCHAR(255) NOT NULL,
'date_created' DATE NOT NULL,
'date_updated' DATE,
PRIMARY KEY ('post_id'),
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
);
CREATE TABLE Followings(
'user_id' INT NOT NULL,
'following_id' INT NOT NULL,
'date_created' DATE NOT NULL,
PRIMARY KEY ('user_id', 'following_id'),
UNIQUE INDEX ('following_id', 'user_id')
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
FOREIGN KEY ('following_id') REFERENCES Users('user_id')
);
CREATE TABLE Comments(
'comment_id' INT(20) NOT NULL AUTO_INCREMENT,
'post_id' INT(20) NOT NULL,
'user_id' INT(20) NOT NULL,
'content' TEXT NOT NULL,
'date_created' DATE NOT NULL,
'date_updated' DATE,
PRIMARY KEY ('comment_id'),
FOREIGN KEY ('post_id') REFERENCES Posts('post_id')
FOREIGN KEY ('user_id') REFERENCES Users('user_id')
);
CREATE TABLE Messages(
'message_id' INT(20) NOT NULL AUTO_INCREMENT,
'user_id_from' INT(20) NOT NULL,
'user_id_to' INT(20) NOT NULL,
'content' text NOT NULL,
'd
Solution
Generally speaking, the schema looks good and well organized, however a few columns could use some refactoring. Here is what I can point to.
Users table
This seems a bit small for a user name. While I'm pretty sure most users will choose names below 25 characters, I would expand it a bit just for the sake of safety. I think 50 or so will do.
This smells at first sight. Maybe I'm wrong, but this sounds as a plain-text password, which is utterly wrong. Column size should match the output size of the used hash function, plus the size of the salt and hash parameters (or split those into separate columns). And don't even think of SHA1 or MD5 :P Use PBKDF2, bcrypt or scrypt for this. Look here for a more detailed discussion of password hashing.
This design is very specific for most occidental names of people, but wrong in the general case. People's names in many part of the world don't fit into first/last name, and for the most part a single, longer column will do. Read this article for more details about this.
Additionally, do you really need the real name of the user? Doesn't the username suffice? Or is the real name to be displayed somewhere?
All in all, I would put it as
In many case, and specially in a social network environment (and more if you will allow search by gender), that restriction is not enough, as human gender identity goes far beyond male/female. Just for the record, New York recognizes 31 and Facebook 71, so I think your DB should at least take that into consideration.
What I would do is separate the possible genders into its own table (that you can fill separately) and put a foreign key in users:
And in
Generally, this is a bad practice, as it would lead to repeated and inconsistent data, and difficult searches by city/country (as anyone can write it the way he wants). A better way is to normalize this into separate tables for cities/states/countries, and leave a reference to the city in the users table (and in turn, a city implies a state and a country):
Those should be populated by you, then the users just select one of the options, and that allows proper searches. In
As stated in the comments, this should be of type
An enum is not the most optimal data type here. It's better to use
Posts Table
Generally, the layout looks good, I would not split it into images/videos, at least for now. This is because both of them store basically the very same data, with only the file type differing. Only consider such a split if each type requires different columns. Other than that, I have these few tips about this table:
First of all, I would add a comment column so that users can write a bit about the photo/video they just posted (in addition to the simple caption).
Is this really necessary? Why store the full URL when you can deduce it from other columns? Specifically, the id, or better, the caption, can be used to build unique URLs for each post without needing to generate a new piece of data.
Followings Table
This is redundant. The primary key is, by definition, unique. The unique key is therefore unnecessary and can be removed.
General comments
One thing I would do is to be explicit about nullability of each column, even if the default favors your particular case. Being explicit doesn't hurt, and it really helps readability, so I would suggest always stating each column as
You might consider the possibility of allowing users to like comments too, in addition to the
Users table
'username' VARCHAR(25) NOT NULL,This seems a bit small for a user name. While I'm pretty sure most users will choose names below 25 characters, I would expand it a bit just for the sake of safety. I think 50 or so will do.
'password' VARCHAR(255) NOT NULL,This smells at first sight. Maybe I'm wrong, but this sounds as a plain-text password, which is utterly wrong. Column size should match the output size of the used hash function, plus the size of the salt and hash parameters (or split those into separate columns). And don't even think of SHA1 or MD5 :P Use PBKDF2, bcrypt or scrypt for this. Look here for a more detailed discussion of password hashing.
'first_name' VARCHAR(255) NOT NULL,
'last_name' VARCHAR(255) NOT NULL,This design is very specific for most occidental names of people, but wrong in the general case. People's names in many part of the world don't fit into first/last name, and for the most part a single, longer column will do. Read this article for more details about this.
Additionally, do you really need the real name of the user? Doesn't the username suffice? Or is the real name to be displayed somewhere?
All in all, I would put it as
'real_name' VARCHAR(1000) NOT NULL,'gender' ENUM('M', 'F'),In many case, and specially in a social network environment (and more if you will allow search by gender), that restriction is not enough, as human gender identity goes far beyond male/female. Just for the record, New York recognizes 31 and Facebook 71, so I think your DB should at least take that into consideration.
What I would do is separate the possible genders into its own table (that you can fill separately) and put a foreign key in users:
CREATE TABLE Genders (
'gender_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
PRIMARY KEY ('gender_id'),
UNIQUE KEY ('name')
);And in
users:'gender_id' INT(20),
FOREIGN KEY ('gender_id') REFERENCES Genders('gender_id')'city' VARCHAR(255),
'state' VARCHAR(255),
'country' VARCHAR(255),Generally, this is a bad practice, as it would lead to repeated and inconsistent data, and difficult searches by city/country (as anyone can write it the way he wants). A better way is to normalize this into separate tables for cities/states/countries, and leave a reference to the city in the users table (and in turn, a city implies a state and a country):
CREATE TABLE Countries (
'country_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
PRIMARY KEY ('country_id'),
UNIQUE KEY ('name')
);
CREATE TABLE States (
'state_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
'country_id' INT(20) NOT NULL,
PRIMARY KEY ('state_id'),
UNIQUE KEY ('name','country_id')
);
CREATE TABLE Cities (
'city_id' INT(20) NOT NULL AUTO_INCREMENT,
'name' VARCHAR(100) NOT NULL,
'state_id' INT(20) NOT NULL,
PRIMARY KEY ('city_id'),
UNIQUE KEY ('name','state_id')
);Those should be populated by you, then the users just select one of the options, and that allows proper searches. In
users you only have a reference to city:'city_id' INT(20) NOT NULL,
FOREIGN KEY ('city_id') REFERENCES Cities('city_id')'birth_date' VARCHAR(255),As stated in the comments, this should be of type
DATE.'active' ENUM('yes','no') NOT NULL DEFAULT 'no',An enum is not the most optimal data type here. It's better to use
BIT instead.Posts Table
Generally, the layout looks good, I would not split it into images/videos, at least for now. This is because both of them store basically the very same data, with only the file type differing. Only consider such a split if each type requires different columns. Other than that, I have these few tips about this table:
First of all, I would add a comment column so that users can write a bit about the photo/video they just posted (in addition to the simple caption).
'comment' TEXT NULL'post_url' VARCHAR(255) NOT NULL,Is this really necessary? Why store the full URL when you can deduce it from other columns? Specifically, the id, or better, the caption, can be used to build unique URLs for each post without needing to generate a new piece of data.
Followings Table
PRIMARY KEY ('user_id', 'following_id'),
UNIQUE INDEX ('following_id', 'user_id')This is redundant. The primary key is, by definition, unique. The unique key is therefore unnecessary and can be removed.
General comments
One thing I would do is to be explicit about nullability of each column, even if the default favors your particular case. Being explicit doesn't hurt, and it really helps readability, so I would suggest always stating each column as
NULL/NOT NULL.You might consider the possibility of allowing users to like comments too, in addition to the
Code Snippets
'username' VARCHAR(25) NOT NULL,'password' VARCHAR(255) NOT NULL,'first_name' VARCHAR(255) NOT NULL,
'last_name' VARCHAR(255) NOT NULL,'real_name' VARCHAR(1000) NOT NULL,'gender' ENUM('M', 'F'),Context
StackExchange Code Review Q#153710, answer score: 11
Revisions (0)
No revisions yet.