principlesqlMinor
Lots of Indexes - MySQL vs MongoDB - Migrating
Viewed 0 times
mongodbindexeslotsmysqlmigrating
Problem
So, I've never used MongoDB, I just read a lot about it and I think it's going to be good for my project. Also, I do not have lots of experience with MySQL and to be even more honest I have no clue of what I'm about to ask.
Scenario:
MySQL table
All the
I'm planning to use mongodb because my project needs to be public ASAP and we may add several new profile info and other things and I do not want to do alter table and migrate from tools with lots of rows.
With that said, we may have to add a few more
Questions:
-
It's better to keep the
-
May MongoDB be faster if I just migrate the whole project but I keep the structure like that? I mean like having
-
I'm worried about how many resources MongoDB could use for a 'table/collection' with that many indexes, I want to keep disk space and ram at minimum use. Is there a critical difference between MySQL and MongoDB?
PS: SSDs are going to be used in the system.
PS II: All tables are just planned, nothing is written yet. I'm a bit of plan very ahead person so p
Scenario:
MySQL table
profile:- id = [pk, auto_increment, smallint]
- user_id = [pk, fk, varchar]
- category_id = [pk, fk, smallint]
- role_id = [pk, fk, tinyint]
- country_id = [pk, fk, smallint]
- state_id = [pk, fk, smallint]
- legal_document = [pk, varchar, ?unique]
- name = [pk, varchar, ?unique]
- type = [pk, boolean]
- last_activity = [pk, date]
All the
fk you see there are MySQL tables of course. Then I'm thinking to use MongoDB to store the profile info, profile_info collection should contain documents like:{
'_id' : 1 (profile_id),
'address': 'Some street in some state of some country :P',
'phone': [5555555, 5555555],
'email': 'example@example.com',
etc...
}I'm planning to use mongodb because my project needs to be public ASAP and we may add several new profile info and other things and I do not want to do alter table and migrate from tools with lots of rows.
With that said, we may have to add a few more
pks to the MySQL table, so I was thinking of migrating all the project to MongoDB and I do not know if that is a good move.Questions:
-
It's better to keep the
pks in MySQL and trivial info in MongoDB, or would it be fine if I move all to MongoDB?-
May MongoDB be faster if I just migrate the whole project but I keep the structure like that? I mean like having
profile and profile_info collections instead of just profile.-
I'm worried about how many resources MongoDB could use for a 'table/collection' with that many indexes, I want to keep disk space and ram at minimum use. Is there a critical difference between MySQL and MongoDB?
PS: SSDs are going to be used in the system.
PS II: All tables are just planned, nothing is written yet. I'm a bit of plan very ahead person so p
Solution
Given your description I strongly suggest against using MongoDB. Not because it would necessarily a bad choice (although I believe it is in your case for reasons other than pure technical ones).
Here are the points that caught my eye.
Data modeling
Trying to use MongoDB with relational data model with no adaptations almost always leads to tears and misery except for the most trivial use cases. And that is the better end of the story. The worse end is loosing money, potentially big time.
The reason for this is that with SQL, you identify your entities and their attributes and relations and then bang your head against the wall for a few hours to get your upper left above and beyond JOINS right to get your questions derived from your use cases answered. All while avoiding data redundancy like the devil holy water.
Data modeling with MongoDB works different. You identify the use cases and the questions derived from them and model your data in a way so that those questions can be answered in the most efficient way.
Since this is a bit abstract, let me give you an
Example
Let us assume you have a web application called "chirper", with users doing chirps. With SQL you would now model your data, coming up with something like a user table and a chirp table.
The first use case you encounter is that you want the latest 10 chirps to be displayed on the applications home page, together with the username of the chirper. With SQL it is easy enough, you do a join on chirps and users, sort the result descending by data and limit to 10 records.
With MongoDB, you'd have a careful look at what you need. For various reasons you do not want to embed the chirps into the user collection. But since you do need to relate the chirps to a user anyway, you decide to do so by username and come up with a "schema" for your chirps collection like this:
and, miraculously enough, all you have to do is to do a
which gives you the same result without a JOINS at the expense of relatively cheap disk space.
Resource limitation
I want to keep disk space and ram at minimum use
MongoDB is a lot, but surely not resource friendly. It was never meant to be a replacement for MySQL – which by the way was specifically designed to be a relatively lightweight general purpose RDBMS. I strongly advice against running MongoDB and anything else on the same server for production purposes. The reasons are manifold, but here are the most important ones:
Leaving out the petty details for the moment, this would mean that during situations where you do not want it the most (your application is taking off and has a lot of users), MongoDB would battle with the other parts of the application over resources.
Do not get me wrong: MongoDB is not a resource eating monster per se. But if it does what it is intended to do, namely dealing with huge amounts of data and data changes, you do not want to have the parts of your application battle for resources.
That being said: There are ways to limit MongoDBs resource allocation and make sure those limits are obeyed. Probably the most well known as of today is Docker. You have to decide wether it is worth it to run MongoDB in a Docker container.
As for the data file size: There is transparent compression available for the default wiredTiger storage engine. You can choose from either snappy(default) or the better known zlib compression. Both come at the expense of a higher CPU utilization, as mentioned above.
Time to market
Forgive me, but you demonstrate a lack of basic knowledge about MongoDB. Since you are not sure what to do, the best you can do is to research every step carefully, make a decision, rinse and repeat. And I am not even taking into account that you would need to remodel your data and most likely make massive changes to your applications persistence layer. If I were you and wanted/needed a fast time to market, I'd go with what I have as of now. In case problems occur during production, I'd get myself a consultant specializing on NoSQL to find out wether one of the various NoSQL databases suits your needs, identify said DBMS (which may or may not be MongoDB), get myself a specialist for that and do the migration only then. With a specialist on your side.
Administration
A mistake commonly made by people is that they think it is as easy to manage a MongoDB deployment as it is to getting it to work. It
Here are the points that caught my eye.
Data modeling
Trying to use MongoDB with relational data model with no adaptations almost always leads to tears and misery except for the most trivial use cases. And that is the better end of the story. The worse end is loosing money, potentially big time.
The reason for this is that with SQL, you identify your entities and their attributes and relations and then bang your head against the wall for a few hours to get your upper left above and beyond JOINS right to get your questions derived from your use cases answered. All while avoiding data redundancy like the devil holy water.
Data modeling with MongoDB works different. You identify the use cases and the questions derived from them and model your data in a way so that those questions can be answered in the most efficient way.
Since this is a bit abstract, let me give you an
Example
Let us assume you have a web application called "chirper", with users doing chirps. With SQL you would now model your data, coming up with something like a user table and a chirp table.
The first use case you encounter is that you want the latest 10 chirps to be displayed on the applications home page, together with the username of the chirper. With SQL it is easy enough, you do a join on chirps and users, sort the result descending by data and limit to 10 records.
With MongoDB, you'd have a careful look at what you need. For various reasons you do not want to embed the chirps into the user collection. But since you do need to relate the chirps to a user anyway, you decide to do so by username and come up with a "schema" for your chirps collection like this:
{
_id: ObjectId("570b87a56931b8f21a8bf25c"),
user: "jdoe",
date: ISODate("2016-04-11T11:17:08.415Z"),
text: "Chirp!!!"
}and, miraculously enough, all you have to do is to do a
db.chirps.find({}).sort({date:-1}).limit(10)which gives you the same result without a JOINS at the expense of relatively cheap disk space.
Resource limitation
I want to keep disk space and ram at minimum use
MongoDB is a lot, but surely not resource friendly. It was never meant to be a replacement for MySQL – which by the way was specifically designed to be a relatively lightweight general purpose RDBMS. I strongly advice against running MongoDB and anything else on the same server for production purposes. The reasons are manifold, but here are the most important ones:
- MongoDB will take up to 85% - 90% of your available, physical RAM. This is because the working set (the indices and a subset of the data) are kept in RAM.
- Depending on the storage engine and its configuration you use for MongoDB, even CPU utilization can be considerable.
- A heavily loaded MongoDB needs a lot of disk IO. I have seen systems with SSDs in a RAID0 in which the IO rate was at its limit.
Leaving out the petty details for the moment, this would mean that during situations where you do not want it the most (your application is taking off and has a lot of users), MongoDB would battle with the other parts of the application over resources.
Do not get me wrong: MongoDB is not a resource eating monster per se. But if it does what it is intended to do, namely dealing with huge amounts of data and data changes, you do not want to have the parts of your application battle for resources.
That being said: There are ways to limit MongoDBs resource allocation and make sure those limits are obeyed. Probably the most well known as of today is Docker. You have to decide wether it is worth it to run MongoDB in a Docker container.
As for the data file size: There is transparent compression available for the default wiredTiger storage engine. You can choose from either snappy(default) or the better known zlib compression. Both come at the expense of a higher CPU utilization, as mentioned above.
Time to market
Forgive me, but you demonstrate a lack of basic knowledge about MongoDB. Since you are not sure what to do, the best you can do is to research every step carefully, make a decision, rinse and repeat. And I am not even taking into account that you would need to remodel your data and most likely make massive changes to your applications persistence layer. If I were you and wanted/needed a fast time to market, I'd go with what I have as of now. In case problems occur during production, I'd get myself a consultant specializing on NoSQL to find out wether one of the various NoSQL databases suits your needs, identify said DBMS (which may or may not be MongoDB), get myself a specialist for that and do the migration only then. With a specialist on your side.
Administration
A mistake commonly made by people is that they think it is as easy to manage a MongoDB deployment as it is to getting it to work. It
Code Snippets
{
_id: ObjectId("570b87a56931b8f21a8bf25c"),
user: "jdoe",
date: ISODate("2016-04-11T11:17:08.415Z"),
text: "Chirp!!!"
}db.chirps.find({}).sort({date:-1}).limit(10)Context
StackExchange Database Administrators Q#134898, answer score: 7
Revisions (0)
No revisions yet.