patternsqlMinor
Chat schema for PostgreSQL and MongoDB
Viewed 0 times
postgresqlchatmongodbforandschema
Problem
I'm designing a schema for a very large chat app. I had problems with NoSQL databases such as MongoDB and Cassandra. Some part of the schema is relational i decided to use both PostgreSQL and MongoDB together.
Chats and members are stored in PostgreSQL and messages are stored in MongoDB.
There is a table called events in both databases. i just keep chat_ids and creation time of the events in PostgreSQL and the data of events in MongoDB.
These events include
Note that
Questions
What problems do i face if i use both together?
Is there anything to change?
Updated
Message schema
Chats and members are stored in PostgreSQL and messages are stored in MongoDB.
There is a table called events in both databases. i just keep chat_ids and creation time of the events in PostgreSQL and the data of events in MongoDB.
These events include
remove_message block_user , etc.Note that
longtext means jsonQuestions
What problems do i face if i use both together?
Is there anything to change?
Updated
Message schema
{
_id: ObjectId,
user_id: Number,
chat_id: Number,
random_id: String,
peer: {
type: [Number],
default: []
}
message: String,
type: {
type: String,
enum: ["service-message", "text", "image", "video", "voice"]
},
reply_to: {
message_id: ObjectId,
user_id: Number
},
forwarded_from: {
message_id: ObjectId,
user_id: Number,
chat_id: Number,
}
deleted_for: {
type: [Number],
default: []
},
author_signature: String,
restriction_reason: String,
number_of_forwards: {
type: Number,
default: 0
},
is_read: {
type: Boolean,
default: false,
},
is_deleted: {
type: Boolean,
default: false,
},
is_private: {
type: Boolean,
required: true
},
is_edited: {
type: Boolean,
default: false,
},
created_at: {
type: Number,
default: Date.now
},
updated_at: {
type: Number,
default: Date.now
}
}
Solution
This hopefully indirectly answers your question and points you in the right direction, it's too much to discuss in comments:
NoSQL Use Case History
MongoDB is not any more performant with writing data than any other mainstream relational database (PostgreSQL, SQL Server, etc). The same is true for reading data as well. NoSQL was not invented to be more performant with reading, writing, or querying data, rather it was originally created to be a tool to improve developer speed especially with a highly changing or variable schema. By not having to worry about maintaining an object structure both in the data layer in addition to the application layer, NoSQL databases allowed developers with specific development needs to release software changes faster when managing a highly changing schema.
From what I can see, your
Sharding, Not Even Once (or Eight Times)
Sharding may be easier (spoiler alert - it's usually not) but at the general tradeoff of eventual consistency (two instances of the same table being out of sync at a given point in time). When to use Sharding depends on a couple of factors generally based on a function of what one is willing to pay to provision their servers vs how much data they need to store, based on their specific use cases. But in general, modern relational database systems handle big data well when architected properly, and I've yet to find Sharding any more useful than vertical scaling.
In fact, the developers of MongoDB themselves state that you're usually better off vertically scaling (as is tradition in a RDBMS) than using Sharding because Sharding is typically more expensive, complex, less flexible, and harder to manage. Per "Myth 5: MongoDB is All About Sharding":
...a minimum sharded cluster is eight servers, and it goes up by at least three servers, with each shard added.
Sharded clusters also make your data harder to manage, and they add some limitations to the types of queries you can conduct. Sharding is useful if you need it, but it's often cheaper and easier to simply upgrade your hardware!
...If you're attempting to scale by sharding, you should at least consider whether hardware upgrades would be a more efficient alternative, first.
That article written by a MongoDB developer actually further discusses a similar thought process I have, which is many times Sharding is more expensive than vertically scaling. This is because typically the hardware bottleneck is not due to the entire provisioning of the server rather it's due to one or two specific pieces of hardware. So it's cheaper to vertically upgrade just those targeted pieces of hardware as opposed to redundantly buying more servers where you're paying for the other pieces of hardware that weren't the bottleneck. With a properly setup Sharded instance in MongoDB, this would be 8x over one would be redundantly paying for hardware. And then licensing costs become a consideration too, where one may end up needing to pay for more licensing for each server (both at the OS level and / or the database level) when they wouldn't've needed to had they vertically scaled instead.
My Experiences
All of that being said, I've worked with
Challenges of Multiple Database Systems
The main three challenges you'll find by using two different databases systems is:
-
Management of the data in two different systems, especially as your data grows: The management of a NoSQL database system requires different thought, planning, and attention than a relational database system.
-
Relating the data across systems in a meaningful way: Consuming the data by your apps or for reporting purposes becomes more complicated and dependent on two unrelated systems. Associating the correlating
-
Timing issues with querying the data as it's still being stored into two different database systems, or even just with synchronizing across Shards in MongoDB itself, with it's eventual consistency
NoSQL Use Case History
MongoDB is not any more performant with writing data than any other mainstream relational database (PostgreSQL, SQL Server, etc). The same is true for reading data as well. NoSQL was not invented to be more performant with reading, writing, or querying data, rather it was originally created to be a tool to improve developer speed especially with a highly changing or variable schema. By not having to worry about maintaining an object structure both in the data layer in addition to the application layer, NoSQL databases allowed developers with specific development needs to release software changes faster when managing a highly changing schema.
From what I can see, your
messages entity is structured but perhaps the content of the message text is not (an example of the text would be helpful). Regardless you can either parse the text into a relational model or store it as a single text field just the same as your entity currently is modeled as, if you wanted to use a relational database system for all of your data.Sharding, Not Even Once (or Eight Times)
Sharding may be easier (spoiler alert - it's usually not) but at the general tradeoff of eventual consistency (two instances of the same table being out of sync at a given point in time). When to use Sharding depends on a couple of factors generally based on a function of what one is willing to pay to provision their servers vs how much data they need to store, based on their specific use cases. But in general, modern relational database systems handle big data well when architected properly, and I've yet to find Sharding any more useful than vertical scaling.
In fact, the developers of MongoDB themselves state that you're usually better off vertically scaling (as is tradition in a RDBMS) than using Sharding because Sharding is typically more expensive, complex, less flexible, and harder to manage. Per "Myth 5: MongoDB is All About Sharding":
...a minimum sharded cluster is eight servers, and it goes up by at least three servers, with each shard added.
Sharded clusters also make your data harder to manage, and they add some limitations to the types of queries you can conduct. Sharding is useful if you need it, but it's often cheaper and easier to simply upgrade your hardware!
...If you're attempting to scale by sharding, you should at least consider whether hardware upgrades would be a more efficient alternative, first.
That article written by a MongoDB developer actually further discusses a similar thought process I have, which is many times Sharding is more expensive than vertically scaling. This is because typically the hardware bottleneck is not due to the entire provisioning of the server rather it's due to one or two specific pieces of hardware. So it's cheaper to vertically upgrade just those targeted pieces of hardware as opposed to redundantly buying more servers where you're paying for the other pieces of hardware that weren't the bottleneck. With a properly setup Sharded instance in MongoDB, this would be 8x over one would be redundantly paying for hardware. And then licensing costs become a consideration too, where one may end up needing to pay for more licensing for each server (both at the OS level and / or the database level) when they wouldn't've needed to had they vertically scaled instead.
My Experiences
All of that being said, I've worked with
Messages data (financial trading messages) in a large and heavily transactional relational database as well. Some of the tables were into the 10s of billions of rows and multi-terabytes big for a single table. We stored our raw unparsed Messages in one table as they came into our system, but wrote an application that also parsed the relevant information from the Messages into a normalized table.Challenges of Multiple Database Systems
The main three challenges you'll find by using two different databases systems is:
-
Management of the data in two different systems, especially as your data grows: The management of a NoSQL database system requires different thought, planning, and attention than a relational database system.
-
Relating the data across systems in a meaningful way: Consuming the data by your apps or for reporting purposes becomes more complicated and dependent on two unrelated systems. Associating the correlating
messages data to your other data such as users becomes more complicated, and actually probably less efficient since you need two different methodologies for retrieving data which will result in more steps associating the data. For example if you wanted to get the users.full_name for the messages they created.-
Timing issues with querying the data as it's still being stored into two different database systems, or even just with synchronizing across Shards in MongoDB itself, with it's eventual consistency
Context
StackExchange Database Administrators Q#306757, answer score: 3
Revisions (0)
No revisions yet.