patternsqlModerate
Using MongoDB and PostgreSQL together
Viewed 0 times
postgresqlmongodbtogetherusingand
Problem
My current project is essentially a run of the mill document management system.
That said, there are some wrinkles (surprise, surprise). While some of
the wrinkles are fairly specific to the project, I believe there are
some general observations and questions that have come up which don't
have a canonical answer (that I could find, anyway) and that are
applicable to a wider problem domain. There's a lot here and I'm not
sure it's a good fit for the StackExchange Q&A format but I think it a) an answerable question and b) non-specific enough that it can benefit the community. Some of my considerations are specific to me but I think the question could be of use to anyone faced with deciding on SQL vs NoSQL vs both.
The background:
The web app we are building contains data that is clearly relational in
nature as well as data that is document-oriented. We would like to
have our cake and eat it too.
TL;DR: I think #5 below passes the smell test. Do you? Does anyone
have experience with such an integration of SQL and NOSQL in a single
application? I tried to list all the possible approaches to this class
of problem in below. Have I missed a promising alternative?
Complexities:
already call for dozens of different documents. This number will
only ever go up. The best possible case would be one in which we
could leverage a simple domain specific language, code generation
and a flexible schema so that domain experts could handle the
addition of new document classes without the intervention of DBAs or
programmers. (Note: already aware we are living out Greenspun's
Tenth Rule)
of the project. The data will be business critical. Full ACID
semantics on writes can be sacrificed provided that the things that
do get succesfully written stay written.
specific case will re
That said, there are some wrinkles (surprise, surprise). While some of
the wrinkles are fairly specific to the project, I believe there are
some general observations and questions that have come up which don't
have a canonical answer (that I could find, anyway) and that are
applicable to a wider problem domain. There's a lot here and I'm not
sure it's a good fit for the StackExchange Q&A format but I think it a) an answerable question and b) non-specific enough that it can benefit the community. Some of my considerations are specific to me but I think the question could be of use to anyone faced with deciding on SQL vs NoSQL vs both.
The background:
The web app we are building contains data that is clearly relational in
nature as well as data that is document-oriented. We would like to
have our cake and eat it too.
TL;DR: I think #5 below passes the smell test. Do you? Does anyone
have experience with such an integration of SQL and NOSQL in a single
application? I tried to list all the possible approaches to this class
of problem in below. Have I missed a promising alternative?
Complexities:
- There are many different classes of documents. The requirements
already call for dozens of different documents. This number will
only ever go up. The best possible case would be one in which we
could leverage a simple domain specific language, code generation
and a flexible schema so that domain experts could handle the
addition of new document classes without the intervention of DBAs or
programmers. (Note: already aware we are living out Greenspun's
Tenth Rule)
- The integrity of previous successful writes is a central requirement
of the project. The data will be business critical. Full ACID
semantics on writes can be sacrificed provided that the things that
do get succesfully written stay written.
- The documents are themselves complex. The prototype document in our
specific case will re
Solution
Some thoughts....
Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync is significant and now instead of one problem on your hands you have two. One thing you can do with Mongo though is use it to pipeline your data in or data out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to dedicate to using advanced features. I see a somewhat different set of options than you do. Since my preference is not something I see listed I will give it to you.
You can probably separate your metadata into common data, data required for classes, and document data. In this regard you would have a general catalog table with the basic common information plus one table per class. In this table you would have an hstore, json, or xml field which would store the rest of the data along with columns where you are storing data that must be constrained significantly. This would reduce what you need to put in these tables per class, but would allow you to leverage constraints however you like. The three options have different issues and are worth considering separately:
hstore is relatively limited but also used by a lot of people. It isn't extremely new but it only is a key/value store, and is incapable of nested data structures, unlike json and xml.
json is quite new and doesn't really do a lot right now. This doesn't mean you can't do a lot with it, but you aren't going to do a lot out of the box. If you do you can expect to do a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython.
xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....
However if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports 2 phase commit meaning you can run the write operations, then issue
Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync is significant and now instead of one problem on your hands you have two. One thing you can do with Mongo though is use it to pipeline your data in or data out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to dedicate to using advanced features. I see a somewhat different set of options than you do. Since my preference is not something I see listed I will give it to you.
You can probably separate your metadata into common data, data required for classes, and document data. In this regard you would have a general catalog table with the basic common information plus one table per class. In this table you would have an hstore, json, or xml field which would store the rest of the data along with columns where you are storing data that must be constrained significantly. This would reduce what you need to put in these tables per class, but would allow you to leverage constraints however you like. The three options have different issues and are worth considering separately:
hstore is relatively limited but also used by a lot of people. It isn't extremely new but it only is a key/value store, and is incapable of nested data structures, unlike json and xml.
json is quite new and doesn't really do a lot right now. This doesn't mean you can't do a lot with it, but you aren't going to do a lot out of the box. If you do you can expect to do a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython.
json is better supported in 9.3 though at least in current development snapshots, so when that version is released things will get better.xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....
However if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports 2 phase commit meaning you can run the write operations, then issue
PREPARE TRANSACTION and if this succeeds do your atomic writes in Mongo. If that succeeds you can then COMMIT in PostgreSQL.Context
StackExchange Database Administrators Q#38714, answer score: 14
Revisions (0)
No revisions yet.