patternModerate
Why can't relational databases meet the scales of Big Data?
Problem
It is often repeated that the Big Data problem is that relational databases cannot scale to process the massive volumes of data now being created.
But what are the scalability limitations that Big Data solutions like Hadoop are not bound by? Why can't Oracle RAC, MySQL sharding, or an MPP RDBMS like Teradata achieve these feats?
I am interested in the technical limitations - I am aware that the financial costs of clustering RDBMS can be prohibitive.
Solution
Microsoft just had a tech talk in the Netherlands where they discussed some of this. The talk starts off slowly, but gets into the meat of Hadoop around the 20-minute mark.
The gist of it is "it depends". If your data set is sensibly arranged, at least somewhat easy to partition, and at least somewhat homogeneous, it should be fairly easy to scale an RDBMS to those high data volumes, depending on what you're doing.
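To make "easy to partition" concrete, here is a minimal sketch of hash partitioning, the idea behind sharding an RDBMS. The shard count, the `orders` rows, and the helper names are all hypothetical, and in-memory lists stand in for separate database nodes; this is an illustration, not how any particular product implements it.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical number of database nodes


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each list stands in for the table on one independent node.
shards = defaultdict(list)

# Hypothetical, homogeneous rows with an obvious partition key.
orders = [
    {"customer_id": "c1", "amount": 30},
    {"customer_id": "c2", "amount": 15},
    {"customer_id": "c1", "amount": 5},
    {"customer_id": "c3", "amount": 50},
]

for row in orders:
    shards[shard_for(row["customer_id"])].append(row)


def total_for_customer(customer_id: str) -> int:
    """A query on the partition key only needs to visit one shard."""
    shard = shards[shard_for(customer_id)]
    return sum(r["amount"] for r in shard if r["customer_id"] == customer_id)
```

As long as most queries filter on `customer_id`, each one touches a single node. Queries that cut across the key (say, totals per product) would have to visit every shard, which is where the "depending on what you're doing" caveat bites.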
Hadoop and MapReduce (MR) seem more geared to situations where you are forced to do large distributed scans of data, especially when those data aren't necessarily as homogeneous or as structured as what we find in the RDBMS world.
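The shape of such a distributed scan can be sketched in plain Python. This is not Hadoop's API; the log lines and function names are made up, and the "chunks" stand in for blocks of a file stored on different nodes. The point is only the map-then-reduce structure over loosely structured input.

```python
from collections import Counter
from functools import reduce

# Hypothetical raw log lines, split into chunks as if stored on different nodes.
chunks = [
    ["GET /home 200", "GET /cart 500", "malformed line"],
    ["POST /login 200", "GET /home 404"],
    ["GET /home 200", ""],
]


def map_chunk(lines):
    """Map step: scan one chunk and count status codes, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            counts[parts[2]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results."""
    return left + right


# In a real cluster each map_chunk call would run on the node holding that chunk.
partials = [map_chunk(chunk) for chunk in chunks]
status_counts = reduce(reduce_counts, partials, Counter())
```

Every record is read once, no index or schema is required, and bad lines are simply skipped by the scanning code rather than rejected at load time.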
What limitations are Big Data solutions not bound to? To me, the biggest one is having to define a rigid schema ahead of time. With Big Data solutions, you shove massive amounts of data into the "box" now, and add logic to your queries later to deal with the lack of homogeneity in the data. From a developer's perspective, the tradeoff is ease of implementation and flexibility at the front end of the project, versus complexity in querying and less immediate data consistency.
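A small sketch of what "add logic to your queries later" can look like, often called schema-on-read. The event records, the field names `user`, `user_name`, and `amount`, and the helper `read_event` are all invented for illustration; the idea is that raw records are stored as-is and the reader copes with their inconsistencies.

```python
import json

# Hypothetical raw records, stored untouched; shapes differ from line to line.
raw_events = [
    '{"user": "alice", "amount": 12.5}',
    '{"user_name": "bob", "amount": "7"}',
    '{"user": "carol"}',
    'not json at all',
]


def read_event(line):
    """Apply a schema at read time; return None for records that cannot be used."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Two field names are assumed to have been used for the same thing over time.
    user = record.get("user") or record.get("user_name")
    if user is None:
        return None
    try:
        amount = float(record.get("amount", 0))
    except (TypeError, ValueError):
        amount = 0.0
    return {"user": user, "amount": amount}


events = [e for e in map(read_event, raw_events) if e is not None]
total = sum(e["amount"] for e in events)
```

Loading the data needed no up-front design, but every query now carries this normalization logic, and two queries that normalize differently can disagree. That is the querying complexity and weaker consistency mentioned above.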
Context
StackExchange Database Administrators Q#13931, answer score: 15