Does the 500-table limit still apply to the latest version of Cassandra?
Problem
I've read about limits in Cassandra and came away with the impression that Cassandra can't have more than 500 tables.
Do you know if this is still true and whether the newest version (5) solves it?
Solution
There is no hard limit when it comes to the number of tables. In fact, it is possible to have thousands of tables defined in a single Cassandra cluster.
However, our general recommendation is to stick to the "low hundreds", i.e. keep it as close to 200 tables as possible.
One of the reasons for this recommendation is that each table takes up about 1MB of memory to hold its metadata. If you had a cluster with 500 tables, 500MB of memory on each node would be used exclusively for holding table metadata. This can degrade the performance of the cluster, and the effect grows as you create more tables.
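If you want to see where your cluster stands, here is a minimal sketch using the DataStax Python driver (cassandra-driver). It counts the non-system tables the driver already holds in its cached schema metadata and applies the rough 1MB-per-table figure above; the contact point 127.0.0.1 is just a placeholder for your own node.

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # placeholder contact point
    session = cluster.connect()

    # Count tables in all non-system keyspaces using the driver's
    # cached schema metadata.
    total = sum(
        len(ks.tables)
        for name, ks in cluster.metadata.keyspaces.items()
        if not name.startswith("system")
    )

    print(f"user tables: {total}")
    # Rough estimate based on the ~1MB-per-table figure above.
    print(f"approximate metadata footprint per node: ~{total} MB")

    cluster.shutdown()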
Additionally, each application instance needs to refresh its copy of the schema every time a schema change takes place, so a large number of tables will take a noticeable toll on your applications' performance. In simple terms, retrieving the schema definition of 200 tables is going to be faster than retrieving the schema of 500 or 1,000 tables. If you have hundreds of tables, there's a risk that some of the app instances will even time out while performing a schema refresh.
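To get a feel for that cost, here is a small sketch, again with the Python driver, that forces the same full schema refresh the driver performs after a schema change and times it. The wall-clock time grows with the number of tables being re-read from the system_schema keyspace.

    import time
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # placeholder contact point
    session = cluster.connect()

    # Force a full refresh of the driver's schema metadata, the same
    # work it performs whenever a schema change event arrives.
    start = time.perf_counter()
    cluster.refresh_schema_metadata()
    elapsed = time.perf_counter() - start

    print(f"schema refresh took {elapsed:.2f}s")

    cluster.shutdown()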
Some proposed enhancements (aka Cassandra Enhancement Proposals, or CEPs) will partially address this, namely the Transactional Cluster Metadata feature (CEP-21, CASSANDRA-18330), but I don't think it will completely solve it. Cheers!
Context
StackExchange Database Administrators Q#326616, answer score: 4