principlesqlModerate
How database size affects performance: Theory vs reality
Viewed 0 times
theorysizeaffectsdatabaserealityperformancehow
Problem
There is a lot out there that says database size should not affect performance to any great degree. As long as indexes on tables fit in memory the database should remain performant.
However what is the reality? If the database architecture isn't the best, indexes don't fit in memory, and there is potentially a lot of redundant data are there significant gains to be made simply by deleting redundant data? I estimate that 60-80% of data in my database could be deleted.
I believe that reducing the database size and upping the RAM so that indexes can fit in memory would give a significant performance increase which would give some breathing room for a few months to rearchitect the system.
Are there also other factors such as IO, fragmentation, working dataset etc that affect performance based on database size?
However what is the reality? If the database architecture isn't the best, indexes don't fit in memory, and there is potentially a lot of redundant data are there significant gains to be made simply by deleting redundant data? I estimate that 60-80% of data in my database could be deleted.
I believe that reducing the database size and upping the RAM so that indexes can fit in memory would give a significant performance increase which would give some breathing room for a few months to rearchitect the system.
Are there also other factors such as IO, fragmentation, working dataset etc that affect performance based on database size?
Solution
It depends entirely on what you are doing with the data.
For basic insert/update/delete transactions that affect just a few rows, then the growth in data size is probably not a big consideration. The database will use in-memory indexes to access the correct page. You get more cache misses when the tables no longer fit into memory. However, the overhead might be slight -- depending on the database, database configurations, and hardware configurations.
If you are doing queries that require full-table scans, then your performance is going to grow linearly or worse with the data size. Indexes can actually make the situation worse, by randomizing page accesses, which then pretty much guarantee cache misses.
An alternative to more memory is improved disk speed -- solid state disk can provide tremendous improvement.
Just having more data is unlikely to affect performance unless the tables are used in queries. Is the data redundant within a table or across tables? Having large tables that never get used is messy, but has minimal impact on performance. It is imaginable that if you have zillions of unnecessary tables, then then compiling queries could start to take more time.
For basic insert/update/delete transactions that affect just a few rows, then the growth in data size is probably not a big consideration. The database will use in-memory indexes to access the correct page. You get more cache misses when the tables no longer fit into memory. However, the overhead might be slight -- depending on the database, database configurations, and hardware configurations.
If you are doing queries that require full-table scans, then your performance is going to grow linearly or worse with the data size. Indexes can actually make the situation worse, by randomizing page accesses, which then pretty much guarantee cache misses.
An alternative to more memory is improved disk speed -- solid state disk can provide tremendous improvement.
Just having more data is unlikely to affect performance unless the tables are used in queries. Is the data redundant within a table or across tables? Having large tables that never get used is messy, but has minimal impact on performance. It is imaginable that if you have zillions of unnecessary tables, then then compiling queries could start to take more time.
Context
StackExchange Database Administrators Q#22888, answer score: 13
Revisions (0)
No revisions yet.