
SQL Database vs Multiple Microsoft Access Databases

Submitted by: @import:stackexchange-dba

Problem

The company I work for produces a lot of test data in raw text files from lots of different devices. Each device belongs to a 'process', of which there are only a few dozen. Currently we have a program that takes this data and produces/adds to an Access database (currently .mdb, though we are hoping to change to .accdb because it's newer and better?) for the appropriate process.

We then use Excel to view the data from the databases in a pivot table. The database files themselves rarely get close to 1 GB (maybe a few a year) at which point we archive the database and create a new one as this is the size limit. We never need to access the data of multiple processes at the same time, so we have never had an issue with them being in separate databases.

Someone in our IT department has suggested that we switch to an SQL database, where all of the data for all processes will be stored, and the data will just have an extra column labelling the process it belongs to.

My worry is that in accessing this data through Excel we would only ever be querying one process at a time, but the database would have to filter through all of the other processes, which would surely take longer, since it is having to query dozens of times more data?

Is there any disadvantage to sticking with our current system of multiple databases (one for each process, only ever accessed one at a time) instead of collecting them all into one large database? And conversely, are there advantages to having a single large database like this? Am I wrong, and would it actually be faster to use a large SQL database than multiple small Access databases?

Solution

In my opinion, Microsoft Access is an obsolete way to manage data, especially within an organization. It has a lot of limitations and lacks features compared to a modern database system, and the data is stored in a local file (as opposed to being centralized on a server). But people still use it that way, which I suppose is why Microsoft still supports it as an application.

My worry is that in accessing this data through Excel we would only ever be querying one process at a time, but the database would have to filter through all of the other processes, which would surely take longer, since it is having to query dozens of times more data?

When architected and indexed properly, this would not be something to be concerned about. A B-tree index (the standard type of index in Microsoft SQL Server) has a search time of O(log2(n)). This means that if your table had 1 billion rows, it would only need to search through roughly 30 nodes of the B-tree (at most) to find any particular subset of data (log2(1 billion) ≈ 30).
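To see how slowly that worst-case node count grows, here is a quick back-of-the-envelope check in plain Python. It only illustrates the logarithm itself, not SQL Server; real B-trees branch far wider than 2, so the actual number of levels touched is even smaller.

```python
import math

# Upper bound on comparisons for a binary-split search over n rows.
# Real B-tree pages hold many keys, so actual tree depth is much lower.
for rows in (1_000, 1_000_000, 1_000_000_000):
    print(f"{rows:>13,} rows -> at most ~{math.ceil(math.log2(rows))} comparisons")
```

Going from a thousand rows to a billion rows only triples the bound, which is why the "dozens of times more data" worry does not translate into dozens of times more work.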

So as the person in your IT department mentioned, if you indexed your table on the Process column (since it sounds like you only look at one process of data at a time), ideally it would only take milliseconds to locate any subset of process data, no matter how big the table was. Then loading that data off disk would take roughly the same time as it currently does in Microsoft Access, since it would be the same amount of data.
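A minimal sketch of that setup, using Python's built-in sqlite3 (whose indexes are also B-trees) as a stand-in for SQL Server. The table and column names here are hypothetical, not from the original question; the point is that once the process column is indexed, a per-process query becomes an index seek rather than a scan of every row.

```python
import sqlite3

# Hypothetical schema: one wide table of test results with a process column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data (process TEXT, device TEXT, value REAL)")
conn.executemany(
    "INSERT INTO test_data VALUES (?, ?, ?)",
    [(f"process_{i % 24}", f"device_{i}", i * 0.5) for i in range(10_000)],
)

# The index the IT department's suggestion implies: a B-tree on process.
conn.execute("CREATE INDEX idx_process ON test_data (process)")

# EXPLAIN QUERY PLAN shows the engine seeking into the index instead of
# scanning the whole table when we filter on a single process.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM test_data WHERE process = 'process_3'"
).fetchall()
print(plan)
```

In SQL Server the equivalent confirmation would come from the execution plan (an Index Seek operator), but the principle is the same: the engine never touches the other processes' rows.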

To hammer the point home, anecdotally, I used to work with databases that had tables terabytes in size, with tens of billions of rows in them, on a server no better provisioned than a modern laptop, and it took SQL Server less than a second to locate any subset of that data.

Context

StackExchange Database Administrators Q#314403, answer score: 12
