HiveBrain v1.2.0
Get Started
← Back to all entries
patternCritical

Should binary files be stored in the database?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
storedthedatabasefilesbinaryshould

Problem

What is the best place for storing binary files that are related to data in your database? Should you:

  • Store in the database with a blob



  • Store on the filesystem with a link in the database



  • Store in the filesystem but rename to a hash of the contents and store the hash on the database



  • Something I've not thought of



The advantages of (1) are (among others) that atomicity of transactions is preserved. The cost is that you might dramatically increase storage (and associated streaming/backup) requirements

The goal of (3) is to preserve atomicity to some degree - if you can enforce that the filesystem you are writing to does not allow files to be changed or deleted, and always has the correct hash as filename. The idea would be to write the file to the filesystem before permitting the insert/update referencing the hash - if this transaction fails after the filesystem write but before the database DML, that is fine because the filesystem is 'faking' being the repository of all possible files and hashes - it doesn't matter if there are some files in there that are not being pointed to (and you could clean them up periodically if you are careful)

EDIT:

It looks like some RDBMSs have this covered in their individual ways - I'd be interested to know how others do it - and particularly in a solution for postgres

Solution

-
Store in the database with a blob

  • A disadvantage is that it makes your database files quite large and possibly too large to back up with your existing set up.



  • An advantage is integrity and atomicity.



-
Store on the filesystem with a link in the database

-
I've come across such horrible disasters doing this, and it scares me that people keep suggesting it. Some of the disasters included:

  • One privileged user who would rearrange the files and frequently break the links between the paths in the DB and where they now are (but somehow this became my fault).



  • When moving from one server to another, the ownership of some of the files was lost as the SID for the old machine's administator account (what the old website was running on) was not part of the domain and so the copied files had ACLs that could not be resolved thus presenting users with the username/password/domain login prompt.



  • Some of the paths ended up being longer than 256 characters from the C:\ all the way to the .doc and not all versions of NT were able to deal with long paths.



-
Store in the filesystem but rename to a hash of the contents and store the hash on the database

  • The last place I worked at did this based on my explanation of the above scenarios. They thought it was a compromise between the organization's inability to obtain experience with large databases (anything larger than about 40G was ordained to be "too big"), the corporate inability to purchase large hard drives, and the inability to purchase a more modern back up solution, and the need to get away from risks #1 & #3 that I identified above.



My opinion is that storing in the DB as a blob is a better solution and more scalable in a multi-server scenario, especially with failover and availability concerns.

Context

StackExchange Database Administrators Q#2445, answer score: 78

Revisions (0)

No revisions yet.