HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why is it recommended to store BLOBs in separate SQL Server tables?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whytablesblobsrecommendedsqlseparatestoreserver

Problem

This highly-upvoted SO answer recommends to put images in separate tables, even if there is only a 1:1 relationship with another table:


If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.

Why? I was under the impression that SQL Server only stores a pointer to some dedicated BLOB data structure in the table, so why bother to manually create another layer of indirection? Does it really improve performance significantly? If yes, why?

Solution

While I disagree that BLOBs should just be in another table -- they should not be in the database at all. Store a pointer to where the file lives on disk, and then just get that from the database...

The primary issue they cause (for me) is with indexing. Using XML with query plans, because everyone's got'em, let's make a table:

SELECT TOP 1000
ID = IDENTITY(INT,1,1),
deq.query_plan
INTO dbo.index_test
FROM sys.dm_exec_cached_plans AS dec
CROSS APPLY sys.dm_exec_query_plan(dec.plan_handle) AS deq

ALTER TABLE dbo.index_test ADD CONSTRAINT pk_id PRIMARY KEY CLUSTERED (ID)


It's only 1000 rows, but checking on the size...

sp_BlitzIndex @DatabaseName = 'StackOverflow', @SchemaName = 'dbo', @TableName = 'index_test'


It's over 40 MB for just 1000 rows. Assuming you add 40 MB every 1000 rows, that can get pretty ugly pretty quickly. What happens when you hit 1 million rows? That's just about 1 TB of data, there.

Any queries that need to use your clustered index now need to read all of that BLOB data into memory when the BLOB data column is referenced.

Can you think of better ways to use SQL Server memory than storing BLOBs? Because I sure can.

Expanding it to nonclustered indexes:

CREATE INDEX ix_noblob ON dbo.index_test (ID)

CREATE INDEX ix_returnoftheblob ON dbo.index_test (ID) INCLUDE (query_plan)


You can design your nonclustered indexes to largely avoid the BLOB column so regular queries can avoid the clustered index, but as soon as you need that BLOB column, you need the clustered index.

If you add it as an INCLUDED column to a nonclustered index to avoid a key lookup scenario, you end up with gigantic nonclustered indexes:

More problems they cause:

  • If anyone runs a SELECT * query, they get all that BLOB data.



  • They take up space in backups and restores, slowing them down



  • They slow down DBCC CHECKDB, because I know you're checking for corruption, right?



  • And if you do any index maintenance, they slow that down as well.



Hope this helps!

Code Snippets

SELECT TOP 1000
ID = IDENTITY(INT,1,1),
deq.query_plan
INTO dbo.index_test
FROM sys.dm_exec_cached_plans AS dec
CROSS APPLY sys.dm_exec_query_plan(dec.plan_handle) AS deq

ALTER TABLE dbo.index_test ADD CONSTRAINT pk_id PRIMARY KEY CLUSTERED (ID)
sp_BlitzIndex @DatabaseName = 'StackOverflow', @SchemaName = 'dbo', @TableName = 'index_test'
CREATE INDEX ix_noblob ON dbo.index_test (ID)

CREATE INDEX ix_returnoftheblob ON dbo.index_test (ID) INCLUDE (query_plan)

Context

StackExchange Database Administrators Q#174678, answer score: 18

Revisions (0)

No revisions yet.