HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How do I move or redistribute FILESTREAM files between containers?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
filestreammovefilesbetweenhowcontainersredistribute

Problem

We have a FILESTREAM container with a few million files, and we believe it's the cause of our performance issues we're experiencing (abundant timeouts).

According to this blog about FILESTREAM best practices, each container shouldn't reach more than 300,000 files.

According to the accepted answer here, there is no way to accomplish it except by recreating the table to a new FILESTREAM location.

  • Is this true for my situation?



  • And if so, what is the recommended approach to ensuring that enough containers are created automatically to handle this without manual intervention every 300,000 files?



The structure of the table follows:

CREATE TABLE [dbo].[Documents](
    [ContentPath] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
    [FileContent] [varbinary](max) FILESTREAM  NULL,
 CONSTRAINT [UQ_IX_Documents_ContentPath] UNIQUE NONCLUSTERED 
(
    [ContentPath] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY] FILESTREAM_ON [FSFileGroup_1]


We're using SQL Server 2012. Not Enterprise unfortunately (which we only recently realized supports multiple containers per filegroup).

Although we never do record updates, we do a lot of writes, probably as many as we do reads. The pattern is: just one at a time, reads by ContentPath, and writes in no specific container or order.

Solution

A good option for you, I would imagine, is to split table logically, so every section would be in its own folder and have under 300k files(=records).

In SQL Server logical table split is usually done with Partitioning (this is an Enterprise feature). Partitioning essentially maps sections of table (partitions) to physical storage (filegroups). Various partitions can be assigned to different filegroups. FILESTREAM is essentially a property of a filegroup.

Design

This makes up the spine of your solution:

  • Partitions reside on different filegroups



  • Each filegroup is a separate FILESTREEAM folder/container



  • Size of each partition under 300k records



Here is how to partition a table with FILESTREAM column.

Main advantage: you will be able to manage your FILESTREAM data per Partition at a time, not whole table.

You'll need:

-
ID that would allow to track size of a partition or at least not allow random writes across partitions

[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT (newsequentialid())


-
Maintenance job, that checks how full last partition is, creating new filegroups and partitions in advance, rebuilding indexes per partition etc.

-
To cater for Insert operations make you partitions smaller than 300k, so index is rebuild one partition at a time.

This design is for mostly read-only table with inserts happening at the end of it. Do consider writes and updates and impact of partitioning on them, if you have not done so already.

Migration

In order to make a table partitioned, you need to build an index. There 2 main ways:

-
Rebuild index for the whole table. It will requires x2-x3 times more space than the table. It is not suitable for big tables for a number of reasons.

-
Migrate data into new table, which is already partitioned.

My preferred method for this:

  • Create a table of the same structure as target partitioned table.



  • Insert data using DELETE... OUTPUT DELETED ... INTO for data that fits 1 partition.



  • If number of rows is too big for one transaction, then wrap this into a loop



  • Create index on Partitioned scheme



  • Switch partition to Target table



  • Drop index on Intermediary table



  • Wrap in a loop and repeat for all partitions/data



Inserts

Newly inserted rows ideally should go into last partition. However, there is a catch about NEWSEQUENTIOALID() function:


Creates a GUID that is greater than any GUID previously generated by
this function on a specified computer since Windows was started. After
restarting Windows, the GUID can start again from a lower range, but
is still globally unique.

It means that new writes will go into last partition only until first reboot of the server. And this will eventually happen. New rows will be inserted in the middle of the table. But remember we have partitions now? Only one partition at a time will be affected. Still it will exceed 300k rows at some point in time.

Main option here is to split partition. Existing partition will split in 2, where right part will form a new partition. New partition should go onto new filegroup. It will not be a metadata only operation: files will be physically copied into another partition.

A work around would be moving (delete and insert rows into Intermediary table) rows on same PS; splitting partition and moving rows back. But it involves moving data once anyway. If intermediary table in on same PS, then split will involve moving data. If not, then data needs to be moved second time back into main table.

Multiple FILESTREAM containers

Multiple FILESTRAM containers are a feature of Enterprise version. A container is often referred to as a "folder" in FILESTREAM container. Actually it is a database file in a filegroup of FILESTREAM type.

It does help to stay under 300k files per NTFS folder, but it is very difficult to manage stored files across these folders explicitly.

Article Rebalancing data across files in a filegroup from 2011 by Paul Randall gives a hint on use of files in a filegroup:


[SQL Server] also uses an algorithm called proportional fill that
aims to allocate data from the files according to how much free space
they have relative to the other files in the filegroup.


...


This means that if you add a new data file to a filegroup that has mostly full
data files, the proportional fill weightings will be such that the new
file will be the one where allocations come from until it fills up to
the same level as the older files. The new file in essence becomes an
allocation hot spot.

SQL Server will be filling newly added files first until they fill up to the same %age level as others.

It may look like an easier alternative to Partitioning, however it is not:

  • You cannot do anything about files already stored in existing containers.



  • You still have one big table that you need to manage within single transaction



At the same time you can blend multiple FILESTREAM containers into Partitioning solution to

Code Snippets

[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT (newsequentialid())

Context

StackExchange Database Administrators Q#138756, answer score: 5

Revisions (0)

No revisions yet.