HiveBrain v1.2.0

Is SQL Server able to use internal parallelism for an update statement?

Submitted by: @import:stackexchange-dba

Problem

I am struggling to find the best way to migrate some "varchar" columns to "nvarchar". One of the options I am considering is to add new nvarchar column(s), update them with the values from the original columns, drop the original columns, and rename the new ones to the old names.

I know it will generate a lot of UNDO and REDO data. Still, I have other limitations (mostly that SQL Server does not support parallel DDL or multi-column ALTER TABLE operations), so let's focus on how to run the update statement faster.

My Oracle experience is telling me to use internal parallelism, but is it available in SQL Server?

I am not able to run this statement in parallel, even though I specifically created the table as a heap (no clustered index).

update t
set new_col_1 = col_1
  , new_col_2 = col_2
    ...
  , new_col_N = col_N
;


There are 3 text columns holding 400 GB of data. I/O performance on AWS RDS is limited (10,000 IOPS), and we have only a 4-hour downtime window.

In this particular migration an online rebuild is not an option, as the data must be migrated (to nvarchar) before the application can be started. During startup, the application checks whether the actual data types correspond to those defined in its metadata repository.

I am aware of the fragmentation, but we just have no choice. Still, if there is some ONLINE rebuild command it might be useful, as we would be able to migrate first and defragment later. Actually, as one of the preparation steps, we are dropping the clustered index. Later this index will be created again, which I believe will fix the fragmentation issue, as we will move from a heap back to a B-tree structure.

It is very frustrating that we cannot use any other "parallel" technique. I am thinking of trying a manual parallel update by running a few concurrent update statements against non-overlapping ranges of the target table. Still, lock escalation could be the next issue, as I am going to update millions of records in each of those updates.

Solution

No, SQL Server does not support parallel update*. That said, it is unclear if parallelism would assist an I/O-bound operation in your scenario.

SQL Server does support parallel INSERT into a row store heap or clustered columnstore index (with restrictions), SELECT INTO a new heap, and some parallel DDL operations such as index builds.

Depending on your version of SQL Server, the size and schema of the table in question, any relationships to other tables, and the recovery model, you may find it faster to create a new table with the updated column data type using parallelism and minimal logging. You should test this option even with a limited-I/O cloud subsystem, rather than assuming it would be too slow.
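A minimal sketch of that approach, assuming hypothetical names (dbo.t, col_1, col_2) and target lengths; under the SIMPLE or BULK_LOGGED recovery model, SELECT ... INTO is minimally logged and can run with a parallel plan (SQL Server 2014 and later):

```sql
-- Build the replacement table with the nvarchar types in one pass.
SELECT
    CONVERT(nvarchar(400), col_1) AS col_1,
    CONVERT(nvarchar(400), col_2) AS col_2,
    other_col
INTO dbo.t_new
FROM dbo.t;

-- Inside the downtime window, swap the tables.
BEGIN TRANSACTION;
EXEC sp_rename 'dbo.t', 't_old';
EXEC sp_rename 'dbo.t_new', 't';
COMMIT;
```

Indexes, constraints, and permissions are not carried over by SELECT INTO and would need to be recreated on dbo.t_new before the swap.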

If you were on SQL Server 2016 or later, an ONLINE alter column might also have been an option. This is limited to one column change at a time but is non-blocking. It also tends to result in a better-organized final result (fewer or no heap forwarded records).
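For reference, the online variant looks like this (hypothetical table and column names; ONLINE = ON requires Enterprise edition):

```sql
-- One column per statement; readers and writers are not blocked
-- for the duration of the change.
ALTER TABLE dbo.t
    ALTER COLUMN col_1 nvarchar(400) NOT NULL
    WITH (ONLINE = ON);
```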

If you are able to split the change across multiple maintenance windows and have sufficient storage available, you could migrate gradually from one schema to the other using a batch process, with triggers copying changes across during non-maintenance periods. The final step is to drop the old table and rename the new one in its place.
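The batch-copy side of that approach might be sketched as follows, assuming a hypothetical bigint key column `id` on dbo.t and a pre-created dbo.t_new (a trigger on dbo.t, not shown, would replay inserts/updates/deletes into dbo.t_new between batches):

```sql
-- Copy forward in keyed batches to keep each transaction small
-- and avoid lock escalation to a table lock.
DECLARE @last_id bigint = 0,
        @batch   int    = 50000,
        @rows    int    = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO dbo.t_new (id, col_1, col_2)
    SELECT TOP (@batch)
           id,
           CONVERT(nvarchar(400), col_1),
           CONVERT(nvarchar(400), col_2)
    FROM dbo.t
    WHERE id > @last_id
    ORDER BY id;

    SET @rows = @@ROWCOUNT;
    SELECT @last_id = MAX(id) FROM dbo.t_new;
END;
```

Batch size is a tuning knob: small enough to stay under the lock-escalation threshold (roughly 5,000 locks), large enough to keep per-batch overhead low.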

Some of the methods available to you might generate a lot of transaction log (as you mentioned). If you were able to upgrade to 2019 or later, Accelerated Database Recovery would allow aggressive log truncation (including for in-progress transactions), likely making this aspect a non-issue.
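Enabling it is a one-line database setting (database name is a placeholder; the database needs a brief moment with no active sessions for the change to take effect):

```sql
-- SQL Server 2019+: versioning moves to an in-database persisted
-- version store, so the log can be truncated aggressively even
-- while a large transaction is still in flight.
ALTER DATABASE [YourDb]
    SET ACCELERATED_DATABASE_RECOVERY = ON;
```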

You should carefully consider your proposed method since the heap will still contain the dropped column until it is fully rebuilt, and the heap will have very many forwarded records as existing rows do not have enough space to hold the extra column data.

Do not use a heap if the data is frequently updated. If an update makes a record larger than the space available on its current data page, the record has to be moved to a data page with enough free space. This creates a forwarded record at the new location, and a forwarding pointer has to be written in the page that previously held the data, indicating the new physical location. This introduces fragmentation in the heap. When scanning a heap, these pointers must be followed, which limits read-ahead and can incur additional I/O, reducing scan performance.
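You can measure the damage after the in-place update with the physical-stats DMV (table name is a placeholder; index_id 0 addresses the heap, and forwarded_record_count is only populated in SAMPLED or DETAILED mode):

```sql
SELECT forwarded_record_count,
       page_count,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.t'), 0, NULL, 'DETAILED');

-- ALTER TABLE dbo.t REBUILD;  -- rebuilds the heap and removes
--                             -- the forwarding pointers
```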

* The update itself is never performed by more than one thread. Other operators in an update execution plan may use parallelism. SQL Server also supports multiple concurrent updates to the same table from different connections of course, but with your table structured as a heap you likely have no way to ensure these updates could be efficiently disjoint.

Context

StackExchange Database Administrators Q#323266, answer score: 10

Revisions (0)

No revisions yet.