patternsqlModerate
Change Data Types large Live table
Viewed 0 times
livelargetypesdatachangetable
Problem
I have a table in SQL Server 2008 R2 with close to a billion rows with Composit PK Column. I want to change the Datatype of Columns from Date to Datetime. Two times ALTER TABLE zzz ALTER COLUMN yyy works, but it's very slow. How can I speed the process up? I was thinking to copy the data to another table, drop, create, copy back and switching to simple recovery mode or somehow doing it with a cursor a 1000 rows a time but I'm not sure if those will actually lead to any improvement.
Solution
Summary
For large tables, it is nearly always preferable to move the data instead of altering the table. To get maximum speed, use one of the following patterns. Let us call the table you want to change
Copy Data
In step 6 above, you will need to move data from
Of these two, SSIS is the fastest. With
Option 1: INSERT ... SELECT
If
This is minimally logged.
If
This is also be minimally logged. The problem with this option is that you can only run one
Option 2: Use SSIS
This option is a lot more trouble, but also faster. The idea is this:
This is MUCH faster because the packages can run in parallel. With this technique, I can typically move tables at GB/sec.
For large tables, it is nearly always preferable to move the data instead of altering the table. To get maximum speed, use one of the following patterns. Let us call the table you want to change
S (for source).- Create an empty copy of
S. Let us call thatT
ALTER TABLEonT. This is very fast becauseThas no rows
- If
Thas a cluster index, remove all other indexes but the cluster
- If
Tis a heap, remove all indexes
- Put the database in SIMPLE recovery mode (if you can, but be aware of the backup implications)
- Use one of the two options provided in the Copy Data section below
- Rebuild the indexes on
T
- Drop
Sand rename toT
Copy Data
In step 6 above, you will need to move data from
S to T at highest possible speed. The following are two options:- Use
INSERT ... SELECT
- Use SSIS
Of these two, SSIS is the fastest. With
INSERT ... SELECT you are restricted to a single thread. However, INSERT ... SELECT is much easier. See below for more details.Option 1: INSERT ... SELECT
If
T is a heap, do this:INSERT INTO T WITH (TABLOCK) SELECT ... FROM SThis is minimally logged.
If
T has a clustered index, do this:DBCC TRACEON (610)
INSERT INTO T SELECT ... FROM SThis is also be minimally logged. The problem with this option is that you can only run one
INSERT statement in parallel. Because SQL Server does not do parallel INSERT ... SELECT, this restricts you to around 40-80MB/sec - which is quite slow.Option 2: Use SSIS
This option is a lot more trouble, but also faster. The idea is this:
- Drop all indexes on
T
- Create an SSIS package that takes a parameter that selects a subset of the rows in
S(using some filter that is supported by an index). The SSIS package then moves the rows toTusing theTABLOCKhint in bulk mode
- Execute multiple copies of the SSIS package, each operating on their own, distinct subset of
S(so you don't duplicate rows inT)
This is MUCH faster because the packages can run in parallel. With this technique, I can typically move tables at GB/sec.
Code Snippets
INSERT INTO T WITH (TABLOCK) SELECT ... FROM SDBCC TRACEON (610)
INSERT INTO T SELECT ... FROM SContext
StackExchange Database Administrators Q#82134, answer score: 12
Revisions (0)
No revisions yet.