UPDATE with JOIN on 100mm records, how to do this better? (in T-SQL)
Problem
I need to update 100 million records in a single table, in effect, normalizing the table by replacing the varchar value of a column with simply an ID. (I say "replacing" but really I'm writing the ID into another column.)
What I'm trying to achieve is to normalize the dataset. The not-yet-normalized data has no indexes. My thought was that I would not build indexes on the raw values, and would instead wait to index the foreign keys that will replace the varchar values with tinyint values after the update completes.
UPDATE A
SET A.AutoClassID = B.AutoClassID
FROM AutoDataImportStaging.dbo.Automobile AS A
JOIN AutoData.dbo.AutoClass AS B ON (A.AutoClassName = B.AutoClassName)

Background
- using MSSQL 2008 R2 on Server 2008 R2
- server has 8 GB RAM
- server has one RAID10, 7200 RPM SATA (not great, I know, in production this will only read data and not write data; plus recent HD shortage made this necessary for cost)
- server has dual quad-core Xeon CPU
- the machine is not doing anything else (currently dedicated to dev, only this process)
- simple recovery model turned on (? - but even in simple recovery, transactions are still fully logged so they can roll back, right?)
- note that the query references two different DBs, for what that's worth
- "width" of a record in table getting updated is 455 bytes
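To confirm which recovery model is actually in effect for the databases above, one option is to query the sys.databases catalog view (the database names are taken from the question's query; this check is not part of the original post):

```sql
-- Show the recovery model of the two databases involved
-- (works on SQL Server 2008 R2 and later).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name IN ('AutoDataImportStaging', 'AutoData');
```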
Resources During Execution
- physical RAM is maxed out
- disk I/O is maxed out
- CPU is hardly doing anything (choke point is I/O)
- run time has been 14 hours and counting!
I suspect a few things: for example, that I need an index on the raw data, even though I will be dropping the column (AutoClassName) after the normalization updates. I also wonder whether I should have looped down the table one record at a time instead of using the JOIN, which seemed ridiculous when I started this, but now seems like it would have been faster.
How should I change my methodology so that my remaining normalization updates (similar to this one) run more quickly?
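The row-at-a-time loop contemplated above is usually implemented as a batched update rather than a literal one-row cursor. A minimal sketch of that pattern, reusing the question's tables (the batch size and the NULL check are illustrative assumptions, not from the original post):

```sql
-- Batched UPDATE: commit ~50,000 rows at a time so the log can truncate
-- between batches under simple recovery and locks stay short-lived.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) A
    SET A.AutoClassID = B.AutoClassID
    FROM AutoDataImportStaging.dbo.Automobile AS A
    JOIN AutoData.dbo.AutoClass AS B
        ON A.AutoClassName = B.AutoClassName
    WHERE A.AutoClassID IS NULL;  -- assumes AutoClassID starts out NULL

    SET @rows = @@ROWCOUNT;
END
```

Each iteration is its own implicit transaction, so a failure partway through loses only the current batch rather than 14 hours of work.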
Solution
I would take a different approach.
Instead of updating existing tables, just build a new table that has what you need in it.
This will almost certainly be faster:
SELECT DISTINCT
AutoClassID,
<Other fields>
INTO
AutoDataImportStaging.dbo.Automobile
FROM
AutoData.dbo.AutoClass

As currently written, there are a lot of logical operations happening:
- Read all values of A.AutoClassName
- Read all values of B.AutoClassName
- Compare A and B values
- Of the matching set, read all values of B.AutoClassID
- Update existing values of A.AutoClassID to the B.AutoClassID value through whatever indexes exist
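Since the new table ultimately needs the AutoClassID matched to each staged row, the build-a-new-table approach would in practice combine the SELECT ... INTO with the same join from the question. A hedged sketch under that reading (the target name NewAutomobile and the column placeholder are assumptions, not from the original answer):

```sql
-- Build a fresh table containing the tinyint ID instead of the varchar;
-- SELECT ... INTO can be minimally logged under the simple recovery model,
-- avoiding the per-row logging of a 100mm-record UPDATE.
SELECT
    B.AutoClassID,
    A.<Other fields>          -- remaining columns from the staging table
INTO
    AutoDataImportStaging.dbo.NewAutomobile
FROM
    AutoDataImportStaging.dbo.Automobile AS A
JOIN
    AutoData.dbo.AutoClass AS B
    ON A.AutoClassName = B.AutoClassName;
```

A common follow-up is to drop the original table, rename the new one, and only then build indexes on AutoClassID, which matches the question's plan of indexing after normalization completes.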
Context
StackExchange Database Administrators Q#20499, answer score: 10