HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Updating a table with more than 850 million rows of data

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
millionrowswiththanupdatingmore850datatable

Problem

I have been tasked with writing an update query to update a table with more than 850 million rows of data. Here are the table structures:

Source Tables :

CREATE TABLE [dbo].[SourceTable1](
    [ProdClassID] [varchar](10) NOT NULL,
    [PriceListDate] [varchar](8) NOT NULL,
    [PriceListVersion] [smallint] NOT NULL,
    [MarketID] [varchar](10) NOT NULL,
    [ModelID] [varchar](20) NOT NULL,
    [VariantId] [varchar](20) NOT NULL,
    [VariantType] [tinyint] NULL,
    [Visibility] [tinyint] NULL,
 CONSTRAINT [PK_SourceTable1] PRIMARY KEY CLUSTERED 
(
    [VariantId] ASC,
    [ModelID] ASC,
    [MarketID] ASC,
    [ProdClassID] ASC,
    [PriceListDate] ASC,
    [PriceListVersion] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, 
ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90)
    )

CREATE TABLE [dbo].[SourceTable2](
    [Id] [uniqueidentifier] NOT NULL,
    [ProdClassID] [varchar](10) NULL,
    [PriceListDate] [varchar](8) NULL,
    [PriceListVersion] [smallint] NULL,
    [MarketID] [varchar](10) NULL,
    [ModelID] [varchar](20) NULL,
 CONSTRAINT [PK_SourceTable2] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, 
ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 91) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]


SourceTable1 contains 52 million rows of data and SourceTable2 contains 400,000 rows of data.

Here is the TargetTable structure

```
CREATE TABLE [dbo].TargetTable NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_TargetTable] PRIMARY KEY CLUSTERED
(
[ChassisSpecificationId] ASC,
[VariantId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 71) ON [PRIMARY]
) ON [P

Solution

To speed things up, you could try

  • Adding a primary key to #CSpec.RowID so you don't scan it every iteration



  • Change the CTE to a temp table with suitable PK. See next point too



  • Add an index on SourceTable1 to match the CTE WHERE clause: currently the PK will be scanned, meaning all SourceTable1 rows will be scanned every iteration. All 52 million rows



  • SourceTable2.MarketID also does not have an index, but I wouldn't worry about this because it is scanned once only (as I understand it)



The query plans here should show a lot of scans because you have poor indexes for the operations you are doing.

Target table indexing appears OK

Another observation: uniqueidentifier and varchar are bad choices for clustered indexes (your PKs here): too wide, not increasing, overhead of collection comparisons at least

Edit, another observation (thanks to @Marian)

Your clustered index is wide generally. Every non-clustered index points to the clustered index, which means a huge NC index too

You could probably achieve the same result by reordering the clustered PK.

Context

StackExchange Database Administrators Q#40580, answer score: 5

Revisions (0)

No revisions yet.