patternsqlMinor
Updating a table with more than 850 million rows of data
Viewed 0 times
millionrowswiththanupdatingmore850datatable
Problem
I have been tasked with writing an update query to update a table with more than 850 million rows of data. Here are the table structures:
Source Tables :
Here is the
```
CREATE TABLE [dbo].TargetTable NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_TargetTable] PRIMARY KEY CLUSTERED
(
[ChassisSpecificationId] ASC,
[VariantId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 71) ON [PRIMARY]
) ON [P
Source Tables :
CREATE TABLE [dbo].[SourceTable1](
[ProdClassID] [varchar](10) NOT NULL,
[PriceListDate] [varchar](8) NOT NULL,
[PriceListVersion] [smallint] NOT NULL,
[MarketID] [varchar](10) NOT NULL,
[ModelID] [varchar](20) NOT NULL,
[VariantId] [varchar](20) NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_SourceTable1] PRIMARY KEY CLUSTERED
(
[VariantId] ASC,
[ModelID] ASC,
[MarketID] ASC,
[ProdClassID] ASC,
[PriceListDate] ASC,
[PriceListVersion] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)
)
CREATE TABLE [dbo].[SourceTable2](
[Id] [uniqueidentifier] NOT NULL,
[ProdClassID] [varchar](10) NULL,
[PriceListDate] [varchar](8) NULL,
[PriceListVersion] [smallint] NULL,
[MarketID] [varchar](10) NULL,
[ModelID] [varchar](20) NULL,
CONSTRAINT [PK_SourceTable2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 91) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]SourceTable1 contains 52 million rows of data and SourceTable2 contains 400,000 rows of data. Here is the
TargetTable structure```
CREATE TABLE [dbo].TargetTable NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_TargetTable] PRIMARY KEY CLUSTERED
(
[ChassisSpecificationId] ASC,
[VariantId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 71) ON [PRIMARY]
) ON [P
Solution
To speed things up, you could try
The query plans here should show a lot of scans because you have poor indexes for the operations you are doing.
Target table indexing appears OK
Another observation: uniqueidentifier and varchar are bad choices for clustered indexes (your PKs here): too wide, not increasing, overhead of collection comparisons at least
Edit, another observation (thanks to @Marian)
Your clustered index is wide generally. Every non-clustered index points to the clustered index, which means a huge NC index too
You could probably achieve the same result by reordering the clustered PK.
- Adding a primary key to #CSpec.RowID so you don't scan it every iteration
- Change the CTE to a temp table with suitable PK. See next point too
- Add an index on SourceTable1 to match the CTE WHERE clause: currently the PK will be scanned, meaning all SourceTable1 rows will be scanned every iteration. All 52 million rows
- SourceTable2.MarketID also does not have an index, but I wouldn't worry about this because it is scanned once only (as I understand it)
The query plans here should show a lot of scans because you have poor indexes for the operations you are doing.
Target table indexing appears OK
Another observation: uniqueidentifier and varchar are bad choices for clustered indexes (your PKs here): too wide, not increasing, overhead of collection comparisons at least
Edit, another observation (thanks to @Marian)
Your clustered index is wide generally. Every non-clustered index points to the clustered index, which means a huge NC index too
You could probably achieve the same result by reordering the clustered PK.
Context
StackExchange Database Administrators Q#40580, answer score: 5
Revisions (0)
No revisions yet.