HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Slow Performance Inserting Few Rows Into Huge Table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsintoslowhugefewperformanceinsertingtable

Problem

We have a process that takes data from stores and updates a company-wide inventory table. This table has rows for every store by date and by item. At customers with many stores, this table can get very large-- on the order of 500 million rows.

This inventory update process typically gets run many times a day as the stores enter data. These runs update data from only a few stores. However, the customers can also run this to update, say, all stores in the past 30 days. In this case, the process spins up 10 threads and updates each store's inventory in a separate thread.

The customer is complaining that the process is taking a long time. I have profiled the process and found that one query that INSERTs into this table is consuming much more time than I expected. This INSERT sometimes completes in 30 seconds.

When I run an ad-hoc SQL INSERT command against this table (bounded by BEGIN TRAN and ROLLBACK), the ad-hoc SQL completes on the order of milliseconds.

The slow performing query is below. The idea is to INSERT records that aren't there and later to UPDATE them as we calculate various bits of data. A prior step in the process has identified the items that need to be updated, done some calculations, and stuffed the results into the tempdb table Update_Item_Work. This process is running in 10 separate threads, and each thread has its own GUID in Update_Item_Work.

```
INSERT INTO Inventory
(
Inv_Site_Key,
Inv_Item_Key,
Inv_Date,
Inv_BusEnt_ID,
Inv_End_WtAvg_Cost
)
SELECT DISTINCT
UpdItemWrk_Site_Key,
UpdItemWrk_Item_Key,
UpdItemWrk_Date,
UpdItemWrk_BusEnt_ID,
(CASE UpdItemWrk_Set_WtAvg_Cost WHEN 1 THEN UpdItemWrk_WtAvg_Cost ELSE 0 END)
FROM tempdb..Update_Item_Work (NOLOCK)
WHERE UpdItemWrk_GUID = @GUID
AND NOT EXISTS
-- Only insert for site/item/date combinations that don't exist
(SELECT *
FROM Inventory (NOLOCK)
WHERE Inv_Site_Key = UpdItemWrk_Site_Key
AND Inv_Item_Key = UpdItemWrk_Item_Key

Solution

Looks like your clustered index page splits are going to be painful because the clustered index holds the actual data and this will need new pages to be allocated and the data moved to these. This is likely to cause page locking and thus blocking.

Remember also that your clustered index key is 21 bytes and this will need to be stored in all of your secondary indexes as a bookmark.

Have you considered making your primary key identity column your clustered index, not only will this reduce the size of your other indexes, it will also mean that you will reduce the number of page splits in your clustered index. It's worth a try if you can stomach rebuilding your indexes.

Context

StackExchange Database Administrators Q#20183, answer score: 4

Revisions (0)

No revisions yet.