HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Datawarehouse indexing

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexingdatawarehousestackoverflow

Problem

I have a table with nearly 6m records. The business key that identifies a unique row is quite large. Our update processing is now taking a lot longer since I added this new table for a new cube. I currently do not have an index on the join columns for the update. SQL Server estimated execution plan says I should create this index on the business key:

```
/*
Missing Index Details from Server.db
The Query Processor estimates that implementing the following index
could improve the query cost by 86.9178%.
*/

/*
USE [db]
GO
CREATE NONCLUSTERED INDEX []
ON [schema].[Production]
(
[ProdId],[PriceCalc],[CalcTypeId],[OprId],[CostGroupId],[Resource],
[BOM],[ResourceDepartment],[OprNum],[DateWIP],[DataAreaId],[Partition]
)
INCLUDE
(
[ProdOrderStatus],[ManufacturedItemId],[CalcType],[CalculationLevel],
[CostAnalysisOrderType],[CostGrouping],[UnitId],[WorkCenter],[Name],
[RealisedConsumption],[RealisedCostAmount],[RealisedCostAdjustment],
[EstimatedConsumption],[EstimatedCostAmount],[LotSizeVariance],
[StandardQty],[StandardCost],[ItemStandardQty],[HasSubstitutionVariance],
[R2],[R3],[StandardQtyByRAFQty],[StandardCostByRAFQty],[ProductionOrderType],
[RealisedAllocation],[CostVariance],[QuantityVariance],[SubstitutionVariance],
[TotalVariance],[ComponentItemId],[InventoryUOM],[InventConsumptionTransUOM],
[BomConsumptionTranUOM],[TransactionUOM],[TransUOMToInvUOMConversionRate],
[InventConsumptionInvUOM],[BomConsumptionInvUOM],[TotalNetWeightPerUnitInvUOM],
[InventoryNetWeightUOM],[ReportingNetWeightUOM],
[NetWeight_InvUOMToReportingUOMConversionRate],
[InventConsumptionTotalNetWeightInvUOM],[BomConsumptionTotalNetWeightInvUOM],
[InventConsumptionTotalNetWeightReportingUOM],
[BomConsumptionTotalNetWeightReportingUOM],
[FinancialProductId],[FinancialDepartmentId],[FinancialMarketId],[FinancialCodeId],
[FinancialTypeId],[FinancialSiteId],[ProdPoolId],[Company_SK],[ComponentItem_SK],
[EndedDate_S

Solution

I'm working on the assumption that the table in question is a fact table, not a dimension table with a huge composite key:

Just to fix the performance issue in the short term, I would add all of these key columns as the table's clustered index, which means you won't have to INCLUDE a lot of measures and stuff, like the suggested index does. Also, make the index unique if the data allows for this.

As for the column order of a clustered index on a fact table, it depends on how you're accessing them. If you're only using a cube to read large chunks of data, I would probably prioritize INSERT priority by making the index chronological, i.e. putting the date column first - that way, new rows get added to the end of the index (in the best of worlds).

If you're running user T-SQL queries on the fact table, I would try to arrange the index columns in an order that gives you Index Seeks or Range Scans as much as possible: first, columns that are filtered on single dimension keys (think "year", "type", "unit" or "department"-type dimensions), then those columns that are filtered on multiple dimension members, ranges, or used for sorting.

There are, of course, other schools on how to build indexes - this is not a "single correct answer".

Edit: More on clustered vs non-clustered indexes:

I'm guessing that you already have an existing clustered index, and that's why SQL Server suggests a non-clustered index. However, non-clustered indexes have to be explicitly defined with INCLUDE columns. Clustered indexes define the actual storage/sort order of the table, and as such, they will implicitly include all columns in the table (I won't go into LOB columns like varchar(max) and xml).

The clustered index is normally the "catch-all index" that takes care of queries that are not suitable for an existing non-clustered index, which makes it even the more important (in my opinion) that it's well-designed and not, for instance, just on an IDENTITY() column.

Plus, a non-clustered index will take up more drive space, so a non-clustered index that covers all of the table's columns will in effect take up as much space as the table itself. A clustered index is the table.

Context

StackExchange Database Administrators Q#127847, answer score: 8

Revisions (0)

No revisions yet.