HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL to select random mix of rows fairly

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
randomrowsmixsqlfairlyselect

Problem

Here's my problem: I have a set of tables in a database populated with data from a client that contains product information. In addition to the basic product information, there is also information about the manufacturer, and categories for those products (a product can be in one or more categories). These categories are then referred to as "Product Categories", and which stores these products are available at. These tables are updated once a week from a feed from the customer.

Since for our purposes, some of the product categories are the same, or closely related for our purposes, there is another level of categories called "General Categories", a general category can have one or more product categories.

For the scope of these tables, here's some rough numbers:

Data Tables:
Products:           475,000
Manufacturers:      1300
Stores:             150
General Categories: 245
Product Categories: 500

Mapping Tables:
Product Category -> Product: 655,000
Stores -> Products:          50,000,000


Now, for the actual problem: As part of our software, we need to select n random products, given a store and a general category. However, we also need to ensure a good mix of manufacturers, as in some categories, a single manufacturer dominates the results, and selecting rows at random causes the results to strongly favor that manufacturer.

The solution that is currently in place, works for most cases, involves selecting all of the rows that match the store and category criteria, partition them on manufacturer, and include their row number from within their partition, then select from that where the row number for that manufacturer is less than n, and use ROWCOUNT to clamp the total rows returned to n.

This query looks something like this:

```
SET ROWCOUNT 6
select p.Id, GeneralCategory_Id, Product_Id, ISNULL(m.DisplayName, m.Name) AS Vendor,
MSRP, MemberPrice, FamilyImageName
from (select p.Id, gc.Id GeneralCategory_Id,
p.Id Product

Solution

A clustered index seek or scan could be improved to a non-clustered index seek or scan which should be more efficient.

Since it looks like your problem is Products, I would see about adding an index which would be covering on that table (or perhaps an indexed view since you already have:

Id
ManufacturerId
Active
MemberPrice

Because some of your other columns don't have prefixes, I can't tell where they come from, but I expect some of them also come from Products, so this might not be feasible to make this index covering.

However, but having Active and MemberPrice in the non-clustered index, this might help. It might be enough to tip the plan in favor of a NCI with a lookup to the clustered index to get the remaining columns (like FamilyImageName)

Context

StackExchange Database Administrators Q#19612, answer score: 2

Revisions (0)

No revisions yet.