HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

How to model when you could have many (thousands of) attributes to a table (product)?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
howyoutablecouldproductattributesmanywhenthousandsmodel

Problem

We have a main table, let's say, Product. It has several general attributes like 'Product Name', Manufacturer ID, etc. As of now, all these attributes are implemented in the data model as columns to the Product.

Now we have a new requirement where there could be thousands of attributes. Color, size, weight, dimensions, style, etc. Each of these attributes have pre-defined values. Like color can only be ('green', 'yellow', 'red'). The value set is defined for each attribute. But there could be thousands of attributes and each attribute could have thousands of pre-defined values.

So, for this new feature, we defined these attributes as 'row values' instead of 'columns'.

Something like this:

Product Table

ProductID Name Date_Created
1 Candy some date
2 Shirt some date
3 Pants some date

Attribute Table

AttributeID Name
1 Color
2 Dimension
3 Weight

AttributeValue Table

AttributeValueID AttributeID Value
1 1 Green
2 1 Yellow
3 1 Red
4 2 11x11
5 2 12x12
6 2 13x13
7 3 1 lb
8 3 2 lb
9 3 3 lb

Now to create association between Product Table and Attribute table, we have an association table called 'ProductAttributes'

ProductAttributes Table

ProductAttributeId ProductID AttributeID AttributeValueID
1 1 1 1
2 1 2 4
3 1 3 9
4 2 1 2
5 2 3 8
6 3 3 8

Does this m

Solution

What's the question? Does this make sense? Yea, sure.

  1. A product could have more than one attribute Fine



  1. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green' Um. If you say so. Sounds a bit strange but that's up to you, I guess.



  1. All Attributes and its Values have to be pre-defined (loaded from text file) Again, sounds a bit strange that numerical fields like weight and length have to have pre-defined values. I personally go for a table of pre-defined values and storing non-predefined values separately, but it makes the structure a LOT more complex. Up to you if you want (or need) to do it this way, really.



  1. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized. If it's properly indexed then the fact that you've got a squillion rows shouldn't impact on performance so much.



Example query to get the product name and attributes for a product:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name,
ProductAttributes.AttributeValueID,
AttributeValue.AttributeValueID, AttributeValue.Value,

FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID

WHERE Product.ProductID = 1234


Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.productID. You'll need an index on Attribute.name if you're planning on referring to this field in a query condition (eg get products which have a value for attribute 'weight')

You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, whose table then links to ProductAttributes. But having it there could conceivably act as a short cut if you were frequently getting a list of Attributes for a product without their values:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED...  JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW...  JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234


@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.

Code Snippets

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name,
ProductAttributes.AttributeValueID,
AttributeValue.AttributeValueID, AttributeValue.Value,

FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID

WHERE Product.ProductID = 1234
SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED...  JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW...  JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234

Context

StackExchange Database Administrators Q#71655, answer score: 2

Revisions (0)

No revisions yet.