HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Need help in picking solution and understanding when to add columns vs EAV

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
understandingeavpickingneedcolumnshelpwhenandsolutionadd

Problem

I've been working on the following problem for a long time and I can't decide between two solutions. While both would work to solve the problem, neither "feels" exactly right to me.

Any advice on this would be greatly appreciated. Also note that I'm not particularly great with DB design so any feedback is appreciated.

I have the following table:

ProductYearDistrictCrop (+-2 million records)

oid | productId | yearId | districId     | cropId | payDate     | 
1   | 1         | 1      | 1             |1       | 2018-01-01  |
2   | 1         | 1      | 1             |1       | 2018-02-01  |
3   | 2         | 1      | 1             |1       | 2018-03-01  |
4   | 2         | 1      | 1             |1       | 2018-04-01  |


Some background info on the table:

A record is created for each possible combination of Product - Year - District - Crop and there is a unique constraint on those Foreign Keys (the combination must be unique). I opted to use Oid as the PK instead of a Composite Key because Oid can then be used by other tables as a FK and I assumed a lookup on the Oid would be quicker than a composite key.

For each of these records, there are additional options that can be picked from. These options come from the below table

Options (16 records)

oid | description   | systemDefaultValue
1   | Option 1      | 0.5 
2   | Option 2      | 2.5
3   | Option 3      | 1.5
4   | Option 4      | 1.5


For each of these choices I need to store a custom value or just reuse the systemDefaultValue per record in ProductYearDistrictCrop

The two possible solutions I come up with are

Solution A : Simple add each possible option as column to the table

```
oid | productId | yearId | districId | cropId | payDate | Option1 | Option2 |
1 | 1 | 1 | 1 |1 | 2018-01-01 | 0.5 | 2.5 |
2 | 1 | 1 | 1 |1 | 2018-02-01 | 0.5 | 2.5 |
3 | 2 | 1 | 1

Solution

Richard,

I really think this is a great question you have posed. My answer got longer than I anticipated so in summary I have 3 possible solutions based on what you asked. If I had to rank my preference I would do:

  • EVA table



  • Multiple Child Tables



  • Single "Extension" table



EVA Approach

Thanks to some insight from Jon the EVA is certainly a very viable approach with proper indexing using a PIVOT

For the sake of the example on the PIVOT (since I was not familiar with it) I created a simple EAV table.

DECLARE @Table TABLE
(
    ID INT NOT NULL,
    OptionID INT NOT NULL,
    [Value] INT NULL
)


I then populated it with 16 unique ID's and each configured for 16 different OptionID's. The value is populated with the product of ID and OptionID If those two values are the same then no value was provided (If ID

  • Space savings since records can only be added if they differ than the default and that can be applied via a CASE statement.



  • EAV tables are really narrow, so as long as the table is indexed well performance should be really good.



Extension Table Approach

When It Comes To The Additional Columns Approach most people's concern is that it isn't normalized out. I hear a comment one time about "Normalize until it hurts. De-Normalize until it works". (There is actually a DBA Stack Exchange topic which covered this idea about how far to go with Normalization). With all that in mind if this is going to be a highly utilized part of the database than this may be a candidate where taking a slightly de-normalized design, for performance reasons may be what you want to do.

I would not however recommend adding the columns onto your existing table. The wider the table is, the less data you can fit onto the same page which means that everything else using this table will take a performance degradation. One way around that is creating an "Extension" table which doesn't do anything but hold these values. You would simply JOIN to this one table to get all 16 options (whether they are always written, or the columns are NULLABLE and only hold have a value if it differs from the default). When an additional Option is added, you would just add a new column to this table.

This table, space-wise would be really inefficient. But we would be sacrificing that space, for SELECT query performance in both instances where ProductYearDistrictCrop is used without this new extension table and when ProductYearDistrictCrop is used in unison with it.

CREATE TABLE [dbo].[ProductYearDistrictCropOptions]
(
    [Oid] [BIGINT] NOT NULL,
    [Option1] [Decimal(10,5)] NULL, --Or Whatever the appropriate datatype is
    [Option2] [Decimal(10,5)] NULL, 
    [Option3] [Decimal(10,5)] NULL, 
    [Option4] [Decimal(10,5)] NULL, 
    [Option5] [Decimal(10,5)] NULL, 
    [Option6] [Decimal(10,5)] NULL, 
    [Option7] [Decimal(10,5)] NULL, 
    [Option8] [Decimal(10,5)] NULL, 
    [Option9] [Decimal(10,5)] NULL, 
    [Option10] [Decimal(10,5)] NULL, 
    [Option11] [Decimal(10,5)] NULL, 
    [Option12] [Decimal(10,5)] NULL, 
    [Option13] [Decimal(10,5)] NULL, 
    [Option14] [Decimal(10,5)] NULL, 
    [Option15] [Decimal(10,5)] NULL, 
    [Option16] [Decimal(10,5)] NULL, 

    CONSTRAINT [PK_ProductYearDistrictCropOptions] PRIMARY KEY CLUSTERED 
    (
        [Oid] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[ProductYearDistrictCropOptions]  WITH CHECK ADD  CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO

ALTER TABLE [dbo].[ProductYearDistrictCropOptions] CHECK CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop]
GO


Multiple Child Table Approach

You can get around the poor space utilization with multiple child tables

I was reading an article the other day reference in a different DBA Stack Exchange Question on NULLS. This Article suggested that multiple child tables holding values can be used to designate this. Where each option has its own child table which holds this data. You would end up creating 16 version of something like:

```
CREATE TABLE [dbo].[ProductYearDistrictCropOption1Values]
(
[Oid] [BIGINT] NOT NULL,
[Value] [Decimal(10,5)] NOT NULL, --Or Whatever the appropriate datatype is

CONSTRAINT [PK_ProductYearDistrictCropOption1Values] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO

ALTER TABLE

Code Snippets

DECLARE @Table TABLE
(
    ID INT NOT NULL,
    OptionID INT NOT NULL,
    [Value] INT NULL
)
INSERT INTO @Table (ID, OptionID, Value)
VALUES (1, 1, NULL),
(1, 2, 2),
(1, 3, 3),
(1, 4, 4),
(1, 5, 5),
(1, 6, 6),
(1, 7, 7),
(1, 8, 8),
(1, 9, 9),
(1, 10, 10),
(1, 11, 11),
(1, 12, 12),
(1, 13, 13),
(1, 14, 14),
(1, 15, 15),
(1, 16, 16),
(2, 1, 2),
(2, 2, NULL),
(2, 3, 6),
(2, 4, 8),
(2, 5, 10),
(2, 6, 12),
(2, 7, 14),
(2, 8, 16),
(2, 9, 18),
(2, 10, 20),
(2, 11, 22),
(2, 12, 24),
(2, 13, 26),
(2, 14, 28),
(2, 15, 30),
(2, 16, 32),
(3, 1, 3),
(3, 2, 6),
(3, 3, NULL),
(3, 4, 12),
(3, 5, 15),
(3, 6, 18),
(3, 7, 21),
(3, 8, 24),
(3, 9, 27),
(3, 10, 30),
(3, 11, 33),
(3, 12, 36),
(3, 13, 39),
(3, 14, 42),
(3, 15, 45),
(3, 16, 48),
(4, 1, 4),
(4, 2, 8),
(4, 3, 12),
(4, 4, NULL),
(4, 5, 20),
(4, 6, 24),
(4, 7, 28),
(4, 8, 32),
(4, 9, 36),
(4, 10, 40),
(4, 11, 44),
(4, 12, 48),
(4, 13, 52),
(4, 14, 56),
(4, 15, 60),
(4, 16, 64),
(5, 1, 5),
(5, 2, 10),
(5, 3, 15),
(5, 4, 20),
(5, 5, NULL),
(5, 6, 30),
(5, 7, 35),
(5, 8, 40),
(5, 9, 45),
(5, 10, 50),
(5, 11, 55),
(5, 12, 60),
(5, 13, 65),
(5, 14, 70),
(5, 15, 75),
(5, 16, 80),
(6, 1, 6),
(6, 2, 12),
(6, 3, 18),
(6, 4, 24),
(6, 5, 30),
(6, 6, NULL),
(6, 7, 42),
(6, 8, 48),
(6, 9, 54),
(6, 10, 60),
(6, 11, 66),
(6, 12, 72),
(6, 13, 78),
(6, 14, 84),
(6, 15, 90),
(6, 16, 96),
(7, 1, 7),
(7, 2, 14),
(7, 3, 21),
(7, 4, 28),
(7, 5, 35),
(7, 6, 42),
--(7, 7, NULL),
(7, 8, 56),
(7, 9, 63),
(7, 10, 70),
(7, 11, 77),
(7, 12, 84),
(7, 13, 91),
(7, 14, 98),
(7, 15, 105),
(7, 16, 112),
(8, 1, 8),
(8, 2, 16),
(8, 3, 24),
(8, 4, 32),
(8, 5, 40),
(8, 6, 48),
(8, 7, 56),
--(8, 8, NULL),
(8, 9, 72),
(8, 10, 80),
(8, 11, 88),
(8, 12, 96),
(8, 13, 104),
(8, 14, 112),
(8, 15, 120),
(8, 16, 128),
(9, 1, 9),
(9, 2, 18),
(9, 3, 27),
(9, 4, 36),
(9, 5, 45),
(9, 6, 54),
(9, 7, 63),
(9, 8, 72),
--(9, 9, NULL),
(9, 10, 90),
(9, 11, 99),
(9, 12, 108),
(9, 13, 117),
(9, 14, 126),
(9, 15, 135),
(9, 16, 144),
(10, 1, 10),
(10, 2, 20),
(10, 3, 30),
(10, 4, 40),
(10, 5, 50),
(10, 6, 60),
(10, 7, 70),
(10, 8, 80),
(10, 9, 90),
--(10, 10, NULL),
(10, 11, 110),
(10, 12, 120),
(10, 13, 130),
(10, 14, 140),
(10, 15, 150),
(10, 16, 160),
(11, 1, 11),
(11, 2, 22),
(11, 3, 33),
(11, 4, 44),
(11, 5, 55),
(11, 6, 66),
(11, 7, 77),
(11, 8, 88),
(11, 9, 99),
(11, 10, 110),
--(11, 11, NULL),
(11, 12, 132),
(11, 13, 143),
(11, 14, 154),
(11, 15, 165),
(11, 16, 176),
(12, 1, 12),
(12, 2, 24),
(12, 3, 36),
(12, 4, 48),
(12, 5, 60),
(12, 6, 72),
(12, 7, 84),
(12, 8, 96),
(12, 9, 108),
(12, 10, 120),
(12, 11, 132),
--(12, 12, NULL),
(12, 13, 156),
(12, 14, 168),
(12, 15, 180),
(12, 16, 192),
(13, 1, 13),
(13, 2, 26),
(13, 3, 39),
(13, 4, 52),
(13, 5, 65),
(13, 6, 78),
(13, 7, 91),
(13, 8, 104),
(13, 9, 117),
(13, 10, 130),
(13, 11, 143),
(13, 12, 156),
--(13, 13, NULL),
(13, 14, 182),
(13, 15, 195),
(13, 16, 208),
(14, 1, 14),
(14, 2, 28),
(14, 3, 42),
(14, 4, 56),
(14, 5, 70),
(14, 6, 84),
(14, 7, 98),
(14, 8, 112),
(14, 9, 126),
(14, 10, 140),
(14, 11, 154),
(14, 12, 168),
(14, 13, 182),
--(14, 14, NULL),
(14, 15, 210),
(14, 16, 224),
(15, 1, 15),
(15
SELECT ID, 
[1] AS Option1, --SUM([Value]) where OptionID = 1 Grouped By ID
[2] AS Option2, --SUM([Value]) where OptionID = 2 Grouped By ID
[3] AS Option3, --SUM([Value]) where OptionID = 3 Grouped By ID
[4] AS Option4, --SUM([Value]) where OptionID = 4 Grouped By ID
[5] AS Option5, --SUM([Value]) where OptionID = 5 Grouped By ID
[6] AS Option6, --SUM([Value]) where OptionID = 6 Grouped By ID
[7] AS Option7, --SUM([Value]) where OptionID = 7 Grouped By ID
[8] AS Option8, --SUM([Value]) where OptionID = 8 Grouped By ID
[9] AS Option9, --SUM([Value]) where OptionID = 9 Grouped By ID
[10] AS Option10, --SUM([Value]) where OptionID = 10 Grouped By ID
[11] AS Option11, --SUM([Value]) where OptionID = 11 Grouped By ID
[12] AS Option12, --SUM([Value]) where OptionID = 12 Grouped By ID
[13] AS Option13, --SUM([Value]) where OptionID = 13 Grouped By ID
[14] AS Option14, --SUM([Value]) where OptionID = 14 Grouped By ID
[15] AS Option15, --SUM([Value]) where OptionID = 15 Grouped By ID
[16] AS Option16  --SUM([Value]) where OptionID = 16 Grouped By ID
FROM 
(
    SELECT ID, OptionID, [Value]
    FROM @Table
) UP
PIVOT (
        SUM([Value]) 
        FOR [OptionID] IN([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]) --Values held in the OptionID column
    ) AS pvt
ORDER BY ID
CREATE TABLE [dbo].[ProductYearDistrictCropOptions]
(
    [Oid] [BIGINT] NOT NULL,
    [Option1] [Decimal(10,5)] NULL, --Or Whatever the appropriate datatype is
    [Option2] [Decimal(10,5)] NULL, 
    [Option3] [Decimal(10,5)] NULL, 
    [Option4] [Decimal(10,5)] NULL, 
    [Option5] [Decimal(10,5)] NULL, 
    [Option6] [Decimal(10,5)] NULL, 
    [Option7] [Decimal(10,5)] NULL, 
    [Option8] [Decimal(10,5)] NULL, 
    [Option9] [Decimal(10,5)] NULL, 
    [Option10] [Decimal(10,5)] NULL, 
    [Option11] [Decimal(10,5)] NULL, 
    [Option12] [Decimal(10,5)] NULL, 
    [Option13] [Decimal(10,5)] NULL, 
    [Option14] [Decimal(10,5)] NULL, 
    [Option15] [Decimal(10,5)] NULL, 
    [Option16] [Decimal(10,5)] NULL, 

    CONSTRAINT [PK_ProductYearDistrictCropOptions] PRIMARY KEY CLUSTERED 
    (
        [Oid] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[ProductYearDistrictCropOptions]  WITH CHECK ADD  CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO

ALTER TABLE [dbo].[ProductYearDistrictCropOptions] CHECK CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop]
GO
CREATE TABLE [dbo].[ProductYearDistrictCropOption1Values]
(
    [Oid] [BIGINT] NOT NULL,
    [Value] [Decimal(10,5)] NOT NULL, --Or Whatever the appropriate datatype is

    CONSTRAINT [PK_ProductYearDistrictCropOption1Values] PRIMARY KEY CLUSTERED 
    (
        [Oid] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values]  WITH CHECK ADD  CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO

ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] CHECK CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop]

Context

StackExchange Database Administrators Q#241341, answer score: 2

Revisions (0)

No revisions yet.