principlesqlMinor
Need help in picking solution and understanding when to add columns vs EAV
Viewed 0 times
understandingeavpickingneedcolumnshelpwhenandsolutionadd
Problem
I've been working on the following problem for a long time and I can't decide between two solutions. While both would work to solve the problem, neither "feels" exactly right to me.
Any advice on this would be greatly appreciated. Also note that I'm not particularly great with DB design so any feedback is appreciated.
I have the following table:
ProductYearDistrictCrop (+-2 million records)
Some background info on the table:
A record is created for each possible combination of Product - Year - District - Crop and there is a unique constraint on those Foreign Keys (the combination must be unique). I opted to use Oid as the PK instead of a Composite Key because Oid can then be used by other tables as a FK and I assumed a lookup on the Oid would be quicker than a composite key.
For each of these records, there are additional options that can be picked from. These options come from the below table
Options (16 records)
For each of these choices I need to store a custom value or just reuse the systemDefaultValue per record in ProductYearDistrictCrop
The two possible solutions I come up with are
Solution A : Simple add each possible option as column to the table
```
oid | productId | yearId | districId | cropId | payDate | Option1 | Option2 |
1 | 1 | 1 | 1 |1 | 2018-01-01 | 0.5 | 2.5 |
2 | 1 | 1 | 1 |1 | 2018-02-01 | 0.5 | 2.5 |
3 | 2 | 1 | 1
Any advice on this would be greatly appreciated. Also note that I'm not particularly great with DB design so any feedback is appreciated.
I have the following table:
ProductYearDistrictCrop (+-2 million records)
oid | productId | yearId | districId | cropId | payDate |
1 | 1 | 1 | 1 |1 | 2018-01-01 |
2 | 1 | 1 | 1 |1 | 2018-02-01 |
3 | 2 | 1 | 1 |1 | 2018-03-01 |
4 | 2 | 1 | 1 |1 | 2018-04-01 |Some background info on the table:
A record is created for each possible combination of Product - Year - District - Crop and there is a unique constraint on those Foreign Keys (the combination must be unique). I opted to use Oid as the PK instead of a Composite Key because Oid can then be used by other tables as a FK and I assumed a lookup on the Oid would be quicker than a composite key.
For each of these records, there are additional options that can be picked from. These options come from the below table
Options (16 records)
oid | description | systemDefaultValue
1 | Option 1 | 0.5
2 | Option 2 | 2.5
3 | Option 3 | 1.5
4 | Option 4 | 1.5For each of these choices I need to store a custom value or just reuse the systemDefaultValue per record in ProductYearDistrictCrop
The two possible solutions I come up with are
Solution A : Simple add each possible option as column to the table
```
oid | productId | yearId | districId | cropId | payDate | Option1 | Option2 |
1 | 1 | 1 | 1 |1 | 2018-01-01 | 0.5 | 2.5 |
2 | 1 | 1 | 1 |1 | 2018-02-01 | 0.5 | 2.5 |
3 | 2 | 1 | 1
Solution
Richard,
I really think this is a great question you have posed. My answer got longer than I anticipated so in summary I have 3 possible solutions based on what you asked. If I had to rank my preference I would do:
EVA Approach
Thanks to some insight from Jon the
For the sake of the example on the
I then populated it with 16 unique
Extension Table Approach
When It Comes To The Additional Columns Approach most people's concern is that it isn't normalized out. I hear a comment one time about "Normalize until it hurts. De-Normalize until it works". (There is actually a DBA Stack Exchange topic which covered this idea about how far to go with
I would not however recommend adding the columns onto your existing table. The wider the table is, the less data you can fit onto the same page which means that everything else using this table will take a performance degradation. One way around that is creating an "Extension"
This table, space-wise would be really inefficient. But we would be sacrificing that space, for
Multiple Child Table Approach
You can get around the poor space utilization with multiple
I was reading an article the other day reference in a different DBA Stack Exchange Question on NULLS. This Article suggested that multiple child tables holding values can be used to designate this. Where each
```
CREATE TABLE [dbo].[ProductYearDistrictCropOption1Values]
(
[Oid] [BIGINT] NOT NULL,
[Value] [Decimal(10,5)] NOT NULL, --Or Whatever the appropriate datatype is
CONSTRAINT [PK_ProductYearDistrictCropOption1Values] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO
ALTER TABLE
I really think this is a great question you have posed. My answer got longer than I anticipated so in summary I have 3 possible solutions based on what you asked. If I had to rank my preference I would do:
EVAtable
- Multiple Child Tables
- Single "Extension" table
EVA Approach
Thanks to some insight from Jon the
EVA is certainly a very viable approach with proper indexing using a PIVOTFor the sake of the example on the
PIVOT (since I was not familiar with it) I created a simple EAV table.DECLARE @Table TABLE
(
ID INT NOT NULL,
OptionID INT NOT NULL,
[Value] INT NULL
)I then populated it with 16 unique
ID's and each configured for 16 different OptionID's. The value is populated with the product of ID and OptionID If those two values are the same then no value was provided (If ID - Space savings since records can only be added if they differ than the default and that can be applied via a
CASEstatement.
EAVtables are really narrow, so as long as the table is indexed well performance should be really good.
Extension Table Approach
When It Comes To The Additional Columns Approach most people's concern is that it isn't normalized out. I hear a comment one time about "Normalize until it hurts. De-Normalize until it works". (There is actually a DBA Stack Exchange topic which covered this idea about how far to go with
Normalization). With all that in mind if this is going to be a highly utilized part of the database than this may be a candidate where taking a slightly de-normalized design, for performance reasons may be what you want to do. I would not however recommend adding the columns onto your existing table. The wider the table is, the less data you can fit onto the same page which means that everything else using this table will take a performance degradation. One way around that is creating an "Extension"
table which doesn't do anything but hold these values. You would simply JOIN to this one table to get all 16 options (whether they are always written, or the columns are NULLABLE and only hold have a value if it differs from the default). When an additional Option is added, you would just add a new column to this table.This table, space-wise would be really inefficient. But we would be sacrificing that space, for
SELECT query performance in both instances where ProductYearDistrictCrop is used without this new extension table and when ProductYearDistrictCrop is used in unison with it.CREATE TABLE [dbo].[ProductYearDistrictCropOptions]
(
[Oid] [BIGINT] NOT NULL,
[Option1] [Decimal(10,5)] NULL, --Or Whatever the appropriate datatype is
[Option2] [Decimal(10,5)] NULL,
[Option3] [Decimal(10,5)] NULL,
[Option4] [Decimal(10,5)] NULL,
[Option5] [Decimal(10,5)] NULL,
[Option6] [Decimal(10,5)] NULL,
[Option7] [Decimal(10,5)] NULL,
[Option8] [Decimal(10,5)] NULL,
[Option9] [Decimal(10,5)] NULL,
[Option10] [Decimal(10,5)] NULL,
[Option11] [Decimal(10,5)] NULL,
[Option12] [Decimal(10,5)] NULL,
[Option13] [Decimal(10,5)] NULL,
[Option14] [Decimal(10,5)] NULL,
[Option15] [Decimal(10,5)] NULL,
[Option16] [Decimal(10,5)] NULL,
CONSTRAINT [PK_ProductYearDistrictCropOptions] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[ProductYearDistrictCropOptions] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO
ALTER TABLE [dbo].[ProductYearDistrictCropOptions] CHECK CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop]
GOMultiple Child Table Approach
You can get around the poor space utilization with multiple
child tablesI was reading an article the other day reference in a different DBA Stack Exchange Question on NULLS. This Article suggested that multiple child tables holding values can be used to designate this. Where each
option has its own child table which holds this data. You would end up creating 16 version of something like:```
CREATE TABLE [dbo].[ProductYearDistrictCropOption1Values]
(
[Oid] [BIGINT] NOT NULL,
[Value] [Decimal(10,5)] NOT NULL, --Or Whatever the appropriate datatype is
CONSTRAINT [PK_ProductYearDistrictCropOption1Values] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO
ALTER TABLE
Code Snippets
DECLARE @Table TABLE
(
ID INT NOT NULL,
OptionID INT NOT NULL,
[Value] INT NULL
)INSERT INTO @Table (ID, OptionID, Value)
VALUES (1, 1, NULL),
(1, 2, 2),
(1, 3, 3),
(1, 4, 4),
(1, 5, 5),
(1, 6, 6),
(1, 7, 7),
(1, 8, 8),
(1, 9, 9),
(1, 10, 10),
(1, 11, 11),
(1, 12, 12),
(1, 13, 13),
(1, 14, 14),
(1, 15, 15),
(1, 16, 16),
(2, 1, 2),
(2, 2, NULL),
(2, 3, 6),
(2, 4, 8),
(2, 5, 10),
(2, 6, 12),
(2, 7, 14),
(2, 8, 16),
(2, 9, 18),
(2, 10, 20),
(2, 11, 22),
(2, 12, 24),
(2, 13, 26),
(2, 14, 28),
(2, 15, 30),
(2, 16, 32),
(3, 1, 3),
(3, 2, 6),
(3, 3, NULL),
(3, 4, 12),
(3, 5, 15),
(3, 6, 18),
(3, 7, 21),
(3, 8, 24),
(3, 9, 27),
(3, 10, 30),
(3, 11, 33),
(3, 12, 36),
(3, 13, 39),
(3, 14, 42),
(3, 15, 45),
(3, 16, 48),
(4, 1, 4),
(4, 2, 8),
(4, 3, 12),
(4, 4, NULL),
(4, 5, 20),
(4, 6, 24),
(4, 7, 28),
(4, 8, 32),
(4, 9, 36),
(4, 10, 40),
(4, 11, 44),
(4, 12, 48),
(4, 13, 52),
(4, 14, 56),
(4, 15, 60),
(4, 16, 64),
(5, 1, 5),
(5, 2, 10),
(5, 3, 15),
(5, 4, 20),
(5, 5, NULL),
(5, 6, 30),
(5, 7, 35),
(5, 8, 40),
(5, 9, 45),
(5, 10, 50),
(5, 11, 55),
(5, 12, 60),
(5, 13, 65),
(5, 14, 70),
(5, 15, 75),
(5, 16, 80),
(6, 1, 6),
(6, 2, 12),
(6, 3, 18),
(6, 4, 24),
(6, 5, 30),
(6, 6, NULL),
(6, 7, 42),
(6, 8, 48),
(6, 9, 54),
(6, 10, 60),
(6, 11, 66),
(6, 12, 72),
(6, 13, 78),
(6, 14, 84),
(6, 15, 90),
(6, 16, 96),
(7, 1, 7),
(7, 2, 14),
(7, 3, 21),
(7, 4, 28),
(7, 5, 35),
(7, 6, 42),
--(7, 7, NULL),
(7, 8, 56),
(7, 9, 63),
(7, 10, 70),
(7, 11, 77),
(7, 12, 84),
(7, 13, 91),
(7, 14, 98),
(7, 15, 105),
(7, 16, 112),
(8, 1, 8),
(8, 2, 16),
(8, 3, 24),
(8, 4, 32),
(8, 5, 40),
(8, 6, 48),
(8, 7, 56),
--(8, 8, NULL),
(8, 9, 72),
(8, 10, 80),
(8, 11, 88),
(8, 12, 96),
(8, 13, 104),
(8, 14, 112),
(8, 15, 120),
(8, 16, 128),
(9, 1, 9),
(9, 2, 18),
(9, 3, 27),
(9, 4, 36),
(9, 5, 45),
(9, 6, 54),
(9, 7, 63),
(9, 8, 72),
--(9, 9, NULL),
(9, 10, 90),
(9, 11, 99),
(9, 12, 108),
(9, 13, 117),
(9, 14, 126),
(9, 15, 135),
(9, 16, 144),
(10, 1, 10),
(10, 2, 20),
(10, 3, 30),
(10, 4, 40),
(10, 5, 50),
(10, 6, 60),
(10, 7, 70),
(10, 8, 80),
(10, 9, 90),
--(10, 10, NULL),
(10, 11, 110),
(10, 12, 120),
(10, 13, 130),
(10, 14, 140),
(10, 15, 150),
(10, 16, 160),
(11, 1, 11),
(11, 2, 22),
(11, 3, 33),
(11, 4, 44),
(11, 5, 55),
(11, 6, 66),
(11, 7, 77),
(11, 8, 88),
(11, 9, 99),
(11, 10, 110),
--(11, 11, NULL),
(11, 12, 132),
(11, 13, 143),
(11, 14, 154),
(11, 15, 165),
(11, 16, 176),
(12, 1, 12),
(12, 2, 24),
(12, 3, 36),
(12, 4, 48),
(12, 5, 60),
(12, 6, 72),
(12, 7, 84),
(12, 8, 96),
(12, 9, 108),
(12, 10, 120),
(12, 11, 132),
--(12, 12, NULL),
(12, 13, 156),
(12, 14, 168),
(12, 15, 180),
(12, 16, 192),
(13, 1, 13),
(13, 2, 26),
(13, 3, 39),
(13, 4, 52),
(13, 5, 65),
(13, 6, 78),
(13, 7, 91),
(13, 8, 104),
(13, 9, 117),
(13, 10, 130),
(13, 11, 143),
(13, 12, 156),
--(13, 13, NULL),
(13, 14, 182),
(13, 15, 195),
(13, 16, 208),
(14, 1, 14),
(14, 2, 28),
(14, 3, 42),
(14, 4, 56),
(14, 5, 70),
(14, 6, 84),
(14, 7, 98),
(14, 8, 112),
(14, 9, 126),
(14, 10, 140),
(14, 11, 154),
(14, 12, 168),
(14, 13, 182),
--(14, 14, NULL),
(14, 15, 210),
(14, 16, 224),
(15, 1, 15),
(15SELECT ID,
[1] AS Option1, --SUM([Value]) where OptionID = 1 Grouped By ID
[2] AS Option2, --SUM([Value]) where OptionID = 2 Grouped By ID
[3] AS Option3, --SUM([Value]) where OptionID = 3 Grouped By ID
[4] AS Option4, --SUM([Value]) where OptionID = 4 Grouped By ID
[5] AS Option5, --SUM([Value]) where OptionID = 5 Grouped By ID
[6] AS Option6, --SUM([Value]) where OptionID = 6 Grouped By ID
[7] AS Option7, --SUM([Value]) where OptionID = 7 Grouped By ID
[8] AS Option8, --SUM([Value]) where OptionID = 8 Grouped By ID
[9] AS Option9, --SUM([Value]) where OptionID = 9 Grouped By ID
[10] AS Option10, --SUM([Value]) where OptionID = 10 Grouped By ID
[11] AS Option11, --SUM([Value]) where OptionID = 11 Grouped By ID
[12] AS Option12, --SUM([Value]) where OptionID = 12 Grouped By ID
[13] AS Option13, --SUM([Value]) where OptionID = 13 Grouped By ID
[14] AS Option14, --SUM([Value]) where OptionID = 14 Grouped By ID
[15] AS Option15, --SUM([Value]) where OptionID = 15 Grouped By ID
[16] AS Option16 --SUM([Value]) where OptionID = 16 Grouped By ID
FROM
(
SELECT ID, OptionID, [Value]
FROM @Table
) UP
PIVOT (
SUM([Value])
FOR [OptionID] IN([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]) --Values held in the OptionID column
) AS pvt
ORDER BY IDCREATE TABLE [dbo].[ProductYearDistrictCropOptions]
(
[Oid] [BIGINT] NOT NULL,
[Option1] [Decimal(10,5)] NULL, --Or Whatever the appropriate datatype is
[Option2] [Decimal(10,5)] NULL,
[Option3] [Decimal(10,5)] NULL,
[Option4] [Decimal(10,5)] NULL,
[Option5] [Decimal(10,5)] NULL,
[Option6] [Decimal(10,5)] NULL,
[Option7] [Decimal(10,5)] NULL,
[Option8] [Decimal(10,5)] NULL,
[Option9] [Decimal(10,5)] NULL,
[Option10] [Decimal(10,5)] NULL,
[Option11] [Decimal(10,5)] NULL,
[Option12] [Decimal(10,5)] NULL,
[Option13] [Decimal(10,5)] NULL,
[Option14] [Decimal(10,5)] NULL,
[Option15] [Decimal(10,5)] NULL,
[Option16] [Decimal(10,5)] NULL,
CONSTRAINT [PK_ProductYearDistrictCropOptions] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[ProductYearDistrictCropOptions] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO
ALTER TABLE [dbo].[ProductYearDistrictCropOptions] CHECK CONSTRAINT [FK_ProductYearDistrictCropOptions_ProductYearDistrictCrop]
GOCREATE TABLE [dbo].[ProductYearDistrictCropOption1Values]
(
[Oid] [BIGINT] NOT NULL,
[Value] [Decimal(10,5)] NOT NULL, --Or Whatever the appropriate datatype is
CONSTRAINT [PK_ProductYearDistrictCropOption1Values] PRIMARY KEY CLUSTERED
(
[Oid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] WITH CHECK ADD CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop] FOREIGN KEY([Oid])
REFERENCES [dbo].[ProductYearDistrictCrop] ([Oid])
GO
ALTER TABLE [dbo].[ProductYearDistrictCropOption1Values] CHECK CONSTRAINT [FK_ProductYearDistrictCropOption1Values_ProductYearDistrictCrop]Context
StackExchange Database Administrators Q#241341, answer score: 2
Revisions (0)
No revisions yet.