snippetsqlMinor
How to store immutable, unique, ordered, lists?
Viewed 0 times
uniquestorelistsimmutablehowordered
Problem
I have an entity with millions of instances. Each instance has to reference an ordered list of items. The lists have to be unique so no list will be stored more than once. But once created, both lists and entity instances are immutable. There will be far more entity instances than lists, and the database has to support fast insertions of entities.
So what's an insert-efficient, robust, way of storing immutable, unique, ordered lists?
Edit: The list items are simple integers, and the typical length is about 5 items. Long lists of, say 10 or 20 items are very unlikely but possible.
Edit: So far, I've considered these approaches:
1)
The entity table will have a
).
2)
The entity table will have a
So what's an insert-efficient, robust, way of storing immutable, unique, ordered lists?
Edit: The list items are simple integers, and the typical length is about 5 items. Long lists of, say 10 or 20 items are very unlikely but possible.
Edit: So far, I've considered these approaches:
1)
lists table has these columns: so if list #5 contains the elements [10,20,30] the table will contain:5 1 10
5 2 20
5 3 30The entity table will have a
item_list_id column that references the lists table (it's not a foreign key since list_id is not a unique column in the lists table - The this can be solved by adding another table with a single column that which contains all valid list_ids).
- This solution makes inserts a bit tricky
- It also places the responsibility for enforcing the uniqueness of lists on the application, which isn't great.
2)
lists table has these columns: ... so if list #5 contains the elements [10,20,30] the table will contain:5 10 20 30The entity table will have a
item_list_id column that references the lists table.- This solution is less robust since list lengths are limited (although this isn't a huge problem for me since my lists are VERY unlikely to contain more than 10 or 20 elements)
- This approach is quite horrible to query ("find all lists in which a particular item appears" has to specify each and every column), and a nightmare to map using an ORM.
- Insertions of new entities is not too bad since my lists are typically 4-5 items long, so I can probably index the first few columns.
- Enforcement of uniqueness is still
Solution
I would go with a traditional set-based approach for this. Create entity, list, entityList, and listItem tables to store your data. Individual list and entity tables allow you to create lists without entities and vice versa. Constraints on these tables such as unique constraints and foreign keys will help preserve the integrity of the data.
For the list, create a single stored procedure which is the single point for inserting lists and listItems. This proc has logic to detect if a list already exists and serves as a gateway. This is straightforward to do using set operators like EXCEPT.
Here's a simple demo which shows how something like this might work. I'm using a table-valued parameter (TVP) to handle passing around lists and a scalar function to determine if a list already exists. Scalar functions are evil when used in resultsets (eg SELECT, WHERE clauses, JOIN ) but are ok when used for what they are meant for, returning single values. Spend some time working through it and see if it could work for you:
```
USE tempdb
GO
SET NOCOUNT ON
GO
------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------
-- Reset
IF OBJECT_ID('dbo.entityLists') IS NOT NULL DROP TABLE dbo.entityLists
IF OBJECT_ID('dbo.listItems') IS NOT NULL DROP TABLE dbo.listItems
IF OBJECT_ID('dbo.entities') IS NOT NULL DROP TABLE dbo.entities
IF OBJECT_ID('dbo.lists') IS NOT NULL DROP TABLE dbo.lists
IF OBJECT_ID('dbo.usp_addList') IS NOT NULL DROP PROC dbo.usp_addList
IF OBJECT_ID('dbo.usf_listExists') IS NOT NULL DROP FUNCTION dbo.usf_listExists
IF EXISTS ( SELECT * FROM sys.types st INNER JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'tvp_myList' AND ss.name = N'dbo')
DROP TYPE dbo.tvp_myList
GO
CREATE TABLE dbo.entities
(
entityId INT IDENTITY,
entityName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_entities PRIMARY KEY ( entityId ),
CONSTRAINT UK_entities__entityName UNIQUE ( entityName )
)
GO
CREATE TABLE dbo.lists
(
listId INT IDENTITY(1000,1),
listName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_lists PRIMARY KEY ( listId ),
CONSTRAINT UK_lists__listName UNIQUE ( listName )
)
GO
-- Although constraints might appear a little excessive, they prevent 'bad data' getting into the table,
-- whereas lesser constraints would not, eg prevent list with duplicate order
CREATE TABLE dbo.listItems
(
listItemId INT IDENTITY(100000,1),
listId INT NOT NULL,
itemOrderId INT NOT NULL,
itemValue INT NOT NULL,
CONSTRAINT PK_listItems PRIMARY KEY ( listItemId ), -- prevent list with duplicate order, value
CONSTRAINT UK_listItems__listId__itemValue UNIQUE ( listId, itemValue ), -- prevent list with duplicate value
CONSTRAINT UK_listItems__listId__itemOrderId UNIQUE ( listId, itemOrderId ), -- prevent list with duplicate order
CONSTRAINT UK_listItems UNIQUE ( listId, itemOrderId, itemValue ),
CONSTRAINT FK_listItems__listId FOREIGN KEY ( listId ) REFERENCES dbo.lists ( listId )
)
GO
CREATE INDEX _idx_listItems__itemOrderId__itemValue ON dbo.listItems ( itemOrderId, itemValue ) -- to be used in list existence check
GO
CREATE TABLE dbo.entityLists
(
entityId INT NOT NULL,
listId INT NOT NULL,
CONSTRAINT PK_entityLists PRIMARY KEY ( entityId, listId ),
CONSTRAINT UK_entityLists__entityId UNIQUE ( entityId ),
CONSTRAINT UK_entityLists__listId UNIQUE ( listId ),
CONSTRAINT FK_entityLists__entityId FOREIGN KEY ( entityId ) REFERENCES dbo.entities ( entityId ),
CONSTRAINT FK_entityLists__listId FOREIGN KEY ( listId ) REFERENCES dbo.lists ( listId )
)
GO
-- Create a type for passing lists around; TVP types and constraints mirror target table to act as a quality control
CREATE TYPE dbo.tvp_myList AS TABLE
(
itemOrderId INT NOT NULL UNIQUE, -- you could make this an identity column if you want to make sure order is sequential
itemValue INT NOT NULL UNIQUE,
PRIMARY KEY ( itemOrderId, itemValue )
)
GO
-- Scalar function to determine if list already exists
CREATE FUNCTION dbo.usf_listExists
(
@tvp AS dbo.tvp_myList READONLY
)
RETURNS BIT
AS
BEGIN
DECLARE @listExists_yn BIT
IF EXISTS (
SELECT COUNT(*) OVER() totalItems, itemOrderId, itemValue
FROM @tvp
EXCEPT
SELECT COUNT(*) OVER( PARTITION BY listId ) totalItems, itemOrderId, itemValue
For the list, create a single stored procedure which is the single point for inserting lists and listItems. This proc has logic to detect if a list already exists and serves as a gateway. This is straightforward to do using set operators like EXCEPT.
Here's a simple demo which shows how something like this might work. I'm using a table-valued parameter (TVP) to handle passing around lists and a scalar function to determine if a list already exists. Scalar functions are evil when used in resultsets (eg SELECT, WHERE clauses, JOIN ) but are ok when used for what they are meant for, returning single values. Spend some time working through it and see if it could work for you:
```
USE tempdb
GO
SET NOCOUNT ON
GO
------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------
-- Reset
IF OBJECT_ID('dbo.entityLists') IS NOT NULL DROP TABLE dbo.entityLists
IF OBJECT_ID('dbo.listItems') IS NOT NULL DROP TABLE dbo.listItems
IF OBJECT_ID('dbo.entities') IS NOT NULL DROP TABLE dbo.entities
IF OBJECT_ID('dbo.lists') IS NOT NULL DROP TABLE dbo.lists
IF OBJECT_ID('dbo.usp_addList') IS NOT NULL DROP PROC dbo.usp_addList
IF OBJECT_ID('dbo.usf_listExists') IS NOT NULL DROP FUNCTION dbo.usf_listExists
IF EXISTS ( SELECT * FROM sys.types st INNER JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'tvp_myList' AND ss.name = N'dbo')
DROP TYPE dbo.tvp_myList
GO
CREATE TABLE dbo.entities
(
entityId INT IDENTITY,
entityName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_entities PRIMARY KEY ( entityId ),
CONSTRAINT UK_entities__entityName UNIQUE ( entityName )
)
GO
CREATE TABLE dbo.lists
(
listId INT IDENTITY(1000,1),
listName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_lists PRIMARY KEY ( listId ),
CONSTRAINT UK_lists__listName UNIQUE ( listName )
)
GO
-- Although constraints might appear a little excessive, they prevent 'bad data' getting into the table,
-- whereas lesser constraints would not, eg prevent list with duplicate order
CREATE TABLE dbo.listItems
(
listItemId INT IDENTITY(100000,1),
listId INT NOT NULL,
itemOrderId INT NOT NULL,
itemValue INT NOT NULL,
CONSTRAINT PK_listItems PRIMARY KEY ( listItemId ), -- prevent list with duplicate order, value
CONSTRAINT UK_listItems__listId__itemValue UNIQUE ( listId, itemValue ), -- prevent list with duplicate value
CONSTRAINT UK_listItems__listId__itemOrderId UNIQUE ( listId, itemOrderId ), -- prevent list with duplicate order
CONSTRAINT UK_listItems UNIQUE ( listId, itemOrderId, itemValue ),
CONSTRAINT FK_listItems__listId FOREIGN KEY ( listId ) REFERENCES dbo.lists ( listId )
)
GO
CREATE INDEX _idx_listItems__itemOrderId__itemValue ON dbo.listItems ( itemOrderId, itemValue ) -- to be used in list existence check
GO
CREATE TABLE dbo.entityLists
(
entityId INT NOT NULL,
listId INT NOT NULL,
CONSTRAINT PK_entityLists PRIMARY KEY ( entityId, listId ),
CONSTRAINT UK_entityLists__entityId UNIQUE ( entityId ),
CONSTRAINT UK_entityLists__listId UNIQUE ( listId ),
CONSTRAINT FK_entityLists__entityId FOREIGN KEY ( entityId ) REFERENCES dbo.entities ( entityId ),
CONSTRAINT FK_entityLists__listId FOREIGN KEY ( listId ) REFERENCES dbo.lists ( listId )
)
GO
-- Create a type for passing lists around; TVP types and constraints mirror target table to act as a quality control
CREATE TYPE dbo.tvp_myList AS TABLE
(
itemOrderId INT NOT NULL UNIQUE, -- you could make this an identity column if you want to make sure order is sequential
itemValue INT NOT NULL UNIQUE,
PRIMARY KEY ( itemOrderId, itemValue )
)
GO
-- Scalar function to determine if list already exists
CREATE FUNCTION dbo.usf_listExists
(
@tvp AS dbo.tvp_myList READONLY
)
RETURNS BIT
AS
BEGIN
DECLARE @listExists_yn BIT
IF EXISTS (
SELECT COUNT(*) OVER() totalItems, itemOrderId, itemValue
FROM @tvp
EXCEPT
SELECT COUNT(*) OVER( PARTITION BY listId ) totalItems, itemOrderId, itemValue
Code Snippets
USE tempdb
GO
SET NOCOUNT ON
GO
------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------
-- Reset
IF OBJECT_ID('dbo.entityLists') IS NOT NULL DROP TABLE dbo.entityLists
IF OBJECT_ID('dbo.listItems') IS NOT NULL DROP TABLE dbo.listItems
IF OBJECT_ID('dbo.entities') IS NOT NULL DROP TABLE dbo.entities
IF OBJECT_ID('dbo.lists') IS NOT NULL DROP TABLE dbo.lists
IF OBJECT_ID('dbo.usp_addList') IS NOT NULL DROP PROC dbo.usp_addList
IF OBJECT_ID('dbo.usf_listExists') IS NOT NULL DROP FUNCTION dbo.usf_listExists
IF EXISTS ( SELECT * FROM sys.types st INNER JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'tvp_myList' AND ss.name = N'dbo')
DROP TYPE dbo.tvp_myList
GO
CREATE TABLE dbo.entities
(
entityId INT IDENTITY,
entityName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_entities PRIMARY KEY ( entityId ),
CONSTRAINT UK_entities__entityName UNIQUE ( entityName )
)
GO
CREATE TABLE dbo.lists
(
listId INT IDENTITY(1000,1),
listName VARCHAR(50) NOT NULL,
dateAdded DATETIME NOT NULL DEFAULT GETDATE(),
addedBy VARCHAR(30) NOT NULL DEFAULT SUSER_NAME(),
dateModified DATETIME NULL,
modifiedBy VARCHAR(30) NULL,
CONSTRAINT PK_lists PRIMARY KEY ( listId ),
CONSTRAINT UK_lists__listName UNIQUE ( listName )
)
GO
-- Although constraints might appear a little excessive, they prevent 'bad data' getting into the table,
-- whereas lesser constraints would not, eg prevent list with duplicate order
CREATE TABLE dbo.listItems
(
listItemId INT IDENTITY(100000,1),
listId INT NOT NULL,
itemOrderId INT NOT NULL,
itemValue INT NOT NULL,
CONSTRAINT PK_listItems PRIMARY KEY ( listItemId ), -- prevent list with duplicate order, value
CONSTRAINT UK_listItems__listId__itemValue UNIQUE ( listId, itemValue ), -- prevent list with duplicate value
CONSTRAINT UK_listItems__listId__itemOrderId UNIQUE ( listId, itemOrderId ), -- prevent list with duplicate order
CONSTRAINT UK_listItems UNIQUE ( listId, itemOrderId, itemValue ),
CONSTRAINT FK_listItems__listId FOREIGN KEY ( listId ) REFERENCES dbo.lists ( listId )
)
GO
CREATE INDEX _idx_listItems__itemOrderId__itemValue ON dbo.listItems ( itemOrderId, itemValue ) -- to be used in list existence check
GO
CREATE TABLE dbo.entityLists
(
entityId INT NOT NULL,
listId INT NOT NULL,
CONSTRAINT PK_entityLists PRIMARY KEY ( entityId, listId ),
CONSTRAINT UK_entityLists__entityId UNIQUE ( entityId ),
CONSTRAINT UK_entityLists__listId UNIQUE ( listId ),
CONSTContext
StackExchange Database Administrators Q#82373, answer score: 2
Revisions (0)
No revisions yet.