snippetsqlMinor
How to store hierarchical dimension for timeseries data
Viewed 0 times
timeseriesdimensionstoreforhowhierarchicaldata
Problem
I have a requirement to store data about a load of Searches. Each time someone searches we want to record it for analyzing later on.
As part of the search the user can select one or more Categories to search within. We would like to record the categories selected and the hierarchy that the category was in.
However, the categories can change and move over time. We are likely to want to analyze both by the parent categories that the child categories were in at the time of the search, and also by the parents that the categories are in at the time of analysis.
Currently I have a stream of JSON API calls (we need to support 50 a second) which look like this:
Categories D and E were selected, Category B is the parent of D & E and Category A (one of several roots) is the parent of B.
We are currently just storing the time stamp and search string in a flat table in a normal OLTP style SQL Server DB:
As part of the search the user can select one or more Categories to search within. We would like to record the categories selected and the hierarchy that the category was in.
However, the categories can change and move over time. We are likely to want to analyze both by the parent categories that the child categories were in at the time of the search, and also by the parents that the categories are in at the time of analysis.
- What is the best way of storing this kind of data?
Currently I have a stream of JSON API calls (we need to support 50 a second) which look like this:
{
"timestamp":"2018-03-08T11:22:33.456+0000",
"searchString":"This is a search",
"categories":[
"Cat D/Cat B/Cat A",
"Cat E/Cat B/Cat A"
]
}Categories D and E were selected, Category B is the parent of D & E and Category A (one of several roots) is the parent of B.
We are currently just storing the time stamp and search string in a flat table in a normal OLTP style SQL Server DB:
CREATE TABLE SearchEvent (
id int PRIMARY KEY,
searchString nvarchar(100),
timestamp datetimeoffset
)Solution
SQL Server has a
Consider this example:
A sample query against the table:
The results:
╔═══════╦═══════════╦════════╦══════════════╦══════════════════╗
║ CatID ║ Hierarchy ║ Parent ║ CategoryName ║ ParentCategories ║
╠═══════╬═══════════╬════════╬══════════════╬══════════════════╣
║ 1 ║ / ║ NULL ║ Root ║ NULL ║
║ 2 ║ /1/ ║ / ║ A ║ NULL ║
║ 3 ║ /1/2/ ║ /1/ ║ B ║ A ║
║ 4 ║ /1/2/3/ ║ /1/2/ ║ C ║ A, B ║
║ 5 ║ /1/2/4/ ║ /1/2/ ║ D ║ A, B ║
║ 6 ║ /1/2/5/ ║ /1/2/ ║ E ║ A, B ║
╚═══════╩═══════════╩════════╩══════════════╩══════════════════╝
To implement this into your system, you'd reference the above table in your table of searches. Something like:
To insert a search in
Query:
Results:
╔══════════════════╦══════════╦═════════════════╦════════════════════════════════════╗
║ searchString ║ Category ║ CategoryParents ║ timestamp ║
╠══════════════════╬══════════╬═════════════════╬════════════════════════════════════╣
║ This is a search ║ C ║ /A/B ║ 2018-03-08 15:29:25.2267243 -06:00 ║
╚══════════════════╩══════════╩═════════════════╩════════════════════════════════════╝
If you subsequently want to reparent
Then, when you insert into the newly re-parented category, like this:
You get:
Results:
╔═══════════════════════════════════════════════╦══════════╦═════════════════╦════════════════════════════════════╗
║ searchString ║ Category ║ CategoryParents ║ timestamp ║
╠═══════════════════════════════════════════════╬══════════╬═════════════════╬════════════════════════════════════╣
║ This is a search ║ C ║ /A/B/C ║ 2018-03-08 15:29:25.2267243 -06:00 ║
║ This is a search, but with a different parent ║ C ║ /A ║ 2018-03-08 15:29:25.2546739 -06:00 ║
╚═══════════════════════════════════════════════╩══════════╩═════════════════╩════════════════════════════════════╝
hierarchyid type defined to deal with this kind of system.Consider this example:
IF OBJECT_ID(N'dbo.Categories', N'U') IS NOT NULL
DROP TABLE dbo.Categories;
GO
CREATE TABLE dbo.Categories
(
CatID int NOT NULL
CONSTRAINT PK_Categories
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, hid hierarchyid NOT NULL
, CategoryName varchar(30) NOT NULL
);
INSERT INTO dbo.Categories (hid, CategoryName)
VALUES ('/', 'Root')
, ('/1/', 'A')
, ('/1/2/', 'B')
, ('/1/2/3/', 'C')
, ('/1/2/4/', 'D')
, ('/1/2/5/', 'E');A sample query against the table:
SELECT c.CatID
, Hierarchy = c.hid.ToString()
, Parent = c.hid.GetAncestor(1).ToString()
, c.CategoryName
, ParentCategories = STUFF((
SELECT ', ' + c1.CategoryName
FROM dbo.Categories c1
WHERE c.hid.IsDescendantOf(c1.hid) = CONVERT(bit, 1)
AND c1.hid <> c.hid
AND c1.CategoryName <> 'Root'
FOR XML PATH ('')), 1, 2, '')
FROM dbo.Categories c;The results:
╔═══════╦═══════════╦════════╦══════════════╦══════════════════╗
║ CatID ║ Hierarchy ║ Parent ║ CategoryName ║ ParentCategories ║
╠═══════╬═══════════╬════════╬══════════════╬══════════════════╣
║ 1 ║ / ║ NULL ║ Root ║ NULL ║
║ 2 ║ /1/ ║ / ║ A ║ NULL ║
║ 3 ║ /1/2/ ║ /1/ ║ B ║ A ║
║ 4 ║ /1/2/3/ ║ /1/2/ ║ C ║ A, B ║
║ 5 ║ /1/2/4/ ║ /1/2/ ║ D ║ A, B ║
║ 6 ║ /1/2/5/ ║ /1/2/ ║ E ║ A, B ║
╚═══════╩═══════════╩════════╩══════════════╩══════════════════╝
To implement this into your system, you'd reference the above table in your table of searches. Something like:
CREATE TABLE dbo.Searches (
id int NOT NULL
CONSTRAINT PK_Searches
PRIMARY KEY
IDENTITY(1,1)
, searchString nvarchar(100)
, timestamp datetimeoffset
CONSTRAINT DF_Sarches_timestamp
DEFAULT (SYSDATETIMEOFFSET())
, CatID int NOT NULL
CONSTRAINT FK_Searches_Category
FOREIGN KEY
REFERENCES dbo.Categories (CatID)
);To insert a search in
Category C:INSERT INTO dbo.Searches (searchString, CatID)
VALUES ('This is a search', 4);Query:
SELECT s.searchString
, s.timestamp
, Category = c.CategoryName
, CategoryParents = (
SELECT '/' + c1.CategoryName
FROM dbo.Categories c1
WHERE c.hid.IsDescendantOf(c1.hid) = CONVERT(bit, 1)
AND c1.CategoryName <> 'Root'
AND c1.hid <> c.hid
FOR XML PATH (''))
FROM dbo.Searches s
INNER JOIN dbo.Categories c ON s.CatID = c.CatID;Results:
╔══════════════════╦══════════╦═════════════════╦════════════════════════════════════╗
║ searchString ║ Category ║ CategoryParents ║ timestamp ║
╠══════════════════╬══════════╬═════════════════╬════════════════════════════════════╣
║ This is a search ║ C ║ /A/B ║ 2018-03-08 15:29:25.2267243 -06:00 ║
╚══════════════════╩══════════╩═════════════════╩════════════════════════════════════╝
If you subsequently want to reparent
C, and retain the old version of C, you do this:INSERT INTO dbo.Categories (hid, CategoryName)
VALUES ('/1/2/', 'C');Then, when you insert into the newly re-parented category, like this:
INSERT INTO dbo.Searches (searchString, CatID)
VALUES ('This is a search, but with a different parent', 7);You get:
SELECT s.searchString
, s.timestamp
, Category = c.CategoryName
, CategoryParents = (
SELECT '/' + c1.CategoryName
FROM dbo.Categories c1
WHERE c.hid.IsDescendantOf(c1.hid) = CONVERT(bit, 1)
AND c1.CategoryName <> 'Root'
AND c1.hid <> c.hid
FOR XML PATH (''))
FROM dbo.Searches s
INNER JOIN dbo.Categories c ON s.CatID = c.CatID;Results:
╔═══════════════════════════════════════════════╦══════════╦═════════════════╦════════════════════════════════════╗
║ searchString ║ Category ║ CategoryParents ║ timestamp ║
╠═══════════════════════════════════════════════╬══════════╬═════════════════╬════════════════════════════════════╣
║ This is a search ║ C ║ /A/B/C ║ 2018-03-08 15:29:25.2267243 -06:00 ║
║ This is a search, but with a different parent ║ C ║ /A ║ 2018-03-08 15:29:25.2546739 -06:00 ║
╚═══════════════════════════════════════════════╩══════════╩═════════════════╩════════════════════════════════════╝
Code Snippets
IF OBJECT_ID(N'dbo.Categories', N'U') IS NOT NULL
DROP TABLE dbo.Categories;
GO
CREATE TABLE dbo.Categories
(
CatID int NOT NULL
CONSTRAINT PK_Categories
PRIMARY KEY CLUSTERED
IDENTITY(1,1)
, hid hierarchyid NOT NULL
, CategoryName varchar(30) NOT NULL
);
INSERT INTO dbo.Categories (hid, CategoryName)
VALUES ('/', 'Root')
, ('/1/', 'A')
, ('/1/2/', 'B')
, ('/1/2/3/', 'C')
, ('/1/2/4/', 'D')
, ('/1/2/5/', 'E');SELECT c.CatID
, Hierarchy = c.hid.ToString()
, Parent = c.hid.GetAncestor(1).ToString()
, c.CategoryName
, ParentCategories = STUFF((
SELECT ', ' + c1.CategoryName
FROM dbo.Categories c1
WHERE c.hid.IsDescendantOf(c1.hid) = CONVERT(bit, 1)
AND c1.hid <> c.hid
AND c1.CategoryName <> 'Root'
FOR XML PATH ('')), 1, 2, '')
FROM dbo.Categories c;CREATE TABLE dbo.Searches (
id int NOT NULL
CONSTRAINT PK_Searches
PRIMARY KEY
IDENTITY(1,1)
, searchString nvarchar(100)
, timestamp datetimeoffset
CONSTRAINT DF_Sarches_timestamp
DEFAULT (SYSDATETIMEOFFSET())
, CatID int NOT NULL
CONSTRAINT FK_Searches_Category
FOREIGN KEY
REFERENCES dbo.Categories (CatID)
);INSERT INTO dbo.Searches (searchString, CatID)
VALUES ('This is a search', 4);SELECT s.searchString
, s.timestamp
, Category = c.CategoryName
, CategoryParents = (
SELECT '/' + c1.CategoryName
FROM dbo.Categories c1
WHERE c.hid.IsDescendantOf(c1.hid) = CONVERT(bit, 1)
AND c1.CategoryName <> 'Root'
AND c1.hid <> c.hid
FOR XML PATH (''))
FROM dbo.Searches s
INNER JOIN dbo.Categories c ON s.CatID = c.CatID;Context
StackExchange Database Administrators Q#199703, answer score: 2
Revisions (0)
No revisions yet.