patternMinor
Storing multiple tags on analytics database
Viewed 0 times
tagsdatabasemultiplestoringanalytics
Problem
I would like to store user purchase custom tags on each transaction, example if user bought shoes then tags are
These tags are that seller interested in querying back to understand the sales.
My idea is when ever new tag comes in create new code(something like hashcode but sequential) for that tag, and code starts from
Let us assume mapping is (at application level)
So storing the above purchase transaction, tag will be like
any solution to fix this?
Update_1:
I have not followed
I got suggestion like below
EXISTS (SELECT 1 FROM transaction_tag WHERE tag_id = 'zz' and trans_id
= tr.trans_id) in the WHERE clause once for each tag (note: assumes tr is an alias to the transaction table in the surrounding query)
I have not followed this; since i have to perform AND and OR condition on the tags, example ("SPORTS" AND "ADIDAS") ---- "SHOE" AND ("NIKE" OR "ADIDAS")
Update_2:
I have not followed bitfield, since dont know redshi
"SPORTS", "NIKE", SHOES, COLOUR_BLACK, SIZE_12,..These tags are that seller interested in querying back to understand the sales.
My idea is when ever new tag comes in create new code(something like hashcode but sequential) for that tag, and code starts from
"a-z" 26 letters then "aa, ab, ac...zz" goes on. Now keep all the tags given for in one transaction in the one column called tag (varchar) by separating with "|".Let us assume mapping is (at application level)
"SPORTS" = a
"TENNIS" = b
"CRICKET" = c
...
...
"NIKE" = z //Brands company
"ADIDAS" = aa
"WOODLAND" = ab
...
...
SHOES = ay
...
...
COLOUR_BLACK = bc
COLOUR_RED = bd
COLOUR_BLUE = be
...
SIZE_12 = cq
...So storing the above purchase transaction, tag will be like
tag="|a|z|ay|bc|cq|" And now allowing seller to search number of SHOES sold by adding WHERE condition tag LIKE %|ay|%. Now the problem is i cannot use index (sort key in redshift db) for "LIKE starts with %". So how to solve this issue, since i might have 100 millions of records? dont want full table scan.. any solution to fix this?
Update_1:
I have not followed
bridge table concept (cross-reference table) since I want to perform group by on the results after searching the specified tags. My solution will give only one row when two tags matched in a single transaction, but bridge table will give me two rows? then my sum() will be doubled. I got suggestion like below
EXISTS (SELECT 1 FROM transaction_tag WHERE tag_id = 'zz' and trans_id
= tr.trans_id) in the WHERE clause once for each tag (note: assumes tr is an alias to the transaction table in the surrounding query)
I have not followed this; since i have to perform AND and OR condition on the tags, example ("SPORTS" AND "ADIDAS") ---- "SHOE" AND ("NIKE" OR "ADIDAS")
Update_2:
I have not followed bitfield, since dont know redshi
Solution
I'm still convinced that using a many-to-many lookup table (a bridge table) is still your best option here. Your concern about matching multiple rows can be fixed by proper query design. Let's say your tables are:
So each purchase can have multiple tags set (no limit), and just for fun, I added the ability to categorized the tags by TagType, maybe that contains things like "ProductType", "Brand", "Color", "Sport", so you have a way to tell that "shoes" is a "ProductType" tag, "Nike" is a brand tag, and "soccer" is a sport tag.
Then if you want to query (and return just single rows), just do:
If you need to do fancier combo-searches (find purchases of Nike shoes or Adidas shoes, your query will have to also be fancier:
Again, that's still returning a single row for each purchase that matches your desired tag combination.
CREATE TABLE purchases(PurchaseID,CustomerID,PurchaseDate,...)
CREATE TABLE tags(TagID,TagType,TagName)
CREATE TABLE purchasetags(PurchaseID,TagID)So each purchase can have multiple tags set (no limit), and just for fun, I added the ability to categorized the tags by TagType, maybe that contains things like "ProductType", "Brand", "Color", "Sport", so you have a way to tell that "shoes" is a "ProductType" tag, "Nike" is a brand tag, and "soccer" is a sport tag.
Then if you want to query (and return just single rows), just do:
SELECT *
FROM purchases
WHERE PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName IN ('Adidas','Nike'))
GROUP BY whatever...If you need to do fancier combo-searches (find purchases of Nike shoes or Adidas shoes, your query will have to also be fancier:
SELECT *
FROM purchases
WHERE PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName = 'Shoes')
AND PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName IN ('Adidas','Nike'))Again, that's still returning a single row for each purchase that matches your desired tag combination.
Code Snippets
CREATE TABLE purchases(PurchaseID,CustomerID,PurchaseDate,...)
CREATE TABLE tags(TagID,TagType,TagName)
CREATE TABLE purchasetags(PurchaseID,TagID)SELECT *
FROM purchases
WHERE PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName IN ('Adidas','Nike'))
GROUP BY whatever...SELECT *
FROM purchases
WHERE PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName = 'Shoes')
AND PurchaseID IN (SELECT pt.PurchaseID
FROM purchasetag pt
INNER JOIN tags t ON pt.TagID=t.TagID
WHERE t.TagName IN ('Adidas','Nike'))Context
StackExchange Database Administrators Q#176231, answer score: 5
Revisions (0)
No revisions yet.