patternMinor
Priority Inbox Database Design
Viewed 0 times
prioritydesigndatabaseinbox
Problem
I was just wondering, what would be the table structure for a messaging system similar to Gmail's priority inbox?
If you're unfamiliar with the system, it prioritises messages based on the type of messages you most often read and those to which you reply, then separates them from 'normal' messages.
In this hypothetical scenario, I'm basing the 'score' only on senders to which the user replies.
I was thinking that each user would have a table assigned to them that recorded the user ids of the top 10 people they responded to, each user name requiring 5 replies to be entered into said table. For each reply, the 'replied' value would increment by one.
Is that the best/most logical way of doing it?
I'm not familiar with 'NoSQL' databases, so my logic is based on a relational methodology. My apologies if I'm missing something obvious because of that.
If you're unfamiliar with the system, it prioritises messages based on the type of messages you most often read and those to which you reply, then separates them from 'normal' messages.
In this hypothetical scenario, I'm basing the 'score' only on senders to which the user replies.
I was thinking that each user would have a table assigned to them that recorded the user ids of the top 10 people they responded to, each user name requiring 5 replies to be entered into said table. For each reply, the 'replied' value would increment by one.
Is that the best/most logical way of doing it?
I'm not familiar with 'NoSQL' databases, so my logic is based on a relational methodology. My apologies if I'm missing something obvious because of that.
Solution
I think the trick to this is that it doesn't have to be real time, just eventually consistent, in which case it's straightforward enough (using SQL Server, but this applies in any DB). First a trivial table and some sample data:
This is logging for every message, who sent it and who to (assuming for simplicity that the message body is stored in another table). So we can see that the top sender to Octavian is Gaius (3 messages of 5), and the top sender to Gaius is Aurelius (2 messages of 3). To query that using a CTE:
In practice you would have a job that ran every minute (or whatever interval is best) refreshing a lookup table mapping a user to their top sender (or top n senders using
For the sake of simplicity I have left off indexes and partitioning - they would be the key to performance of this solution. You could certainly scale this to many billions of messages on any modern DB/hardware. GMail most likely has a custom solution tho', but with 20,000 engineers Google can do that!
create table messages
(message_id integer, sender varchar(20), recipient varchar (20))
go
insert into messages values (1, 'Gaius', 'Octavian')
insert into messages values (2, 'Gaius', 'Octavian')
insert into messages values (3, 'Gaius', 'Octavian')
insert into messages values (4, 'Aurelius', 'Octavian')
insert into messages values (5, 'Aurelius', 'Octavian')
insert into messages values (6, 'Aurelius', 'Gaius')
insert into messages values (7, 'Aurelius', 'Gaius')
insert into messages values (8, 'Octavian', 'Gaius')
goThis is logging for every message, who sent it and who to (assuming for simplicity that the message body is stored in another table). So we can see that the top sender to Octavian is Gaius (3 messages of 5), and the top sender to Gaius is Aurelius (2 messages of 3). To query that using a CTE:
with q1 as (
select recipient, sender, count(sender) as num_messages_from_sender,
rank() over (partition by recipient order by count(sender) desc) as priority
from messages group by recipient, sender)
select recipient, sender as top_sender, num_messages_from_sender
from q1 where priority=1
goIn practice you would have a job that ran every minute (or whatever interval is best) refreshing a lookup table mapping a user to their top sender (or top n senders using
where priority <= n) (or in your case, you would be tracking the senders to which they reply with another column and filtering by that).For the sake of simplicity I have left off indexes and partitioning - they would be the key to performance of this solution. You could certainly scale this to many billions of messages on any modern DB/hardware. GMail most likely has a custom solution tho', but with 20,000 engineers Google can do that!
Code Snippets
create table messages
(message_id integer, sender varchar(20), recipient varchar (20))
go
insert into messages values (1, 'Gaius', 'Octavian')
insert into messages values (2, 'Gaius', 'Octavian')
insert into messages values (3, 'Gaius', 'Octavian')
insert into messages values (4, 'Aurelius', 'Octavian')
insert into messages values (5, 'Aurelius', 'Octavian')
insert into messages values (6, 'Aurelius', 'Gaius')
insert into messages values (7, 'Aurelius', 'Gaius')
insert into messages values (8, 'Octavian', 'Gaius')
gowith q1 as (
select recipient, sender, count(sender) as num_messages_from_sender,
rank() over (partition by recipient order by count(sender) desc) as priority
from messages group by recipient, sender)
select recipient, sender as top_sender, num_messages_from_sender
from q1 where priority=1
goContext
StackExchange Database Administrators Q#2093, answer score: 2
Revisions (0)
No revisions yet.