patternMinor
Optimal way for grouping events
Viewed 0 times
optimaleventsgroupingwayfor
Problem
I am creating an event notification system.
Each event has a user and a subject, such that, 'user did event to the subject'. Now while presenting these the events need to be grouped.
All the events in the same group must either refer to the same user or the same subject and each event must be in exactly one group. Subject to those constraints, a group can contain any number of events, including just one. I want to calculate the grouping that uses the least possible number of groups.
Note that forming one group for each user, or forming one group for each subject does not, in general, give the smallest possible number of groups. Also, the events cannot, in general, all be placed in the same group, as that usually gives a group containing multiple users and multiple subjects, which is forbidden.
If I start from the first event and start putting it in the group with most possible events (acting all greedy), I will end up with a sub-optimal solution.
I could check all possible groupings but that will blow up exponentially ($2^n$ possible groupings for just n events).
How can I go about optimally grouping these events such that most of the events get grouped one way or the other, minimising the total number of groups?
Sample data -
Each event has a user and a subject, such that, 'user did event to the subject'. Now while presenting these the events need to be grouped.
All the events in the same group must either refer to the same user or the same subject and each event must be in exactly one group. Subject to those constraints, a group can contain any number of events, including just one. I want to calculate the grouping that uses the least possible number of groups.
Note that forming one group for each user, or forming one group for each subject does not, in general, give the smallest possible number of groups. Also, the events cannot, in general, all be placed in the same group, as that usually gives a group containing multiple users and multiple subjects, which is forbidden.
If I start from the first event and start putting it in the group with most possible events (acting all greedy), I will end up with a sub-optimal solution.
I could check all possible groupings but that will blow up exponentially ($2^n$ possible groupings for just n events).
How can I go about optimally grouping these events such that most of the events get grouped one way or the other, minimising the total number of groups?
Sample data -
event1 user1 subject1
event2 user2 subject1
event3 user3 subject1
event4 user2 subject2
event5 user3 subject2
event6 user1 subject3
event7 user2 subject4
event8 user2 subject5
event9 user3 subject5
event10 user1 subject5
event11 user4 subject1
event12 user4 subject2
...Solution
Represent the events as a bipartite graphs: the vertices are the users and subjects and the edges are events (so there's an edge from user $x$ to subject $y$ if there's an event involving that user and subject). What you're looking for is essentially a minimal vertex cover. A vertex cover is a set $C$ of vertices such that every edge includes at least one vertex from $C$. A minimal vertex cover is one that includes the smallest possible number of vertices.
Example. If the events are "John likes pizza", "John likes candy" and "Jane likes candy", the graph has vertices John, Jane, pizza and candy and the edges are John–pizza, John–candy and Jane–candy. A minimal vertex cover would be John and candy (in this case, there are several other minimal vertex covers).
For general graphs, finding minimal vertex covers is NP-hard but, for bipartite graphs, there is a polynomial-time algorithm because of König's theorem, which says that finding minimal vertex covers of bipartite graphs is equivalent to finding maximum matchings (i.e., a biggest-possible set of edges such that no pair of edges in the set shares an endpoint), which can be done by the Hopcroft–Karp algorithm. The algorithm for finding a minimum vertex cover from a maximum matching is described in the "Proof" section of the Wikipedia König's theorem page.
Let $C$ be the minimum vertex cover that you just computed, and let the vertices in it be $c_1, \dots, c_k$ (so each $c_i$ is either a user or a subject). The groups are $g_1, \dots, g_k$, where $g_1$ is all the events that involve the user or subject $c_1$and, for $i>1$, $g_i$ is the set of all events that involve the user or subject $c_i$ that haven't already been included in one of $g_1, \dots, g_{i-1}$.
Example. If the events are "John likes pizza", "John likes candy" and "Jane likes candy", the graph has vertices John, Jane, pizza and candy and the edges are John–pizza, John–candy and Jane–candy. A minimal vertex cover would be John and candy (in this case, there are several other minimal vertex covers).
For general graphs, finding minimal vertex covers is NP-hard but, for bipartite graphs, there is a polynomial-time algorithm because of König's theorem, which says that finding minimal vertex covers of bipartite graphs is equivalent to finding maximum matchings (i.e., a biggest-possible set of edges such that no pair of edges in the set shares an endpoint), which can be done by the Hopcroft–Karp algorithm. The algorithm for finding a minimum vertex cover from a maximum matching is described in the "Proof" section of the Wikipedia König's theorem page.
Let $C$ be the minimum vertex cover that you just computed, and let the vertices in it be $c_1, \dots, c_k$ (so each $c_i$ is either a user or a subject). The groups are $g_1, \dots, g_k$, where $g_1$ is all the events that involve the user or subject $c_1$and, for $i>1$, $g_i$ is the set of all events that involve the user or subject $c_i$ that haven't already been included in one of $g_1, \dots, g_{i-1}$.
Context
StackExchange Computer Science Q#29379, answer score: 4
Revisions (0)
No revisions yet.