patternMinor
Determining Probability from a Graph
Viewed 0 times
fromdetermininggraphprobability
Problem
Lets say I have node A that connects to 10 other nodes. 6 of those nodes have Property 1 and the other 4 have Property 2. How can I easily determining the probability of landing on a node with property 1 randomly while traversing the graph?
Abstract Example
To clarify this problem, I'm trying to choose the most probable path.
I have a node $v$ with 10 edges going out, and 6 of the nodes on the other end have a certain property $A$, while 4 have a certain property $B$. Now, each of the 10 nodes also has 10 edges going out, and at the end there are 10 nodes, some with property $A$, some with property $B$. The key here is that each property denotes an occurrence of something. Using multiple nodes here is basically a replacement for weighted edges. Rather than having an edge with weight 6 leading to $A$, I have 6 occurrences of $A$. I know this sounds counterintuitive, but this is actually a smaller part of a much, much large problem.
I want to find the most likely sequence of something occurring. We can see that there is a 0.6 probability of $A$ occurring. How can I easily determine that?
Basicslly, the question comes down to how can I traverse a graph of probabilities made from a graph of occurrences without having to completely generate a new graph. The brute force method to this would be to start at $v$, count the number of occurrences of $A$ and the number of occurrences of $B$, determine the probability of each, and then on a new graph have an edge from $v$ to $A$ with a weight 0.6. Adter that, you would then go to each occurrence of A, make a list of the difference occurrences branching from that, find the probability of each (with the many occurrences of $A$ acting as one node), then on the new graph, add an edge to each occurrence.
This may seem like a lot of work, but with the algorithm that I am designing, each node may have a dozen properties. I want to be able to quickly traverse the graph taking the most likely path through certain properties, which
Abstract Example
To clarify this problem, I'm trying to choose the most probable path.
I have a node $v$ with 10 edges going out, and 6 of the nodes on the other end have a certain property $A$, while 4 have a certain property $B$. Now, each of the 10 nodes also has 10 edges going out, and at the end there are 10 nodes, some with property $A$, some with property $B$. The key here is that each property denotes an occurrence of something. Using multiple nodes here is basically a replacement for weighted edges. Rather than having an edge with weight 6 leading to $A$, I have 6 occurrences of $A$. I know this sounds counterintuitive, but this is actually a smaller part of a much, much large problem.
I want to find the most likely sequence of something occurring. We can see that there is a 0.6 probability of $A$ occurring. How can I easily determine that?
Basicslly, the question comes down to how can I traverse a graph of probabilities made from a graph of occurrences without having to completely generate a new graph. The brute force method to this would be to start at $v$, count the number of occurrences of $A$ and the number of occurrences of $B$, determine the probability of each, and then on a new graph have an edge from $v$ to $A$ with a weight 0.6. Adter that, you would then go to each occurrence of A, make a list of the difference occurrences branching from that, find the probability of each (with the many occurrences of $A$ acting as one node), then on the new graph, add an edge to each occurrence.
This may seem like a lot of work, but with the algorithm that I am designing, each node may have a dozen properties. I want to be able to quickly traverse the graph taking the most likely path through certain properties, which
Solution
Let me first rephrase your problem as I understand it. You have a tree of random events. Every event consists of choosing one of a set of child events; every child has the same probability but may have different child events of their own. The children are grouped by some classifier. You want to have efficient access to the group reached by the most likely path of groups, not individual children.
I suggest you stick to your idea of the compressed tree of groups but keep it around and maintain it. In a sense, you use it as overlay over the actual graph:
[source]
The dotted lines represent the original tree, the solid green ones the overlay tree, weighted by the number of original child nodes of the respective group. Note that you can have access to both if you keep references to both roots. You find the desired group ("property" in your notation) by greedily following the edge with maximum weight among all outgoing edges, starting in $I$.
Maintaining this overlay tree is easy: upon adding a node on some level, add it to the appropriate group, adjust the weights and proceed recursively. In pseudo code, this is what the overlay tree might look like:
Note that the whole thing does not yield correct probabilities, anyway; you count nodes on every level, assuming they occur with the same probability, but that is not correct as siblings can have different amount of children. In the example, the probability to get $C$ if you had $A$ in the first round is not $0.5$ as your method determines -- two $C$ are child of them same $A$ and thus compete over the same slot! The precise formula is
$\qquad \displaystyle \begin{align}
\operatorname{Pr}(C \mid A) &= \frac{\sum_{u \text{ type } A} \sum_{v \text{ of type }C \text{ under } u} \operatorname{Pr}(u)\operatorname{Pr}(v \mid u)}{\operatorname{Pr}(A)} \\
&= \frac{\frac{1}{5}\cdot\frac{1}{3} + \frac{1}{5}\cdot\frac{1}{3} + \frac{1}{5}\cdot\frac{1}{2}}{\frac{3}{5}} \\
&= \frac{7}{18}
\end{align}$
This is more complicated to compute than raw counting, but you can still use the idea of the overlay tree. Maintenance will be more expensive but that is something you can not get around without further assumptions on the trees. Note that using the precise formula enables you to switch to non-uniform child distributions without too much hassle.
I suggest you stick to your idea of the compressed tree of groups but keep it around and maintain it. In a sense, you use it as overlay over the actual graph:
[source]
The dotted lines represent the original tree, the solid green ones the overlay tree, weighted by the number of original child nodes of the respective group. Note that you can have access to both if you keep references to both roots. You find the desired group ("property" in your notation) by greedily following the edge with maximum weight among all outgoing edges, starting in $I$.
Maintaining this overlay tree is easy: upon adding a node on some level, add it to the appropriate group, adjust the weights and proceed recursively. In pseudo code, this is what the overlay tree might look like:
class Node {
val label : String
val children : List[Node]
}
class OverlayNode {
val label : String
val internal : List[Node]
val children : Map[String,OverlayNode]
def weight = children.size
def insert(n : Node) {
assert n.label == this.label
internal ++ n
foreach ( c in n.children ) {
if ( children hasKey c.label )
children(c.label).insert(c)
else
children ++ (new OverlayNode(c.label, c))
}
}
def find_max {
if ( children.size == 0 )
return this
else
return (children argmax {c => c.weight}).find_max
}
}Note that the whole thing does not yield correct probabilities, anyway; you count nodes on every level, assuming they occur with the same probability, but that is not correct as siblings can have different amount of children. In the example, the probability to get $C$ if you had $A$ in the first round is not $0.5$ as your method determines -- two $C$ are child of them same $A$ and thus compete over the same slot! The precise formula is
$\qquad \displaystyle \begin{align}
\operatorname{Pr}(C \mid A) &= \frac{\sum_{u \text{ type } A} \sum_{v \text{ of type }C \text{ under } u} \operatorname{Pr}(u)\operatorname{Pr}(v \mid u)}{\operatorname{Pr}(A)} \\
&= \frac{\frac{1}{5}\cdot\frac{1}{3} + \frac{1}{5}\cdot\frac{1}{3} + \frac{1}{5}\cdot\frac{1}{2}}{\frac{3}{5}} \\
&= \frac{7}{18}
\end{align}$
This is more complicated to compute than raw counting, but you can still use the idea of the overlay tree. Maintenance will be more expensive but that is something you can not get around without further assumptions on the trees. Note that using the precise formula enables you to switch to non-uniform child distributions without too much hassle.
Code Snippets
class Node {
val label : String
val children : List[Node]
}
class OverlayNode {
val label : String
val internal : List[Node]
val children : Map[String,OverlayNode]
def weight = children.size
def insert(n : Node) {
assert n.label == this.label
internal ++ n
foreach ( c in n.children ) {
if ( children hasKey c.label )
children(c.label).insert(c)
else
children ++ (new OverlayNode(c.label, c))
}
}
def find_max {
if ( children.size == 0 )
return this
else
return (children argmax {c => c.weight}).find_max
}
}Context
StackExchange Computer Science Q#2875, answer score: 5
Revisions (0)
No revisions yet.