Sunday, November 30, 2014

Exact Inference in Graphical Models

Inference in Graphical Models

When we talk about inference in a model, we mean that we have a graphical model at hand and want to compute some kind of probability.
There are three different kinds of inference problems:
1. Marginal Query: $P(X)$
2. Conditional Query: $P(X \mid Y)$
3. MAP Query: $\arg\max_{x} P(X = x \mid Y = y)$
For 1 and 2 we can use the same algorithm; for 3 we need a different one.
When we perform inference, we also need to decide between exact inference and approximate inference. Variable elimination and belief propagation are exact-inference algorithms; MCMC and variational inference are approximate-inference algorithms. We will talk about the exact methods in detail here.

Variable Elimination

Variable elimination works in the following way. Suppose we need to sum out the variables $B, C, D, E$ and keep only $A$ in the distribution:

$$P(A) = \sum_{B,C,D,E} P(A,B,C,D,E).$$

First we factorize the joint distribution into factor form (CPDs in a BN, factors in an MN):

$$= \sum_{B,C,D,E} P(C)\,P(D)\,P(B \mid C,D)\,P(E \mid B)\,P(A \mid B,E).$$

In order to unify BNs and MNs, we turn each conditional probability distribution into a factor, as follows:

$$= \sum_{B,C,D,E} \phi_C(C)\,\phi_D(D)\,\phi_{B,C,D}(B,C,D)\,\phi_{B,E}(B,E)\,\phi_{A,B,E}(A,B,E).$$

Looking at this expression, we can push the sum over $E$ inward:

$$= \sum_{B,C,D} \phi_C(C)\,\phi_D(D)\,\phi_{B,C,D}(B,C,D) \sum_{E} \phi_{B,E}(B,E)\,\phi_{A,B,E}(A,B,E).$$

Defining $\tau_1(A,B) = \sum_{E} \phi_{B,E}(B,E)\,\phi_{A,B,E}(A,B,E)$, the formula becomes

$$= \sum_{B,C,D} \phi_C(C)\,\phi_D(D)\,\phi_{B,C,D}(B,C,D)\,\tau_1(A,B).$$

Applying the same trick to $B$:

$$= \sum_{C,D} \phi_C(C)\,\phi_D(D) \sum_{B} \phi_{B,C,D}(B,C,D)\,\tau_1(A,B) = \sum_{C,D} \phi_C(C)\,\phi_D(D)\,\tau_2(A,C,D).$$

We skip the remaining steps, because the trick is the same.
One important question is the following: given an observed variable $E = e$, how can we compute the marginal probability $P(X, e)$? This is simple to do: before we sum over the factors, we fix every evidence variable to its observed value, i.e., we restrict each factor to the slice consistent with the evidence.
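Continuing the NumPy sketch above, fixing the evidence $E = e$ just means indexing the $E$ axis of every factor that mentions it before eliminating (the value `e = 1` is, of course, made up):

```python
e = 1  # hypothetical observed value of E

# Restrict every factor whose scope contains E to the slice E = e.
phi_BE_e = phi_BE[:, e]       # now a factor over B alone
phi_ABE_e = phi_ABE[:, :, e]  # now a factor over A, B

# Eliminate as before; no sum over E is needed any more.
tau1_e = phi_BE_e[None, :] * phi_ABE_e  # tau1(A, B)
tau2_e = np.einsum('bcd,ab->acd', phi_BCD, tau1_e)
p_A_e = np.einsum('c,d,acd->a', phi_C, phi_D, tau2_e)  # proportional to P(A, E = e)
```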
Variable elimination can be used to answer marginal and conditional queries, but not MAP queries; MAP inference is solved by a closely related method, max-product. The query is $\arg\max_{A,B,C,D} P(A,B,C,D,e)$, and when we factorize the joint probability distribution we can push the max operators inside the formula, just as we pushed the sums.
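For intuition, here is the same elimination order with every sum replaced by a max, again only a sketch on the toy binary factors above:

```python
# m1(A, B) = max_E phi_{B,E}(B, E) * phi_{A,B,E}(A, B, E)
m1 = (phi_BE[None, :, :] * phi_ABE).max(axis=2)

# m2(A, C, D) = max_B phi_{B,C,D}(B, C, D) * m1(A, B)
m2 = (phi_BCD[None, :, :, :] * m1[:, :, None, None]).max(axis=1)

# Maximizing out C and D gives, for each value of A, the best joint
# score over B, C, D, E. Recovering the full MAP assignment requires
# remembering the argmax at each elimination step and tracing back.
scores_A = (phi_C[None, :, None] * phi_D[None, None, :] * m2).max(axis=(1, 2))
map_A = scores_A.argmax()
```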
Variable elimination eliminates one variable at each step. To eliminate a variable, we combine all the factors related to that variable into one new factor that no longer contains the target variable. This step also reveals the complexity of variable elimination: it is governed by the largest intermediate factor created during the elimination process.

Belief Propagation

Another algorithm for inference is belief propagation. Variable elimination is applicable to both BNs and MNs and to any graph structure. Belief propagation, however, is exact only for certain graph structures, namely trees. When belief propagation runs on a graph with loops, it is called loopy belief propagation; the result is no longer exact, and its quality depends on the specific structure and other details. So we will first introduce how to run belief propagation on tree-structured graphs.
Belief propagation is a kind of message-passing algorithm for inference, and it is sometimes also called the sum-product algorithm.
If the graphical model is not already a tree, we may need to convert it into a junction tree (also called a clique tree or join tree), a form on which inference is easier.

Get Clique Tree

Since we need to run BP on a clique tree, we need to know how to obtain a clique tree from a directed or undirected graphical model. There are two different ways to perform this task:
1. Run the variable elimination algorithm and build the clique tree from the intermediate cliques it generates.
2. Triangulate the graph and construct the clique tree accordingly.

Clique Tree, Properties, and Other Details

When we talk about clique trees, junction trees, join trees, and factor graphs, we should know that these are not new types of graphical models: directed and undirected graphical models are the models. All of these are alternative representations of a graphical model, used for inference. Still, we might want to know why we use clique trees, what their properties are, and what makes a cluster graph a clique tree.
We know that potential functions are the building blocks of the probability distribution in a graphical model. A related concept is the cluster graph: a graph whose vertices are clusters, where a cluster is a set of random variables. A cluster graph has the following property:
1. Family preservation
For each potential function $\phi_k(\mathbf{x}_k)$, there must be a cluster that contains all the variables in its scope, so every potential function can be assigned to a cluster that covers it.
One important property makes a cluster graph a clique tree: RIP, the Running Intersection Property. RIP means that whenever two clusters $C_i$ and $C_j$ both contain a variable $X$, there is a unique path between $C_i$ and $C_j$, and every cluster along that path also contains $X$. A tree-structured cluster graph with this property is called a clique tree. A minimal sketch of checking RIP appears below.
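The representation here is assumed, not standard: cliques as sets of variable names, the tree as an adjacency dict. The check uses the equivalent statement that, for every variable, the tree nodes containing it must form a connected subtree.

```python
def satisfies_rip(cliques, adj):
    """Check the running intersection property of a clique tree.

    cliques: dict node id -> set of variables in that clique.
    adj:     dict node id -> set of neighboring node ids (a tree).
    """
    variables = set().union(*cliques.values())
    for v in variables:
        nodes = {n for n, c in cliques.items() if v in c}
        # BFS restricted to nodes containing v; RIP holds for v
        # exactly when this reaches all of them.
        start = next(iter(nodes))
        seen, frontier = {start}, [start]
        while frontier:
            n = frontier.pop()
            for m in adj[n]:
                if m in nodes and m not in seen:
                    seen.add(m)
                    frontier.append(m)
        if seen != nodes:
            return False
    return True
```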

Generate Clique Tree

In fact, we could construct a cluster graph by hand and force it to satisfy RIP, then use it for inference. But we can get one in easier ways:
1. Run variable elimination
Treat each intermediate clique generated during elimination as a cluster.
2. Triangulate the graph
Construct the clique tree from the maximal cliques of the triangulated graph.
3. Use the Bethe cluster graph (or factor graph; I think this is another name for it)
In this cluster graph, each potential function and each random variable is its own cluster. If a random variable belongs to the scope of a potential function, an edge connects the two clusters; a sketch of this construction follows the list.
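A hedged sketch of the Bethe construction from a list of factor scopes (the function name and representation are my own choices for the example):

```python
def bethe_cluster_graph(factor_scopes):
    """Build the Bethe cluster graph from factor scopes.

    factor_scopes: list of sets of variable names, one per potential.
    Returns (clusters, edges): one cluster per factor, one singleton
    cluster per variable, and an edge whenever a variable appears in
    a factor's scope.
    """
    variables = sorted(set().union(*factor_scopes))
    clusters = [set(scope) for scope in factor_scopes]
    clusters += [{v} for v in variables]
    edges = []
    for i, scope in enumerate(factor_scopes):
        for v in scope:
            edges.append((i, len(factor_scopes) + variables.index(v)))
    return clusters, edges
```

On the running example, `bethe_cluster_graph([{'C'}, {'D'}, {'B', 'C', 'D'}, {'B', 'E'}, {'A', 'B', 'E'}])` yields five factor clusters, five singleton variable clusters, and one edge per factor-variable membership.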

Run Belief Propagation

We need to define messages to run belief propagation. A message is a factor (an unnormalized distribution) over a set of random variables. Once we have a clique tree, we can define messages between adjacent tree nodes.
We define the sepset (separator set) $S_{ij}$ as the intersection of two adjacent tree nodes $C_i$ and $C_j$. A message is then in fact a marginal distribution over the sepset.
Each cluster $i$ is a set of random variables $\mathbf{X}_i$ with potentials attached to it. We define the initial cluster potential as the product of its attached potentials, $\Phi(\mathbf{X}_i) = \prod_{k} \phi_k(\mathbf{x}_k)$, and initialize all messages to 1. To pass a message from cluster $i$ to cluster $j$, we calculate:

$$M_{i \to j}(S_{ij}) = \sum_{\mathbf{X}_i \setminus S_{ij}} \Phi(\mathbf{X}_i) \prod_{k \in N(i) \setminus \{j\}} M_{k \to i}(S_{ki})$$

where $N(i)$ denotes the set of neighbors of cluster $i$.
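A minimal sketch of this message update. The factor representation (a tuple of variable names plus a NumPy table) and the `letters` map from variable names to einsum labels are scaffolding invented for the example:

```python
import numpy as np

def multiply(f, g, letters):
    """Pointwise product of two factors; a factor is (vars, table)."""
    fv, ft = f
    gv, gt = g
    out = tuple(dict.fromkeys(fv + gv))  # union of scopes, order kept
    spec = '{},{}->{}'.format(''.join(letters[v] for v in fv),
                              ''.join(letters[v] for v in gv),
                              ''.join(letters[v] for v in out))
    return out, np.einsum(spec, ft, gt)

def marginalize(f, keep, letters):
    """Sum a factor down onto the variables in `keep` (e.g. a sepset)."""
    fv, ft = f
    out = tuple(v for v in fv if v in keep)
    spec = '{}->{}'.format(''.join(letters[v] for v in fv),
                           ''.join(letters[v] for v in out))
    return out, np.einsum(spec, ft)

def message(i, j, potentials, sepsets, neighbors, messages, letters):
    """M_{i->j}: multiply cluster i's potential with every incoming
    message except the one from j, then marginalize onto S_{ij}."""
    f = potentials[i]
    for k in neighbors[i]:
        if k != j:
            f = multiply(f, messages[(k, i)], letters)
    return marginalize(f, sepsets[(i, j)], letters)
```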
With this, the belief propagation (sum-product) algorithm is clear:
1. Initialize the cluster potentials and the messages.
2. Compute the messages: each cluster gathers all the information from its neighbors and sends its own belief, marginalized onto the sepset. On a tree, scheduling the messages in two passes (leaves to a root, then root back to the leaves) computes every message exactly once; a sketch of this schedule is shown below.
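Continuing the sketch above, here is the two-pass schedule on a hypothetical chain clique tree of three clusters over binary variables (all tables and names are made up for the example):

```python
# Chain clique tree: {A,B} -- {B,C} -- {C,D}, rooted at cluster 2.
letters = {'A': 'a', 'B': 'b', 'C': 'c', 'D': 'd'}
potentials = {
    0: (('A', 'B'), np.array([[0.9, 0.1], [0.2, 0.8]])),
    1: (('B', 'C'), np.array([[0.7, 0.3], [0.4, 0.6]])),
    2: (('C', 'D'), np.array([[0.5, 0.5], [0.1, 0.9]])),
}
sepsets = {(0, 1): {'B'}, (1, 0): {'B'}, (1, 2): {'C'}, (2, 1): {'C'}}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

messages = {}
# Upward pass toward the root, then downward pass back:
# each message is computed exactly once.
for (i, j) in [(0, 1), (1, 2), (2, 1), (1, 0)]:
    messages[(i, j)] = message(i, j, potentials, sepsets,
                               neighbors, messages, letters)

# A cluster's belief is its potential times all incoming messages;
# on a tree, marginalizing a belief gives an exact marginal.
belief_1 = multiply(multiply(potentials[1], messages[(0, 1)], letters),
                    messages[(2, 1)], letters)
```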

Comments

Variable elimination and belief propagation are important and fundamental methods for graphical model inference, but on their own they are of limited practical use, because most graphical models in practice are not tree-structured.

Written with StackEdit.