Monday, September 22, 2014

Primal & Dual Decomposition Methods

Decomposition Methods

Interesting structure may exist in the objective function and the constraint functions; decomposition is a strategy for exploiting that structure. In the simplest situation, we solve an unconstrained optimization problem whose objective function f_0(x) can be written as

f_0(x) = ∑_{i=1}^n f_i(x_i)

In this case, the variable is divided into subvectors x = (x_1, ..., x_n), and we can optimize each subvector independently, with no coupling to the other variables. A decomposition method splits a big problem into many smaller problems and then solves them all, simultaneously or sequentially. We will discuss decomposition methods in order from easy to hard.
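As a toy illustration (my own example, not from the text), a fully separable objective can be minimized one coordinate at a time:

```python
import numpy as np

# Separable objective: f0(x) = sum_i (x_i - a_i)^2 + |x_i|
# Each term depends on one coordinate only, so each scalar
# subproblem can be solved independently (even in parallel).
a = np.array([3.0, -1.0, 0.2])

def argmin_coord(ai):
    # min_x (x - ai)^2 + |x| has a closed-form soft-threshold solution
    return np.sign(ai) * max(abs(ai) - 0.5, 0.0)

x_opt = np.array([argmin_coord(ai) for ai in a])
print(x_opt)   # [ 2.5 -0.5  0. ]
```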

Unconstrained optimization problem

With Only a Complicating Variable
In this case, we want to minimize

f_0(x) = f_1(x_1, y) + f_2(x_2, y)    (1)

Here the vector x = (x_1, x_2, y) has three components. If y is fixed, then we can solve argmin_{x_1} f_1(x_1, y) and argmin_{x_2} f_2(x_2, y) independently, so y is called the complicating variable, while x_1 and x_2 are called private variables because they belong to different subsystems. y can also be called the public variable, interface variable, or boundary variable, because it connects the two subproblems.

Primal decomposition

Inspired by the fact that two independent problems can be solved once y is fixed, we define two subproblems as follows

Φ_1(y) = min_{x_1} f_1(x_1, y);  Φ_2(y) = min_{x_2} f_2(x_2, y)

and define a master problem, which is equivalent to problem (1):
Φ(y) = Φ_1(y) + Φ_2(y)

So primal decomposition works in the following order:
1. start with an initial guess y = y^0
2. solve the subproblems Φ_1 and Φ_2
3. minimize Φ over y to obtain a new value of y
This method is called primal decomposition because we manipulate the primal variables directly. The subproblems can be solved with any method, and so can the master problem.
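Here is a minimal sketch of that loop in Python, on a pair of toy quadratics of my own choosing (the functions, step size, and iteration count are all assumptions, not from the text); the master gradient comes from differentiating f_1 + f_2 with respect to y at the subproblem minimizers:

```python
def solve_sub1(y):
    # argmin_{x1} f1(x1, y) = (x1 - y)^2 + x1^2  ->  x1 = y/2
    x1 = y / 2.0
    return x1, (x1 - y) ** 2 + x1 ** 2

def solve_sub2(y):
    # argmin_{x2} f2(x2, y) = (x2 + y)^2 + (y - 1)^2  ->  x2 = -y
    x2 = -y
    return x2, (x2 + y) ** 2 + (y - 1) ** 2

y, step = 0.0, 0.2
for k in range(100):
    x1, phi1 = solve_sub1(y)           # subproblem 1 at current y
    x2, phi2 = solve_sub2(y)           # subproblem 2 at current y
    # dPhi/dy at the minimizers: 2(y - x1) from f1, 2(x2 + y) + 2(y - 1) from f2
    y -= step * (2 * (y - x1) + 2 * (x2 + y) + 2 * (y - 1))

print(y, phi1 + phi2)                  # y converges to 2/3
```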

Dual decomposition

Instead of solving the original problem directly, we transform problem (1) into the following:

min_x f_1(x_1, y_1) + f_2(x_2, y_2)

s.t. y_1 = y_2

In this form, we create two copies y_1 and y_2 of the boundary variable and add a consensus constraint y_1 = y_2. After this transformation, the problem is separable, at the cost of one new constraint.
The Lagrangian becomes
L(x_1, x_2, y_1, y_2, λ) = f_1(x_1, y_1) + f_2(x_2, y_2) + λ^T(y_1 - y_2), which regroups into
L(x_1, x_2, y_1, y_2, λ) = (f_1(x_1, y_1) + λ^T y_1) + (f_2(x_2, y_2) - λ^T y_2), so the dual function becomes
g(λ) = g_1(λ) + g_2(λ)

g_1(λ) = inf_{x_1, y_1} f_1(x_1, y_1) + λ^T y_1
g_2(λ) = inf_{x_2, y_2} f_2(x_2, y_2) - λ^T y_2
Now the subproblems g_1(λ) and g_2(λ) can be solved independently; the master problem is then to maximize g(λ) over λ.

Sub-gradient for master problem

If we want to solve the master problem with a subgradient method or the cutting-plane method, we need a subgradient of the master objective with respect to λ. Calculating subgradients of g_1(λ) and g_2(λ) is easy: if (x̄_1, ȳ_1) achieves the infimum in g_1(λ), then ȳ_1 is a subgradient of g_1; for g_2(λ), the corresponding subgradient is -ȳ_2. Hence ȳ_1 - ȳ_2 is a subgradient of g.
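A minimal sketch of dual decomposition with subgradient ascent, again on toy quadratics that are my own assumption (f_1(x_1, y_1) = x_1^2 + (y_1 - 1)^2 and f_2(x_2, y_2) = (x_2 - 2)^2 + y_2^2, coupled by y_1 = y_2):

```python
def sub1(lam):
    # argmin_{x1, y1} x1^2 + (y1 - 1)^2 + lam * y1  ->  x1 = 0, y1 = 1 - lam/2
    return 0.0, 1.0 - lam / 2.0

def sub2(lam):
    # argmin_{x2, y2} (x2 - 2)^2 + y2^2 - lam * y2  ->  x2 = 2, y2 = lam/2
    return 2.0, lam / 2.0

lam, step = 0.0, 0.5
for k in range(50):
    x1, y1 = sub1(lam)
    x2, y2 = sub2(lam)
    lam += step * (y1 - y2)   # ascent along the subgradient y1 - y2

print(lam, y1, y2)            # lam -> 1, y1 = y2 -> 0.5
```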

Convergence Estimation

Under some circumstances, we perform a convergence test. Usual convergence criteria include
1. norm of the gradient less than a predefined ϵ
2. absolute difference of function values between iterations less than ϵ
3. norm of the difference of the optimization variable between iterations less than ϵ
4. difference between the current value and the optimal value less than ϵ
When using the dual decomposition method, we can estimate the gap between the current function value and the optimal value f*. It works as follows: after iteration k, we have the subproblem minimizers x̄_1, ȳ_1, x̄_2, ȳ_2, which give the lower bound

f* ≥ f_1(x̄_1, ȳ_1) + f_2(x̄_2, ȳ_2) + λ^T(ȳ_1 - ȳ_2)

When the optimization has not converged, ȳ_1 ≠ ȳ_2, but we can construct the feasible point ŷ = (ȳ_1 + ȳ_2)/2, and then
f* ≤ f_1(x̄_1, ŷ) + f_2(x̄_2, ŷ)

Once we have an upper bound f_upper and a lower bound f_lower, we can compute the gap f_upper - f_lower; if it is less than ϵ, we can terminate the optimization process.
Another, better way to obtain the upper bound is to re-optimize the primal problem with y fixed at ŷ.
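Reusing sub1 and sub2 from the previous sketch, the stopping test might look like this (again just an illustration under the same toy assumptions):

```python
def bounds(lam):
    x1, y1 = sub1(lam)
    x2, y2 = sub2(lam)
    f1 = x1 ** 2 + (y1 - 1.0) ** 2
    f2 = (x2 - 2.0) ** 2 + y2 ** 2
    lower = f1 + f2 + lam * (y1 - y2)        # g(lambda) <= f*
    y_hat = (y1 + y2) / 2.0                  # feasible point: y1 = y2 = y_hat
    upper = x1 ** 2 + (y_hat - 1.0) ** 2 + (x2 - 2.0) ** 2 + y_hat ** 2
    return lower, upper

lam, step = 0.0, 0.5
for k in range(100):
    lower, upper = bounds(lam)
    if upper - lower < 1e-6:                 # duality-gap stopping test
        break
    _, y1 = sub1(lam)
    _, y2 = sub2(lam)
    lam += step * (y1 - y2)
```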

Constrained Optimization

With only complicating constraints
In this scenario, we need to deal with the following problem:

min_x f_1(x_1) + f_2(x_2)

s.t. x_1 ∈ C_1, x_2 ∈ C_2

h_1(x_1) + h_2(x_2) ⪯ 0

This is a little more complicated than the previous problem; the only difference is the vector inequality constraint h_1(x_1) + h_2(x_2) ⪯ 0. But it is still a relatively simple case; in many circumstances the coupling constraint can be more complicated, for example
h_1(x_1, y_1) + h_2(x_2, y_2) ⪯ 0

Primal Decomposition

To solve this problem with the primal decomposition method, we set up two new subproblems by introducing a variable t that splits the right-hand side of the coupling constraint:

Φ_1(t) = min_{x_1} f_1(x_1)  s.t. x_1 ∈ C_1, h_1(x_1) ⪯ t

Φ_2(t) = min_{x_2} f_2(x_2)  s.t. x_2 ∈ C_2, h_2(x_2) ⪯ -t

and obtain a new master problem
Φ(t) = Φ_1(t) + Φ_2(t)

We solve this new problem as follows: start with some t, solve the two subproblems independently, then update t using a subgradient of the master problem, and repeat until convergence. If we solve each subproblem to optimality, a subgradient of Φ_1 or Φ_2 with respect to t is given by the optimal Lagrange multiplier associated with its inequality constraint. Proof as follows:
p(t̃) ≥ sup_{λ⪰0} inf_x f(x) + λ^T(h(x) - t̃)
≥ inf_x f(x) + λ̃^T(h(x) - t̃)   (here λ̃ is the optimal Lagrange multiplier associated with the inequality at t)
= inf_x f(x) + λ̃^T(h(x) - t + t - t̃)
= (inf_x f(x) + λ̃^T(h(x) - t)) + λ̃^T(t - t̃)
= p(t) + λ̃^T(t - t̃)   (by strong duality at t)
So -λ̃ is a subgradient of p at t. For Φ_1(t) this gives -λ̃_1; for Φ_2(t), whose right-hand side is -t, it gives +λ̃_2. Hence λ̃_2 - λ̃_1 is a subgradient of Φ(t), and we can use it to update t.
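A sketch of this update on a toy instance of my own (minimize (x_1 - 1)^2 + (x_2 - 2)^2 subject to x_2 - x_1 ≤ 0, written as h_1(x_1) = -x_1, h_2(x_2) = x_2):

```python
def sub1(t):
    # Phi_1(t): min (x1 - 1)^2 s.t. -x1 <= t
    x1 = max(1.0, -t)
    return x1, 2.0 * (x1 - 1.0)       # multiplier lam1 from KKT stationarity

def sub2(t):
    # Phi_2(t): min (x2 - 2)^2 s.t. x2 <= -t
    x2 = min(2.0, -t)
    return x2, 2.0 * (2.0 - x2)       # multiplier lam2 from KKT stationarity

t, step = 0.0, 0.1
for k in range(200):
    x1, lam1 = sub1(t)
    x2, lam2 = sub2(t)
    t -= step * (lam2 - lam1)         # descend along the master subgradient

print(t, x1, x2)                      # t -> -1.5, x1 = x2 -> 1.5
```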

Dual Decomposition

To perform dual decomposition, we first write down the Lagrangian of the primal problem:

L(x_1, x_2, λ) = f_1(x_1) + f_2(x_2) + λ^T(h_1(x_1) + h_2(x_2))

= (f_1(x_1) + λ^T h_1(x_1)) + (f_2(x_2) + λ^T h_2(x_2))

From this structure it is clear that the minimization splits: we can solve the two terms independently. λ serves as the master variable that coordinates the two subproblems; the master problem maximizes the dual function over λ ⪰ 0, and a subgradient at λ is h_1(x̄_1) + h_2(x̄_2).
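On the same toy instance as above (my own assumption), dual decomposition becomes projected subgradient ascent on λ:

```python
def sub1(lam):
    # argmin_x1 (x1 - 1)^2 + lam * (-x1)  ->  x1 = 1 + lam/2
    return 1.0 + lam / 2.0

def sub2(lam):
    # argmin_x2 (x2 - 2)^2 + lam * x2  ->  x2 = 2 - lam/2
    return 2.0 - lam / 2.0

lam, step = 0.0, 0.5
for k in range(100):
    x1, x2 = sub1(lam), sub2(lam)
    # subgradient of the dual is the constraint residual h1(x1) + h2(x2)
    lam = max(0.0, lam + step * (x2 - x1))   # project onto lam >= 0

print(lam, x1, x2)                           # lam -> 1, x1 = x2 -> 1.5
```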

General Structure Problem

In this section, general means the problem has both “coupling variables” and “complicating constraints”. Such a problem can be converted into separable subproblems tied together by consistency constraints. For example, suppose we have the following problem:

min_x f_1(x_1, y) + f_2(x_2, y)   s.t. h_1(x_1) + h_2(x_2) ⪯ 0

We convert this problem into the following:
min_x f_1(x_1, y_1) + f_2(x_2, y_2)

s.t. h_1(x_1) ⪯ z_1;  h_2(x_2) ⪯ -z_2;  y_1 = y_2;  z_1 = z_2
Then we can solve the subproblems as before; the only difference is how to calculate the subgradient of the master problem.
Combining decomposition with the proximal gradient method, we can derive the powerful ADMM algorithm for distributed optimization of ℓ1-regularized problems, which is used everywhere.
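As an illustration, here is the standard textbook ADMM iteration for the lasso, min_x (1/2)‖Ax - b‖^2 + μ‖x‖_1, with the usual splitting x = z (the penalty ρ and iteration count are my own choices, not something derived in this post):

```python
import numpy as np

def soft_threshold(v, k):
    # proximal operator of k * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def lasso_admm(A, b, mu, rho=1.0, iters=300):
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached for every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))              # smooth quadratic subproblem
        z = soft_threshold(x + u, mu / rho)        # proximal step for the l1 term
        u = u + x - z                              # dual update on the residual x - z
    return z

A, b = np.random.randn(50, 10), np.random.randn(50)
x_sparse = lasso_admm(A, b, mu=1.0)
```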

Summary

Decomposition works by splitting the original problem into many smaller subproblems, which can then be solved using the power of multiple machines.