Monday, September 22, 2014

Primal & Dual Decomposition Methods

Decomposition Methods

Interesting structure may exist in the objective function and the constraint functions; decomposition is a strategy for exploiting that structure. In the simplest situation, we solve an unconstrained optimization problem whose objective function f_0(x) can be written as

f_0(x) = ∑_{i=1}^n f_i(x_i)

In this case, the variable is divided into subvectors x = (x_1, ..., x_n), and we can optimize each subvector independently, with no coupling to the other variables. A decomposition method splits a big problem into many smaller problems and then solves them all, simultaneously or sequentially. We will discuss decomposition methods in order from easy to hard.
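As a toy illustration (my own example, not from the text), a fully separable objective can be minimized one coordinate at a time:

```python
import numpy as np

# Separable objective: f0(x) = sum_i (x_i - a_i)^2 + |x_i|
# Each term depends on one coordinate only, so each scalar
# subproblem can be solved independently (even in parallel).
a = np.array([3.0, -1.0, 0.2])

def argmin_coord(ai):
    # min_x (x - ai)^2 + |x| has a closed-form soft-threshold solution
    return np.sign(ai) * max(abs(ai) - 0.5, 0.0)

x_opt = np.array([argmin_coord(ai) for ai in a])
print(x_opt)   # [ 2.5 -0.5  0. ]
```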

Unconstrained optimization problem

With Only a Complicating Variable
In this case, we want to minimize

f_0(x) = f_1(x_1, y) + f_2(x_2, y)    (1)

Here the vector x = (x_1, x_2, y) has three components. If y is fixed, then we can solve argmin_{x_1} f_1(x_1, y) and argmin_{x_2} f_2(x_2, y) independently, so y is called the complicating variable, while x_1 and x_2 are called private variables because they belong to different subsystems. y can also be called the public variable, interface variable, or boundary variable, because it connects the two subproblems.

Primal decomposition

Inspired by the fact that two independent problems can be solved once y is fixed, we define two subproblems as follows

Φ_1(y) = min_{x_1} f_1(x_1, y);  Φ_2(y) = min_{x_2} f_2(x_2, y)

and define a master problem, which is equivalent to problem (1):
Φ(y) = Φ_1(y) + Φ_2(y)

So primal decomposition works in the following order:
1. start with an initial guess y = y^0
2. solve the subproblems Φ_1 and Φ_2
3. minimize Φ over y to obtain a new value of y
This method is called primal decomposition because we manipulate the primal variables directly. The subproblems can be solved with any method, and so can the master problem.
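Here is a minimal sketch of that loop in Python, on a pair of toy quadratics of my own choosing (the functions, step size, and iteration count are all assumptions, not from the text); the master gradient comes from differentiating f_1 + f_2 with respect to y at the subproblem minimizers:

```python
def solve_sub1(y):
    # argmin_{x1} f1(x1, y) = (x1 - y)^2 + x1^2  ->  x1 = y/2
    x1 = y / 2.0
    return x1, (x1 - y) ** 2 + x1 ** 2

def solve_sub2(y):
    # argmin_{x2} f2(x2, y) = (x2 + y)^2 + (y - 1)^2  ->  x2 = -y
    x2 = -y
    return x2, (x2 + y) ** 2 + (y - 1) ** 2

y, step = 0.0, 0.2
for k in range(100):
    x1, phi1 = solve_sub1(y)           # subproblem 1 at current y
    x2, phi2 = solve_sub2(y)           # subproblem 2 at current y
    # dPhi/dy at the minimizers: 2(y - x1) from f1, 2(x2 + y) + 2(y - 1) from f2
    y -= step * (2 * (y - x1) + 2 * (x2 + y) + 2 * (y - 1))

print(y, phi1 + phi2)                  # y converges to 2/3
```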

Dual decomposition

Instead of solving the original problem directly, we transform problem (1) into the following:

min_x f_1(x_1, y_1) + f_2(x_2, y_2)

s.t. y_1 = y_2

In this form, we create two copies y_1 and y_2 of the boundary variable and add a consensus constraint y_1 = y_2. After this transformation, the problem is separable, at the cost of one new constraint.
The Lagrangian becomes
L(x_1, x_2, y_1, y_2, λ) = f_1(x_1, y_1) + f_2(x_2, y_2) + λ^T(y_1 - y_2), which regroups into
L(x_1, x_2, y_1, y_2, λ) = (f_1(x_1, y_1) + λ^T y_1) + (f_2(x_2, y_2) - λ^T y_2), so the dual function becomes
g(λ) = g_1(λ) + g_2(λ)

g_1(λ) = inf_{x_1, y_1} f_1(x_1, y_1) + λ^T y_1
g_2(λ) = inf_{x_2, y_2} f_2(x_2, y_2) - λ^T y_2
Now the subproblems g_1(λ) and g_2(λ) can be solved independently; the master problem is then to maximize g(λ) over λ.

Sub-gradient for master problem

If we want to solve the master problem with a subgradient method or the cutting-plane method, we need a subgradient of the master objective with respect to λ. Calculating subgradients of g_1(λ) and g_2(λ) is easy: if (x̄_1, ȳ_1) achieves the infimum in g_1(λ), then ȳ_1 is a subgradient of g_1; for g_2(λ), the corresponding subgradient is -ȳ_2. Hence ȳ_1 - ȳ_2 is a subgradient of g.
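A minimal sketch of dual decomposition with subgradient ascent, again on toy quadratics that are my own assumption (f_1(x_1, y_1) = x_1^2 + (y_1 - 1)^2 and f_2(x_2, y_2) = (x_2 - 2)^2 + y_2^2, coupled by y_1 = y_2):

```python
def sub1(lam):
    # argmin_{x1, y1} x1^2 + (y1 - 1)^2 + lam * y1  ->  x1 = 0, y1 = 1 - lam/2
    return 0.0, 1.0 - lam / 2.0

def sub2(lam):
    # argmin_{x2, y2} (x2 - 2)^2 + y2^2 - lam * y2  ->  x2 = 2, y2 = lam/2
    return 2.0, lam / 2.0

lam, step = 0.0, 0.5
for k in range(50):
    x1, y1 = sub1(lam)
    x2, y2 = sub2(lam)
    lam += step * (y1 - y2)   # ascent along the subgradient y1 - y2

print(lam, y1, y2)            # lam -> 1, y1 = y2 -> 0.5
```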

Convergence Estimation

Under some circumstances, we perform a convergence test. Usual convergence criteria include
1. norm of the gradient less than a predefined ϵ
2. absolute difference of function values between iterations less than ϵ
3. norm of the difference of the optimization variable between iterations less than ϵ
4. difference between the current value and the optimal value less than ϵ
When using the dual decomposition method, we can estimate the gap between the current function value and the optimal value f*. It works as follows: after iteration k, we have the subproblem minimizers x̄_1, ȳ_1, x̄_2, ȳ_2, which give the lower bound

f* ≥ f_1(x̄_1, ȳ_1) + f_2(x̄_2, ȳ_2) + λ^T(ȳ_1 - ȳ_2)

When the optimization has not converged, ȳ_1 ≠ ȳ_2, but we can construct the feasible point ŷ = (ȳ_1 + ȳ_2)/2, and then
f* ≤ f_1(x̄_1, ŷ) + f_2(x̄_2, ŷ)

Once we have an upper bound f_upper and a lower bound f_lower, we can compute the gap f_upper - f_lower; if it is less than ϵ, we can terminate the optimization process.
Another, better way to obtain the upper bound is to re-optimize the primal problem with y fixed at ŷ.
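Reusing sub1 and sub2 from the previous sketch, the stopping test might look like this (again just an illustration under the same toy assumptions):

```python
def bounds(lam):
    x1, y1 = sub1(lam)
    x2, y2 = sub2(lam)
    f1 = x1 ** 2 + (y1 - 1.0) ** 2
    f2 = (x2 - 2.0) ** 2 + y2 ** 2
    lower = f1 + f2 + lam * (y1 - y2)        # g(lambda) <= f*
    y_hat = (y1 + y2) / 2.0                  # feasible point: y1 = y2 = y_hat
    upper = x1 ** 2 + (y_hat - 1.0) ** 2 + (x2 - 2.0) ** 2 + y_hat ** 2
    return lower, upper

lam, step = 0.0, 0.5
for k in range(100):
    lower, upper = bounds(lam)
    if upper - lower < 1e-6:                 # duality-gap stopping test
        break
    _, y1 = sub1(lam)
    _, y2 = sub2(lam)
    lam += step * (y1 - y2)
```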

Constrained Optimization

With only complicating constraints
In this scenario, we need to deal with the following problem:

min_x f_1(x_1) + f_2(x_2)

s.t. x_1 ∈ C_1, x_2 ∈ C_2

h_1(x_1) + h_2(x_2) ⪯ 0

This is a little more complicated than the previous problem; the only difference is the vector inequality constraint h_1(x_1) + h_2(x_2) ⪯ 0. But it is still a relatively simple case; in many circumstances the coupling constraint can be more complicated, for example
h_1(x_1, y_1) + h_2(x_2, y_2) ⪯ 0

Primal Decomposition

To solve this problem with the primal decomposition method, we set up two new subproblems by introducing a variable t that splits the right-hand side of the coupling constraint:

Φ_1(t) = min_{x_1} f_1(x_1)  s.t. x_1 ∈ C_1, h_1(x_1) ⪯ t

Φ_2(t) = min_{x_2} f_2(x_2)  s.t. x_2 ∈ C_2, h_2(x_2) ⪯ -t

and obtain a new master problem
Φ(t) = Φ_1(t) + Φ_2(t)

We solve this new problem as follows: start with some t, solve the two subproblems independently, then update t using a subgradient of the master problem, and repeat until convergence. If we solve each subproblem to optimality, a subgradient of Φ_1 or Φ_2 with respect to t is given by the optimal Lagrange multiplier associated with its inequality constraint. Proof as follows:
p(t̃) ≥ sup_{λ⪰0} inf_x f(x) + λ^T(h(x) - t̃)
≥ inf_x f(x) + λ̃^T(h(x) - t̃)   (here λ̃ is the optimal Lagrange multiplier associated with the inequality at t)
= inf_x f(x) + λ̃^T(h(x) - t + t - t̃)
= (inf_x f(x) + λ̃^T(h(x) - t)) + λ̃^T(t - t̃)
= p(t) + λ̃^T(t - t̃)   (by strong duality at t)
So -λ̃ is a subgradient of p at t. For Φ_1(t) this gives -λ̃_1; for Φ_2(t), whose right-hand side is -t, it gives +λ̃_2. Hence λ̃_2 - λ̃_1 is a subgradient of Φ(t), and we can use it to update t.
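A sketch of this update on a toy instance of my own (minimize (x_1 - 1)^2 + (x_2 - 2)^2 subject to x_2 - x_1 ≤ 0, written as h_1(x_1) = -x_1, h_2(x_2) = x_2):

```python
def sub1(t):
    # Phi_1(t): min (x1 - 1)^2 s.t. -x1 <= t
    x1 = max(1.0, -t)
    return x1, 2.0 * (x1 - 1.0)       # multiplier lam1 from KKT stationarity

def sub2(t):
    # Phi_2(t): min (x2 - 2)^2 s.t. x2 <= -t
    x2 = min(2.0, -t)
    return x2, 2.0 * (2.0 - x2)       # multiplier lam2 from KKT stationarity

t, step = 0.0, 0.1
for k in range(200):
    x1, lam1 = sub1(t)
    x2, lam2 = sub2(t)
    t -= step * (lam2 - lam1)         # descend along the master subgradient

print(t, x1, x2)                      # t -> -1.5, x1 = x2 -> 1.5
```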

Dual Decomposition

To perform dual decomposition, we first write down the Lagrangian of the primal problem:

L(x_1, x_2, λ) = f_1(x_1) + f_2(x_2) + λ^T(h_1(x_1) + h_2(x_2))

= (f_1(x_1) + λ^T h_1(x_1)) + (f_2(x_2) + λ^T h_2(x_2))

From this structure it is clear that the minimization splits: we can solve the two terms independently. λ serves as the master variable that coordinates the two subproblems; the master problem maximizes the dual function over λ ⪰ 0, and a subgradient at λ is h_1(x̄_1) + h_2(x̄_2).
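On the same toy instance as above (my own assumption), dual decomposition becomes projected subgradient ascent on λ:

```python
def sub1(lam):
    # argmin_x1 (x1 - 1)^2 + lam * (-x1)  ->  x1 = 1 + lam/2
    return 1.0 + lam / 2.0

def sub2(lam):
    # argmin_x2 (x2 - 2)^2 + lam * x2  ->  x2 = 2 - lam/2
    return 2.0 - lam / 2.0

lam, step = 0.0, 0.5
for k in range(100):
    x1, x2 = sub1(lam), sub2(lam)
    # subgradient of the dual is the constraint residual h1(x1) + h2(x2)
    lam = max(0.0, lam + step * (x2 - x1))   # project onto lam >= 0

print(lam, x1, x2)                           # lam -> 1, x1 = x2 -> 1.5
```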

General Structure Problem

In this section, general means the problem has both “coupling variables” and “complicating constraints”. Such a problem can be converted into separable subproblems tied together by consistency constraints. For example, suppose we have the following problem:

min_x f_1(x_1, y) + f_2(x_2, y)   s.t. h_1(x_1) + h_2(x_2) ⪯ 0

We convert this problem into the following:
min_x f_1(x_1, y_1) + f_2(x_2, y_2)

s.t. h_1(x_1) ⪯ z_1;  h_2(x_2) ⪯ -z_2;  y_1 = y_2;  z_1 = z_2
Then we can solve the subproblems as before; the only difference is how to calculate the subgradient of the master problem.
Combining decomposition with the proximal gradient method, we can derive the powerful ADMM algorithm for distributed optimization of ℓ1-regularized problems, which is used everywhere.
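As an illustration, here is the standard textbook ADMM iteration for the lasso, min_x (1/2)‖Ax - b‖^2 + μ‖x‖_1, with the usual splitting x = z (the penalty ρ and iteration count are my own choices, not something derived in this post):

```python
import numpy as np

def soft_threshold(v, k):
    # proximal operator of k * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def lasso_admm(A, b, mu, rho=1.0, iters=300):
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached for every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = M @ (Atb + rho * (z - u))              # smooth quadratic subproblem
        z = soft_threshold(x + u, mu / rho)        # proximal step for the l1 term
        u = u + x - z                              # dual update on the residual x - z
    return z

A, b = np.random.randn(50, 10), np.random.randn(50)
x_sparse = lasso_admm(A, b, mu=1.0)
```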

Summary

Decomposition works by splitting the original problem into many smaller subproblems, which can then be solved using the power of multiple machines.