Monday, August 25, 2014

Notes on Distributed Optimization

In LIBLINEAR, if we want to use second-order methods for optimization, some kind of approximation is needed: the Hessian is a huge matrix that cannot be stored in memory. The key point is that some algorithms only need the Hessian-vector product $Hv$. As long as we can compute this matrix-vector product, we do not need the Hessian itself anymore.

Using Automatic Differentiation

Get a method to calculate the gradient, then use automatic differentiation (or a finite-difference approximation of the gradient) to calculate the Hessian-vector product.
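As a minimal sketch of the idea (the quadratic test function and names here are illustrative, not from LIBLINEAR): given only a gradient routine, a Hessian-vector product can be approximated by a finite difference of gradients, $Hv \approx (\nabla f(x + \epsilon v) - \nabla f(x)) / \epsilon$.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x, so grad f(x) = A x and the
# Hessian is exactly A.  A is an illustrative placeholder.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)          # symmetric positive definite
grad = lambda x: A @ x

def hess_vec(grad, x, v, eps=1e-6):
    """Approximate H v by a finite difference of gradients."""
    return (grad(x + eps * v) - grad(x)) / eps

x = rng.standard_normal(5)
v = rng.standard_normal(5)
print(np.allclose(hess_vec(grad, x, v), A @ v, atol=1e-4))  # True for this quadratic
```

For a quadratic the finite difference is exact up to floating-point error; for general objectives the step size `eps` trades truncation error against round-off.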

Using a Matrix Decomposition

Some Hessian matrices can be decomposed as $H = I + C X^T D X$; that is, the Hessian can be written as a low-rank update of the identity matrix.
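A small sketch of exploiting this structure (the matrices below are random placeholders): with $H = I + C X^T D X$ and diagonal $D$, the product $Hv$ needs only two matrix-vector products, never the $n \times n$ Hessian.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n, C = 20, 8, 0.5
X = rng.standard_normal((l, n))
d = rng.uniform(0.1, 1.0, size=l)   # diagonal of D

def hess_vec(v):
    # H v = v + C * X^T (D (X v)) -- never forms the n x n Hessian
    return v + C * (X.T @ (d * (X @ v)))

# Check against the explicitly formed Hessian
H = np.eye(n) + C * X.T @ np.diag(d) @ X
v = rng.standard_normal(n)
print(np.allclose(hess_vec(v), H @ v))  # True
```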

Just some notes.

Written with StackEdit.

Saturday, August 23, 2014

Localization and Cutting Plane Methods

@(Numerical Computation)[localization,subgradient,cutting plane]


Continuing the discussion of non-differentiable convex optimization, this post is about localization and cutting-plane methods. Roughly speaking, these methods are similar to the bisection method on $\mathbb{R}$: they find the optimal point by eliminating regions that are guaranteed not to contain it. There is no direct extension of bisection from $\mathbb{R}$ to $\mathbb{R}^n$, but the subgradient of a convex function produces a similar effect when used appropriately.

Basic Idea of Cutting Plane method

Before talking about the details of the cutting-plane method, we need to know the goal of the algorithm: to find a point $x$ belonging to the $\epsilon$-suboptimal set, defined as follows:

$$X_\epsilon = \{ z \mid f_0(z) \le f_0(x^\star) + \epsilon \}$$

where $x^\star$ represents the optimal point of the objective function; this $\epsilon$-suboptimal set is denoted $X_\epsilon$.
The cutting-plane method proceeds as follows:
a. Set k = 0.
b. Start with an initial polyhedron $P_k$ guaranteed to contain the optimal point $x^\star$.
c. Select a point $x_k$ inside the polyhedron $P_k$ according to some rule. This selection of $x_k$ is critical to the performance of the cutting-plane method, and there are many possible choices.
d. Send $x_k$ to an oracle. The oracle either tells you that the point is in the $\epsilon$-suboptimal set, or returns a hyperplane that separates the $\epsilon$-suboptimal set from the rest of the points. This hyperplane must satisfy the inequality

$$a^T x \le b \quad \forall x \in X_\epsilon$$

and is called a cutting plane.
e. If the point $x_k$ is in the $\epsilon$-suboptimal set, go to step f; if not, add the hyperplane to the polyhedron, set k = k + 1, and go to step c.
f. Return the point $x_k$.

After reviewing the algorithm, we can see that the cutting plane makes this method similar to bisection on $\mathbb{R}$, with the cutting plane playing the role of the midpoint in the bisection algorithm.
One of the most important questions is how to calculate the cutting plane for a convex optimization problem. This can be solved with the subgradient, defined as any vector $g$ fulfilling the following inequality:

$$f(z) \ge f(x) + g^T (z - x) \quad \forall z \in \operatorname{dom} f$$

Cutting planes arise naturally from this definition. The following sections discuss how to calculate cutting planes for different types of problems.

Unconstrained Optimization

For an unconstrained optimization problem:

$$\arg\min_x f_0(x)$$

At any point $x$, a subgradient $g \in \partial f_0(x)$ gives the following inequality:

$$f_0(z) \ge f_0(x) + g^T (z - x)$$

so all points satisfying $g^T (z - x) > 0$ can never be optimal, which yields the cut $g^T (z - x) \le 0$. If we keep track of the best objective value found so far, $f_{best}$, we can give a better (deeper) cut:

$$g^T (z - x) + f_0(x) - f_{best} \le 0$$
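A tiny illustration of the deep cut (the function and points are made up): for $f_0(x) = x^2$ at $x_k = 1$ with $f_{best} = 0.25$, the gradient is $g = 2$, so the cut becomes $2(z - 1) + 1 - 0.25 \le 0$, i.e. $z \le 0.625$, which the true minimizer $z = 0$ satisfies.

```python
# Deep cut g^T (z - x_k) + f0(x_k) - f_best <= 0 for f0(x) = x^2 (illustrative).
f0 = lambda x: x ** 2
grad = lambda x: 2 * x          # subgradient (here: the ordinary gradient)

x_k, f_best = 1.0, 0.25
g = grad(x_k)

def cut_holds(z):
    """True if z survives the deep cut generated at x_k."""
    return g * (z - x_k) + f0(x_k) - f_best <= 0

print(cut_holds(0.0))   # True: the minimizer x* = 0 is kept
print(cut_holds(1.0))   # False: x_k itself is cut away since f0(x_k) > f_best
```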

Feasibility Problem

Feasibility problem:

$$\text{find } x \quad \text{s.t. } f_i(x) \le 0, \quad i = 1, \dots, m$$

If all the inequality constraints are satisfied, then the problem is feasible at the current point. If not, we can find an inequality $f_i(x) \le 0$ that is violated at the current point $x_k$. According to the subgradient inequality $f_i(z) \ge f_i(x_k) + g_i^T (z - x_k)$, every feasible point must satisfy

$$f_i(x_k) + g_i^T (z - x_k) \le 0$$

which is the cutting plane we need.

Inequality constrained problem

Consider the following problem:

$$\arg\min_x f_0(x) \quad \text{s.t. } f_i(x) \le 0, \quad i = 1, \dots, m$$

At the current point $x_k$ there are two different situations. First, some of the constraints are violated. For a violated constraint $f_j(x)$, we can construct the following cutting plane from a subgradient $g_j$ of $f_j(x)$ at $x_k$:

$$f_j(x_k) + g_j^T (z - x_k) \le 0$$

This cutting plane is called a feasibility cut. Second, all the constraints are satisfied. Then calculate a subgradient $g_0$ of $f_0(x)$ at $x_k$. If $g_0 = 0$, then $x_k$ is the minimizer of $f_0(x)$; otherwise we can construct the following cutting plane:

$$g_0^T (z - x_k) \le 0$$

Summary

Cutting planes can be found efficiently for unconstrained problems, feasibility problems, and inequality constrained problems. But the method does not seem to handle equality constraints easily, unlike the subgradient method, which can handle equality constraints by projection.

Convergence Proof

For cutting-plane methods, the progress of the optimization is evaluated by the decrease in volume. For the algorithm to converge, the volume of the polyhedron must decrease by a factor less than 1 at each iteration. For the bisection method, the volume decreases by a factor of 0.5 every iteration, but for cutting-plane methods in $\mathbb{R}^n$ this decrease is hard to guarantee.

Specific Cutting-Plane and Localization Methods

From the brief introduction of the cutting-plane method above, we know the critical step is selecting the query point $x_{k+1}$ inside the polyhedron $P_k$. The strategy that leads to the greatest reduction in volume will be the most efficient one; different strategies have different computational costs and volume reductions.

Center of Gravity

The center of gravity of a set C is defined as follows:

$$\operatorname{cg}(C) = \frac{\int_C z \, dz}{\int_C dz}$$

For the volume reduction, we have

$$\frac{\operatorname{vol}(P_{k+1})}{\operatorname{vol}(P_k)} \le 1 - \frac{1}{e} \approx 0.63$$

One of the most important things about this choice is that the volume reduction does not depend on any problem parameters, such as the dimension n of the problem.
The disadvantage of this algorithm is that the center of gravity is very hard to compute for a set described by a set of linear inequalities.

MVE cutting-plane method

This method is called the maximum volume inscribed ellipsoid (MVE) cutting-plane method. Here we select $x_{k+1}$ to be the center of the maximum volume ellipsoid that lies in $P_k$. The volume reduction can be described by the factor:

$$\frac{\operatorname{vol}(P_{k+1})}{\operatorname{vol}(P_k)} \le 1 - \frac{1}{n}$$

so the volume reduction factor depends on the dimension n.

Analytic center cutting-plane method

The analytic center cutting-plane method uses the optimal solution of the following optimization problem:

$$\arg\min_x \; -\sum_{i=1}^{m_0} \log(d_i - c_i^T x) - \sum_{i=1}^{m_k} \log(b_i - a_i^T x)$$

where the polyhedron is

$$P_k = \{ z \mid c_i^T z \le d_i, \; i = 1, \dots, m_0; \; a_i^T z \le b_i, \; i = 1, \dots, m_k \}$$

This problem is easy to optimize.
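A sketch of one way to optimize it, with damped Newton steps on the log-barrier objective (the box polyhedron is an arbitrary example; for the unit box the analytic center is the midpoint):

```python
import numpy as np

# Polyhedron {x | A x <= b}: the unit box 0 <= x_i <= 1 (illustrative).
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

x = np.array([0.2, 0.7])                 # any strictly feasible start
for _ in range(50):
    s = b - A @ x                        # slacks, all positive inside P
    grad = A.T @ (1.0 / s)               # gradient of -sum log(b - A x)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)
    dx = -np.linalg.solve(hess, grad)    # Newton step
    t = 1.0
    while np.any(b - A @ (x + t * dx) <= 0):
        t *= 0.5                         # damp to stay strictly feasible
    x = x + t * dx

print(x)  # -> approximately [0.5, 0.5], the analytic center of the box
```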

Ellipsoid Method

The ellipsoid method is a localization algorithm that works in a similar way to cutting-plane methods but differs in some aspects. In cutting-plane methods we describe our knowledge with a polyhedron, while in the ellipsoid method we use an ellipsoid. Describing the localization set with an ellipsoid has a memory advantage over cutting-plane methods: as the optimization proceeds, cutting-plane methods add more and more constraints to form a smaller and smaller polyhedron, and too many linear inequality constraints slow down each iteration and occupy much more memory. The ellipsoid method needs only constant memory, about n(n+1) numbers, because an ellipsoid is described by

$$\mathcal{E} = \{ z \mid (z - x)^T P^{-1} (z - x) \le 1 \}$$

where P is positive definite.

Details about Ellipsoid method

At iteration k, we have an ellipsoid that contains the optimal point:

$$\mathcal{E}_k = \{ z \mid (z - x_k)^T P_k^{-1} (z - x_k) \le 1 \}$$

Then we take a subgradient $g_k$ at the point $x_k$; the half-space satisfying $g_k^T (z - x_k) > 0$ can never contain the optimal point. So in the next step, we need to find the minimum-volume ellipsoid covering the half-ellipsoid

$$\{ z \mid (z - x_k)^T P_k^{-1} (z - x_k) \le 1, \; g_k^T (z - x_k) \le 0 \}$$

Fortunately, this ellipsoid can be found analytically:

$$\mathcal{E}_{k+1} = \{ z \mid (z - x_{k+1})^T P_{k+1}^{-1} (z - x_{k+1}) \le 1 \}$$

where

$$x_{k+1} = x_k - \frac{1}{n+1} P_k \tilde{g}$$

$$P_{k+1} = \frac{n^2}{n^2 - 1} \left( P_k - \frac{2}{n+1} P_k \tilde{g} \tilde{g}^T P_k \right)$$

and

$$\tilde{g} = \frac{g_k}{\sqrt{g_k^T P_k g_k}}$$
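The updates above can be sketched on a toy nonsmooth problem (minimizing $\|x\|_1$; the starting point and initial radius are arbitrary):

```python
import numpy as np

f = lambda x: np.abs(x).sum()                     # nonsmooth objective, minimum 0
subgrad = lambda x: np.where(x >= 0, 1.0, -1.0)   # a subgradient of the l1 norm

n = 2
x = np.array([1.0, -0.7])      # initial ellipsoid center
P = 4.0 * np.eye(n)            # initial ellipsoid: ball of radius 2
f_best = f(x)

for _ in range(100):
    g = subgrad(x)
    gt = g / np.sqrt(g @ P @ g)                   # normalized subgradient
    x = x - (1.0 / (n + 1)) * P @ gt              # new center
    P = n**2 / (n**2 - 1.0) * (P - 2.0 / (n + 1) * np.outer(P @ gt, P @ gt))
    f_best = min(f_best, f(x))

print(f_best)  # close to the optimal value 0
```

Note that the method is not a descent method, so we track the best value found so far rather than the last iterate.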

Extensions

Some improvements can enhance the performance of cutting-plane methods.

Dropping constraints

When using cutting-plane methods, the number of inequalities grows as the iterations increase, and performance is seriously affected by having so many inequalities. So we need to abandon the inequalities that are no longer important. There are several ways to do this.

linear programming

Using linear programming to check whether an inequality is redundant works as follows:

$$\text{maximize } a_i^T z \quad \text{s.t. } a_j^T z \le b_j, \; j = 1, \dots, m, \; j \ne i$$

where i indexes the inequality we would like to check. If the optimal value is smaller than $b_i$, then the inequality $a_i^T z \le b_i$ is redundant.
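A sketch of this check with scipy (the polyhedron, a unit box plus one extra loose constraint, is made up). Since `linprog` minimizes, we maximize $a_i^T z$ by minimizing $-a_i^T z$:

```python
import numpy as np
from scipy.optimize import linprog

# Constraints a_j^T z <= b_j: the unit box, plus z_1 <= 2 (index 4),
# which is clearly redundant.
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0],
              [ 1.0,  0.0]])
b = np.array([1.0, 0.0, 1.0, 0.0, 2.0])

def is_redundant(A, b, i):
    """Maximize a_i^T z subject to all the other constraints."""
    mask = np.arange(len(b)) != i
    res = linprog(-A[i], A_ub=A[mask], b_ub=b[mask],
                  bounds=[(None, None)] * A.shape[1])
    return res.success and -res.fun <= b[i]

print(is_redundant(A, b, 4))  # True:  max z_1 over the box is 1 <= 2
print(is_redundant(A, b, 0))  # False: dropping z_1 <= 1 lets z_1 reach 2
```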

other methods

Sometimes we simply keep the number of constraints fixed, even though this drops some useful constraints. In this situation, we drop inequalities according to the following heuristic score:

$$\frac{b_i - a_i^T g}{\| F^T a_i \|_2}$$

The smaller the score, the more relevant the constraint, so we can rank all the inequalities by this score. The matrix F and the vector g come from the ellipsoid

$$\mathcal{E} = \{ F u + g \mid \| u \|_2 \le 1 \}$$

which covers the current polyhedron $P_k$.
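A small sketch of ranking constraints by this score (the covering ellipsoid and the constraints are arbitrary illustrative values):

```python
import numpy as np

# Covering ellipsoid {F u + g : ||u||_2 <= 1} (illustrative values).
F = np.diag([2.0, 1.0])
g = np.zeros(2)

# Constraints a_i^T z <= b_i; the last one is far from the ellipsoid.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 5.0])

# Score (b_i - a_i^T g) / ||F^T a_i||_2: smaller means more relevant.
# Row i of A @ F is a_i^T F, so its norm equals ||F^T a_i||_2.
scores = (b - A @ g) / np.linalg.norm(A @ F, axis=1)
order = np.argsort(scores)            # most relevant constraints first
print(order)  # -> [0 1 2]: the loose constraint z1 + z2 <= 5 ranks last
```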

Summary

Roughly speaking, localization and cutting-plane methods are slow but reliable. When the objective is non-differentiable, cutting-plane and ellipsoid methods still work.