@(Numerical Computation)[localization,subgradient,cutting plane]
Localization and Cutting Plane Methods
Continuing the discussion of non-differentiable convex optimization, this blog post is about localization and cutting plane methods. Roughly speaking, these methods are similar to the bisection method on R: they find the optimal point by eliminating regions that are guaranteed not to contain it. There is no direct extension of bisection from R to Rn, but the subgradient of a convex function produces a similar effect when used appropriately.
Basic Idea of Cutting Plane method
Before going into the details of the cutting-plane method, we need to know the goal of the algorithm: to find a point x belonging to the ϵ-suboptimal set, defined as
{x | f0(x) − f0(x∗) ≤ ϵ}
where x∗ is the optimal point of the objective function. This ϵ-suboptimal set is denoted X.
The cutting-plane method proceeds as follows:
a. Set k = 0.
b. Start with an initial polyhedron Pk that is guaranteed to contain the optimal point x∗.
c. Select a point xk inside the polyhedron Pk according to some rule. This selection of xk is critical to the performance of the cutting-plane method, and there are many possible choices.
d. Send xk to an oracle. The oracle either reports that the point is in the ϵ-suboptimal set, or returns a hyperplane separating the ϵ-suboptimal set from the remaining points. This hyperplane must satisfy
aT x ≤ b ∀x ∈ X
and is called a cutting plane.
e. If xk is in the ϵ-suboptimal set, go to step f; otherwise, add the hyperplane to the polyhedron, set k = k + 1, and go to step c.
f. Return the point xk.
Looking back over the algorithm, it really is similar to bisection on R: the cutting plane plays the same role as the midpoint comparison, discarding the part of the search region that cannot contain the optimum.
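The loop above can be sketched in a few lines of Python. Everything here is illustrative: the oracle interface, the polyhedron representation {z | A z ≤ b}, and the midpoint query rule (which reproduces bisection in one dimension) are assumptions, not a fixed API.

```python
import numpy as np

def cutting_plane(oracle, A, b, query_point, max_iter=100):
    """Generic cutting-plane loop (sketch).

    oracle(x) returns (done, a, c): done=True means x is acceptable,
    otherwise a^T z <= c is a cutting plane.
    A (m x n) and b (m,) describe the polyhedron {z | A z <= b};
    query_point(A, b) picks the next x_k inside it.
    """
    x = query_point(A, b)
    for _ in range(max_iter):
        done, a, c = oracle(x)
        if done:
            return x
        A = np.vstack([A, a])          # add the cut to the polyhedron
        b = np.append(b, c)
        x = query_point(A, b)
    return x

# 1-D demo: minimize f0(x) = (x - 1)^2 on [-2, 2]; querying the
# midpoint of the remaining interval reproduces bisection.
def midpoint(A, b):
    a = A[:, 0]
    ub = np.min(b[a > 0] / a[a > 0])
    lb = np.max(b[a < 0] / a[a < 0])
    return np.array([(lb + ub) / 2])

def oracle(x):
    g = 2.0 * (x - 1.0)                # gradient (= subgradient here)
    if abs(g[0]) < 1e-9:
        return True, None, None
    return False, g, float(g @ x)      # neutral cut g^T z <= g^T x

A0 = np.array([[1.0], [-1.0]])         # initial polyhedron -2 <= z <= 2
b0 = np.array([2.0, 2.0])
x_star = cutting_plane(oracle, A0, b0, midpoint)
```

With this midpoint rule the very first cut shrinks the interval to [0, 2], whose midpoint is already the optimum x = 1.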
One of the most important questions is how to compute a cutting plane for a convex optimization problem. This is where the subgradient comes in. A subgradient of f at x is any vector g satisfying
f(z) ≥ f(x) + gT(z − x) ∀z ∈ dom f
Cutting planes arise naturally from this definition. The following sections discuss how to compute cutting planes for different types of problems.
Unconstrained Optimization
For the unconstrained optimization problem
argminx f0(x)
at any point x, a subgradient g ∈ ∂f0(x) gives the inequality
f0(z) ≥ f0(x) + gT(z − x)
so any point z satisfying gT(z − x) > 0 has f0(z) > f0(x) and can never be an optimal point; the neutral cut gT(z − x) ≤ 0 can safely be added. If we also keep track of the best objective value found so far, fbest, we get a deeper cut:
gT(z − x) + f0(x) − fbest ≤ 0
since any z violating it satisfies f0(z) > fbest.
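As a small sketch, the helper below (the function name is illustrative) turns a point, a subgradient, and fbest into a half-space aT z ≤ c:

```python
import numpy as np

def deep_cut(x, g, f_x, f_best):
    """fbest-based cut for minimizing f0: any z violating
    g^T (z - x) + f0(x) - f_best <= 0 has f0(z) > f_best, so keep
    only the half-space a^T z <= c returned here."""
    a = np.asarray(g, dtype=float)
    c = float(a @ np.asarray(x, dtype=float) + f_best - f_x)
    return a, c

# At x = (2, 0) with f0(x) = ||x||^2 = 4, g = (4, 0), f_best = 1,
# the cut is 4 z1 <= 5, which the optimum (0, 0) satisfies.
a, c = deep_cut([2.0, 0.0], [4.0, 0.0], 4.0, 1.0)
```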
Feasibility Problem
The feasibility problem is
find x
s.t. fi(x) ≤ 0, i = 1, …, m
If all the inequality constraints are satisfied, the current point solves the problem. If not, we can find a constraint fi that is violated at the current point xk, i.e. fi(xk) > 0. From the subgradient inequality
fi(z) ≥ fi(xk) + giT(z − xk)
every feasible point z must satisfy
fi(xk) + giT(z − xk) ≤ 0
which is exactly the cutting plane we need.
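A minimal sketch of this feasibility cut, with an illustrative helper name:

```python
import numpy as np

def feasibility_cut(xk, fi_xk, gi):
    """Cut from a violated constraint f_i(x_k) > 0.  Since
    f_i(z) >= f_i(x_k) + g_i^T (z - x_k), any feasible z must
    satisfy g_i^T z <= g_i^T x_k - f_i(x_k)."""
    a = np.asarray(gi, dtype=float)
    c = float(a @ np.asarray(xk, dtype=float) - fi_xk)
    return a, c

# Constraint f1(x) = x1 - 1 <= 0 violated at xk = (3, 0):
# f1(xk) = 2, g1 = (1, 0), and the cut z1 <= 1 recovers the
# constraint exactly (as it should for an affine constraint).
a, c = feasibility_cut([3.0, 0.0], 2.0, [1.0, 0.0])
```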
Inequality constrained problem
Consider the problem
argminx f0(x)
s.t. fi(x) ≤ 0, i = 1, …, m
At the current point xk there are two situations. First, some constraint fj is violated. Using a subgradient gj of fj at xk, we can construct the cutting plane
fj(xk) + gjT(z − xk) ≤ 0
called a feasibility cut. Second, all the constraints are satisfied. Then compute a subgradient g0 of f0 at xk. If g0 = 0, then xk is a minimizer of f0; otherwise we construct the objective cut
g0T(z − xk) ≤ 0
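The two cases can be combined into a simple cut oracle. The function below is a sketch; the interface (callables for the objective subgradient and for each constraint) is an assumption for illustration:

```python
import numpy as np

def cut_oracle(xk, f0_grad, constraints, tol=1e-9):
    """Return ('optimal', None, None), or (kind, a, c) with cut a^T z <= c.

    f0_grad(x) returns a subgradient of the objective at x;
    constraints is a list of (f_i, g_i) pairs returning the
    constraint value and a subgradient of f_i at x.
    """
    xk = np.asarray(xk, dtype=float)
    for f_i, g_i in constraints:
        fi = f_i(xk)
        if fi > 0:                           # violated: feasibility cut
            a = np.asarray(g_i(xk), dtype=float)
            return 'feasibility', a, float(a @ xk - fi)
    g0 = np.asarray(f0_grad(xk), dtype=float)
    if np.linalg.norm(g0) <= tol:            # 0 in the subdifferential
        return 'optimal', None, None
    return 'objective', g0, float(g0 @ xk)   # cut g0^T (z - xk) <= 0

# minimize x1^2 + x2^2  s.t.  x1 - 1 <= 0
cons = [(lambda x: x[0] - 1.0, lambda x: np.array([1.0, 0.0]))]
kind1, a1, c1 = cut_oracle([2.0, 0.0], lambda x: 2 * x, cons)  # infeasible xk
kind2, a2, c2 = cut_oracle([0.5, 0.0], lambda x: 2 * x, cons)  # feasible xk
```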
Summary
Cutting planes can be found efficiently for unconstrained problems, feasibility problems, and inequality constrained problems. However, equality constraints are not handled easily, unlike in the subgradient method, where they can be handled by projection.
Convergence Proof
For cutting-plane methods, progress is measured by the decrease in the volume of the localization polyhedron. For the algorithm to converge, the volume must shrink by a factor bounded away from 1 at every iteration. In the bisection method, each iteration cuts the volume (the interval length) exactly in half, but for cutting-plane methods in Rn such a guaranteed reduction depends on how the query point is chosen.
Specific Cutting-Plane and Localization Methods
From the introduction above, the critical step is selecting the query point xk+1 inside the polyhedron Pk. The strategy that achieves the largest volume reduction per iteration needs the fewest iterations, but different strategies trade off computational cost per step against volume reduction.
Center of Gravity
The center of gravity of a set C is defined as
cg(C) = ∫C z dz / ∫C dz
Querying the center of gravity gives the volume reduction
vol(Pk+1)/vol(Pk) ≤ 1 − 1/e ≈ 0.63
One of the most important properties of this bound is that it does not depend on any problem parameters, such as the dimension n of the problem.
The disadvantage of this algorithm is that the center of gravity is very hard to compute for a set described by a set of linear inequalities.
MVE cutting-plane method
This method is called the maximum volume inscribed ellipsoid (MVE) cutting-plane method. We select xk+1 to be the center of the maximum volume ellipsoid that lies inside Pk. The volume reduction satisfies
vol(Pk+1)/vol(Pk) ≤ 1 − 1/n
so the volume reduction factor depends on the dimension n.
Analytic center cutting-plane method
The analytic center cutting-plane method selects xk+1 as the optimal solution of the following optimization problem:
argminx − ∑i=1..m0 log(di − ciT x) − ∑i=1..mk log(bi − aiT x)
where the polyhedron is
Pk = {z | ciT z ≤ di, i = 1, …, m0; aiT z ≤ bi, i = 1, …, mk}
This problem is smooth and easy to optimize, for example with Newton's method.
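As an illustration, the analytic center of a small polyhedron can be computed with a damped Newton method on this log-barrier objective. The code below is a sketch, not an optimized implementation:

```python
import numpy as np

def analytic_center(A, b, x0, iters=50):
    """Damped Newton's method for the analytic center of {z | A z <= b},
    i.e. the minimizer of -sum_i log(b_i - a_i^T z).
    x0 must be strictly feasible (A x0 < b)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = b - A @ x                       # slacks, all positive
        grad = A.T @ (1.0 / s)              # gradient of the barrier
        H = (A.T * (1.0 / s**2)) @ A        # Hessian A^T diag(1/s^2) A
        step = np.linalg.solve(H, grad)
        t = 1.0                             # backtrack to stay strictly feasible
        while np.any(b - A @ (x - t * step) <= 0):
            t *= 0.5
        x = x - t * step
    return x

# The analytic center of the unit box 0 <= z <= 1 is its midpoint.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])
xc = analytic_center(A, b, [0.2, 0.7])
```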
Ellipsoid Method
The ellipsoid method is a localization algorithm that works in a similar way to cutting-plane methods but differs in some aspects. In cutting-plane methods we describe our knowledge with a polyhedron; in the ellipsoid method we use an ellipsoid. Describing the localization set with an ellipsoid has a memory advantage: in cutting-plane methods, more and more constraints are added as the optimization proceeds to form a smaller and smaller polyhedron, and too many linear inequality constraints slow down each iteration and occupy much more memory. The ellipsoid method needs only constant memory, on the order of n(n+1) numbers (a symmetric n × n matrix plus a center), because the ellipsoid is described by
ϵ = {z | (z − x)T P−1 (z − x) ≤ 1}
where P is positive definite.
Details about Ellipsoid method
At iteration k, we have an ellipsoid that contains the optimal point:
ϵ = {z | (z − xk)T Pk−1 (z − xk) ≤ 1}
We then compute a subgradient gk at xk; the half-space gkT(z − xk) > 0 can never contain the optimal point. So in the next step, we need to find the minimum volume ellipsoid covering the half-ellipsoid
{z | (z − xk)T Pk−1 (z − xk) ≤ 1, gkT(z − xk) ≤ 0}
Fortunately, this ellipsoid can be found analytically, and the new ellipsoid is
ϵ+ = {z | (z − x+)T (P+)−1 (z − x+) ≤ 1}
where
x+ = xk − (1/(n+1)) Pk g̃
P+ = (n²/(n²−1)) (Pk − (2/(n+1)) Pk g̃ g̃T Pk)
and
g̃ = (1/√(gkT Pk gk)) gk
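These update formulas translate directly into code. The sketch below applies them to minimize a smooth convex quadratic in R2 (the objective and starting ellipsoid are chosen only for illustration; the formulas require n ≥ 2):

```python
import numpy as np

def ellipsoid_step(x, P, g):
    """One update of the ellipsoid method: return the minimum-volume
    ellipsoid (x+, P+) covering the half of (x, P) where
    g^T (z - x) <= 0."""
    n = len(x)
    gt = g / np.sqrt(g @ P @ g)         # normalized subgradient g~
    Pg = P @ gt
    x_new = x - Pg / (n + 1.0)
    P_new = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1.0)) * np.outer(Pg, Pg))
    return x_new, P_new

# Minimize f(z) = ||z - c||^2, starting from a ball of radius 5
# (which certainly contains the optimum c).
c = np.array([1.0, -1.0])
x, P = np.zeros(2), 25.0 * np.eye(2)
best_x, best_f = x.copy(), np.sum((x - c) ** 2)
for _ in range(200):
    g = 2.0 * (x - c)                   # (sub)gradient at the current center
    if np.linalg.norm(g) < 1e-12:
        break
    x, P = ellipsoid_step(x, P, g)
    f = np.sum((x - c) ** 2)
    if f < best_f:                      # track the best point seen so far
        best_x, best_f = x.copy(), f
```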
Extensions
Some improvements can enhance the performance of cutting plane method.
Dropping constraints
When using a cutting-plane method, the number of inequalities grows as the number of iterations increases, and performance suffers seriously from so many inequalities. So we need to abandon inequalities that are no longer important. There are several ways to do this.
linear programming
To check whether the inequality indexed by i is redundant, solve the linear program
maximize aiT z
s.t. ajT z ≤ bj, j = 1, …, m, j ≠ i
If the optimal value is no larger than bi, then the inequality aiT z ≤ bi is redundant and can be dropped.
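Assuming SciPy is available, this redundancy check is a short wrapper around scipy.optimize.linprog (note that bounds must be set explicitly, since linprog defaults to z ≥ 0):

```python
import numpy as np
from scipy.optimize import linprog

def is_redundant(A, b, i):
    """Check whether constraint i of {z | A z <= b} is redundant by
    maximizing a_i^T z subject to all the other constraints and
    comparing the optimum against b_i."""
    mask = np.arange(len(b)) != i
    res = linprog(-A[i],                  # linprog minimizes, so negate
                  A_ub=A[mask], b_ub=b[mask],
                  bounds=[(None, None)] * A.shape[1])
    # res.status == 3 means unbounded: constraint i is certainly needed
    return res.status == 0 and -res.fun <= b[i]

# z1 <= 1 and z2 <= 1 already imply z1 + z2 <= 3, so constraint 2 is
# redundant; constraint 0 is not (without it the LP is unbounded).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])
```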
other methods
Sometimes we simply keep the constraints at a fixed number, even though this may drop some useful constraints. In this situation, we drop inequalities according to the heuristic score
(bi − aiT g) / ∥FT ai∥2
The smaller the score, the more relevant the constraint, so we rank all the inequalities by this score and drop those with the largest values. The matrix F and vector g come from an ellipsoid
ϵ = {F u + g | ∥u∥2 ≤ 1}
that covers the current polyhedron Pk.
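A small sketch of this scoring rule (the function name is illustrative):

```python
import numpy as np

def relevance_scores(A, b, F, g):
    """Score (b_i - a_i^T g) / ||F^T a_i||_2 for each row a_i of A,
    given an ellipsoid {F u + g : ||u||_2 <= 1} covering P_k.
    Smaller scores mean more relevant constraints, so to keep a fixed
    budget we would drop the constraints with the largest scores."""
    num = b - A @ g
    den = np.linalg.norm(F.T @ A.T, axis=0)   # column j holds F^T a_j
    return num / den

# Unit ball (F = I, g = 0): z1 <= 1 touches the ball, z1 <= 5 is far
# away from it and gets a much larger (less relevant) score.
A = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([1.0, 5.0])
scores = relevance_scores(A, b, np.eye(2), np.zeros(2))
```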
Summary
Roughly speaking, localization and cutting plane methods are slow but reliable. When the objective is non-differentiable, cutting-plane and ellipsoid methods still work.