Monday, August 25, 2014

Notes on Distributed Optimization

In LIBLINEAR, if we want to use second-order methods for optimization, some kind of approximation is needed: the Hessian is a huge matrix that cannot be stored in memory. The key point is that some algorithms only need the Hessian-vector product $Hv$. As long as we can compute this matrix-vector product, we do not need the Hessian itself anymore.

Using Automatic Differentiation

Get a method to calculate the gradient, then use automatic differentiation (or a finite-difference approximation of the gradient) to calculate the Hessian-vector product.
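As a minimal sketch of the idea (the quadratic test function and names here are illustrative, not from LIBLINEAR): given only a gradient routine, a Hessian-vector product can be approximated by a finite difference of gradients, $Hv \approx (\nabla f(x + \epsilon v) - \nabla f(x)) / \epsilon$.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x, so grad f(x) = A x and the
# Hessian is exactly A.  A is an illustrative placeholder.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)          # symmetric positive definite
grad = lambda x: A @ x

def hess_vec(grad, x, v, eps=1e-6):
    """Approximate H v by a finite difference of gradients."""
    return (grad(x + eps * v) - grad(x)) / eps

x = rng.standard_normal(5)
v = rng.standard_normal(5)
print(np.allclose(hess_vec(grad, x, v), A @ v, atol=1e-4))  # True for this quadratic
```

For a quadratic the finite difference is exact up to floating-point error; for general objectives the step size `eps` trades truncation error against round-off.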

Using a Matrix Decomposition

Some Hessian matrices can be decomposed as $H = I + C X^T D X$; that is, the Hessian can be written as a low-rank update of the identity matrix.
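A small sketch of exploiting this structure (the matrices below are random placeholders): with $H = I + C X^T D X$ and diagonal $D$, the product $Hv$ needs only two matrix-vector products, never the $n \times n$ Hessian.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n, C = 20, 8, 0.5
X = rng.standard_normal((l, n))
d = rng.uniform(0.1, 1.0, size=l)   # diagonal of D

def hess_vec(v):
    # H v = v + C * X^T (D (X v)) -- never forms the n x n Hessian
    return v + C * (X.T @ (d * (X @ v)))

# Check against the explicitly formed Hessian
H = np.eye(n) + C * X.T @ np.diag(d) @ X
v = rng.standard_normal(n)
print(np.allclose(hess_vec(v), H @ v))  # True
```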

Just some notes.

Written with StackEdit.

Saturday, August 23, 2014

Localization and Cutting Plane Methods

@(Numerical Computation)[localization,subgradient,cutting plane]


Continuing the discussion of non-differentiable convex optimization, this post is about localization and cutting-plane methods. Roughly speaking, these methods are similar to the bisection method on $\mathbb{R}$: they find the optimal point by eliminating regions that are guaranteed not to contain it. There is no direct extension of bisection from $\mathbb{R}$ to $\mathbb{R}^n$, but the subgradient of a convex function produces a similar effect when used appropriately.

Basic Idea of Cutting Plane method

Before talking about the details of the cutting-plane method, we need to know the goal of the algorithm: to find a point $x$ belonging to the $\epsilon$-suboptimal set, defined as follows:

$$X_\epsilon = \{ z \mid f_0(z) \le f_0(x^\star) + \epsilon \}$$

where $x^\star$ represents the optimal point of the objective function; this $\epsilon$-suboptimal set is denoted $X_\epsilon$.
The cutting-plane method proceeds as follows:
a. Set k = 0.
b. Start with an initial polyhedron $P_k$ guaranteed to contain the optimal point $x^\star$.
c. Select a point $x_k$ inside the polyhedron $P_k$ according to some rule. This selection of $x_k$ is critical to the performance of the cutting-plane method, and there are many possible choices.
d. Send $x_k$ to an oracle. The oracle either tells you that the point is in the $\epsilon$-suboptimal set, or returns a hyperplane that separates the $\epsilon$-suboptimal set from the rest of the points. This hyperplane must satisfy the inequality

$$a^T x \le b \quad \forall x \in X_\epsilon$$

and is called a cutting plane.
e. If the point $x_k$ is in the $\epsilon$-suboptimal set, go to step f; if not, add the hyperplane to the polyhedron, set k = k + 1, and go to step c.
f. Return the point $x_k$.

After reviewing the algorithm, we can see that the cutting plane makes this method similar to bisection on $\mathbb{R}$, with the cutting plane playing the role of the midpoint in the bisection algorithm.
One of the most important questions is how to calculate the cutting plane for a convex optimization problem. This can be solved with the subgradient, defined as any vector $g$ fulfilling the following inequality:

$$f(z) \ge f(x) + g^T (z - x) \quad \forall z \in \operatorname{dom} f$$

Cutting planes arise naturally from this definition. The following sections discuss how to calculate cutting planes for different types of problems.

Unconstrained Optimization

For an unconstrained optimization problem:

$$\arg\min_x f_0(x)$$

At any point $x$, a subgradient $g \in \partial f_0(x)$ gives the following inequality:

$$f_0(z) \ge f_0(x) + g^T (z - x)$$

so all points satisfying $g^T (z - x) > 0$ can never be optimal, which yields the cut $g^T (z - x) \le 0$. If we keep track of the best objective value found so far, $f_{best}$, we can give a better (deeper) cut:

$$g^T (z - x) + f_0(x) - f_{best} \le 0$$
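A tiny illustration of the deep cut (the function and points are made up): for $f_0(x) = x^2$ at $x_k = 1$ with $f_{best} = 0.25$, the gradient is $g = 2$, so the cut becomes $2(z - 1) + 1 - 0.25 \le 0$, i.e. $z \le 0.625$, which the true minimizer $z = 0$ satisfies.

```python
# Deep cut g^T (z - x_k) + f0(x_k) - f_best <= 0 for f0(x) = x^2 (illustrative).
f0 = lambda x: x ** 2
grad = lambda x: 2 * x          # subgradient (here: the ordinary gradient)

x_k, f_best = 1.0, 0.25
g = grad(x_k)

def cut_holds(z):
    """True if z survives the deep cut generated at x_k."""
    return g * (z - x_k) + f0(x_k) - f_best <= 0

print(cut_holds(0.0))   # True: the minimizer x* = 0 is kept
print(cut_holds(1.0))   # False: x_k itself is cut away since f0(x_k) > f_best
```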

Feasibility Problem

Feasibility problem:

$$\text{find } x \quad \text{s.t. } f_i(x) \le 0, \quad i = 1, \dots, m$$

If all the inequality constraints are satisfied, then the problem is feasible at the current point. If not, we can find an inequality $f_i(x) \le 0$ that is violated at the current point $x_k$. According to the subgradient inequality $f_i(z) \ge f_i(x_k) + g_i^T (z - x_k)$, every feasible point must satisfy

$$f_i(x_k) + g_i^T (z - x_k) \le 0$$

which is the cutting plane we need.

Inequality constrained problem

Consider the following problem:

$$\arg\min_x f_0(x) \quad \text{s.t. } f_i(x) \le 0, \quad i = 1, \dots, m$$

At the current point $x_k$ there are two different situations. First, some of the constraints are violated. For a violated constraint $f_j(x)$, we can construct the following cutting plane from a subgradient $g_j$ of $f_j(x)$ at $x_k$:

$$f_j(x_k) + g_j^T (z - x_k) \le 0$$

This cutting plane is called a feasibility cut. Second, all the constraints are satisfied. Then calculate a subgradient $g_0$ of $f_0(x)$ at $x_k$. If $g_0 = 0$, then $x_k$ is the minimizer of $f_0(x)$; otherwise we can construct the following cutting plane:

$$g_0^T (z - x_k) \le 0$$

Summary

Cutting planes can be found efficiently for unconstrained problems, feasibility problems, and inequality constrained problems. But the method does not seem to handle equality constraints easily, unlike the subgradient method, which can handle equality constraints by projection.

Convergence Proof

For cutting-plane methods, the progress of the optimization is evaluated by the decrease in volume. For the algorithm to converge, the volume of the polyhedron must decrease by a factor less than 1 at each iteration. For the bisection method, the volume decreases by a factor of 0.5 every iteration, but for cutting-plane methods in $\mathbb{R}^n$ this decrease is hard to guarantee.

Specific Cutting-Plane and Localization Methods

From the brief introduction of the cutting-plane method above, we know the critical step is selecting the query point $x_{k+1}$ inside the polyhedron $P_k$. The strategy that leads to the greatest reduction in volume will be the most efficient one; different strategies have different computational costs and volume reductions.

Center of Gravity

The center of gravity of a set C is defined as follows:

$$\operatorname{cg}(C) = \frac{\int_C z \, dz}{\int_C dz}$$

For the volume reduction, we have

$$\frac{\operatorname{vol}(P_{k+1})}{\operatorname{vol}(P_k)} \le 1 - \frac{1}{e} \approx 0.63$$

One of the most important things about this choice is that the volume reduction does not depend on any problem parameters, such as the dimension n of the problem.
The disadvantage of this algorithm is that the center of gravity is very hard to compute for a set described by a set of linear inequalities.

MVE cutting-plane method

This method is called the maximum volume inscribed ellipsoid (MVE) cutting-plane method. Here we select $x_{k+1}$ to be the center of the maximum volume ellipsoid that lies in $P_k$. The volume reduction can be described by the factor:

$$\frac{\operatorname{vol}(P_{k+1})}{\operatorname{vol}(P_k)} \le 1 - \frac{1}{n}$$

so the volume reduction factor depends on the dimension n.

Analytic center cutting-plane method

The analytic center cutting-plane method uses the optimal solution of the following optimization problem:

$$\arg\min_x \; -\sum_{i=1}^{m_0} \log(d_i - c_i^T x) - \sum_{i=1}^{m_k} \log(b_i - a_i^T x)$$

where the polyhedron is

$$P_k = \{ z \mid c_i^T z \le d_i, \; i = 1, \dots, m_0; \; a_i^T z \le b_i, \; i = 1, \dots, m_k \}$$

This problem is easy to optimize.
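A sketch of one way to optimize it, with damped Newton steps on the log-barrier objective (the box polyhedron is an arbitrary example; for the unit box the analytic center is the midpoint):

```python
import numpy as np

# Polyhedron {x | A x <= b}: the unit box 0 <= x_i <= 1 (illustrative).
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])

x = np.array([0.2, 0.7])                 # any strictly feasible start
for _ in range(50):
    s = b - A @ x                        # slacks, all positive inside P
    grad = A.T @ (1.0 / s)               # gradient of -sum log(b - A x)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)
    dx = -np.linalg.solve(hess, grad)    # Newton step
    t = 1.0
    while np.any(b - A @ (x + t * dx) <= 0):
        t *= 0.5                         # damp to stay strictly feasible
    x = x + t * dx

print(x)  # -> approximately [0.5, 0.5], the analytic center of the box
```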

Ellipsoid Method

The ellipsoid method is a localization algorithm that works in a similar way to cutting-plane methods but differs in some aspects. In cutting-plane methods we describe our knowledge with a polyhedron, while in the ellipsoid method we use an ellipsoid. Describing the localization set with an ellipsoid has a memory advantage over cutting-plane methods: as the optimization proceeds, cutting-plane methods add more and more constraints to form a smaller and smaller polyhedron, and too many linear inequality constraints slow down each iteration and occupy much more memory. The ellipsoid method needs only constant memory, about n(n+1) numbers, because an ellipsoid is described by

$$\mathcal{E} = \{ z \mid (z - x)^T P^{-1} (z - x) \le 1 \}$$

where P is positive definite.

Details about Ellipsoid method

At iteration k, we have an ellipsoid that contains the optimal point:

$$\mathcal{E}_k = \{ z \mid (z - x_k)^T P_k^{-1} (z - x_k) \le 1 \}$$

Then we take a subgradient $g_k$ at the point $x_k$; the half-space satisfying $g_k^T (z - x_k) > 0$ can never contain the optimal point. So in the next step, we need to find the minimum-volume ellipsoid covering the half-ellipsoid

$$\{ z \mid (z - x_k)^T P_k^{-1} (z - x_k) \le 1, \; g_k^T (z - x_k) \le 0 \}$$

Fortunately, this ellipsoid can be found analytically:

$$\mathcal{E}_{k+1} = \{ z \mid (z - x_{k+1})^T P_{k+1}^{-1} (z - x_{k+1}) \le 1 \}$$

where

$$x_{k+1} = x_k - \frac{1}{n+1} P_k \tilde{g}$$

$$P_{k+1} = \frac{n^2}{n^2 - 1} \left( P_k - \frac{2}{n+1} P_k \tilde{g} \tilde{g}^T P_k \right)$$

and

$$\tilde{g} = \frac{g_k}{\sqrt{g_k^T P_k g_k}}$$
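The updates above can be sketched on a toy nonsmooth problem (minimizing $\|x\|_1$; the starting point and initial radius are arbitrary):

```python
import numpy as np

f = lambda x: np.abs(x).sum()                     # nonsmooth objective, minimum 0
subgrad = lambda x: np.where(x >= 0, 1.0, -1.0)   # a subgradient of the l1 norm

n = 2
x = np.array([1.0, -0.7])      # initial ellipsoid center
P = 4.0 * np.eye(n)            # initial ellipsoid: ball of radius 2
f_best = f(x)

for _ in range(100):
    g = subgrad(x)
    gt = g / np.sqrt(g @ P @ g)                   # normalized subgradient
    x = x - (1.0 / (n + 1)) * P @ gt              # new center
    P = n**2 / (n**2 - 1.0) * (P - 2.0 / (n + 1) * np.outer(P @ gt, P @ gt))
    f_best = min(f_best, f(x))

print(f_best)  # close to the optimal value 0
```

Note that the method is not a descent method, so we track the best value found so far rather than the last iterate.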

Extensions

Some improvements can enhance the performance of cutting-plane methods.

Dropping constraints

When using cutting-plane methods, the number of inequalities grows as the iterations increase, and performance is seriously affected by having so many inequalities. So we need to abandon the inequalities that are no longer important. There are several ways to do this.

linear programming

Using linear programming to check whether an inequality is redundant works as follows:

$$\text{maximize } a_i^T z \quad \text{s.t. } a_j^T z \le b_j, \; j = 1, \dots, m, \; j \ne i$$

where i indexes the inequality we would like to check. If the optimal value is smaller than $b_i$, then the inequality $a_i^T z \le b_i$ is redundant.
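A sketch of this check with scipy (the polyhedron, a unit box plus one extra loose constraint, is made up). Since `linprog` minimizes, we maximize $a_i^T z$ by minimizing $-a_i^T z$:

```python
import numpy as np
from scipy.optimize import linprog

# Constraints a_j^T z <= b_j: the unit box, plus z_1 <= 2 (index 4),
# which is clearly redundant.
A = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0],
              [ 0.0, -1.0],
              [ 1.0,  0.0]])
b = np.array([1.0, 0.0, 1.0, 0.0, 2.0])

def is_redundant(A, b, i):
    """Maximize a_i^T z subject to all the other constraints."""
    mask = np.arange(len(b)) != i
    res = linprog(-A[i], A_ub=A[mask], b_ub=b[mask],
                  bounds=[(None, None)] * A.shape[1])
    return res.success and -res.fun <= b[i]

print(is_redundant(A, b, 4))  # True:  max z_1 over the box is 1 <= 2
print(is_redundant(A, b, 0))  # False: dropping z_1 <= 1 lets z_1 reach 2
```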

other methods

Sometimes we simply keep the number of constraints fixed, even though this drops some useful constraints. In this situation, we drop inequalities according to the following heuristic score:

$$\frac{b_i - a_i^T g}{\| F^T a_i \|_2}$$

The smaller the score, the more relevant the constraint, so we can rank all the inequalities by this score. The matrix F and the vector g come from the ellipsoid

$$\mathcal{E} = \{ F u + g \mid \| u \|_2 \le 1 \}$$

which covers the current polyhedron $P_k$.
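A small sketch of ranking constraints by this score (the covering ellipsoid and the constraints are arbitrary illustrative values):

```python
import numpy as np

# Covering ellipsoid {F u + g : ||u||_2 <= 1} (illustrative values).
F = np.diag([2.0, 1.0])
g = np.zeros(2)

# Constraints a_i^T z <= b_i; the last one is far from the ellipsoid.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 5.0])

# Score (b_i - a_i^T g) / ||F^T a_i||_2: smaller means more relevant.
# Row i of A @ F is a_i^T F, so its norm equals ||F^T a_i||_2.
scores = (b - A @ g) / np.linalg.norm(A @ F, axis=1)
order = np.argsort(scores)            # most relevant constraints first
print(order)  # -> [0 1 2]: the loose constraint z1 + z2 <= 5 ranks last
```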

Summary

Roughly speaking, localization and cutting-plane methods are slow but reliable. When the objective is non-differentiable, cutting-plane and ellipsoid methods still work.