Friday, April 8, 2016

Using Local Information

Kernel Method (or Non-parametric Method)

There seem to be two different definitions of “kernel methods” in machine learning. One is related to reproducing kernel Hilbert spaces (RKHS) and so on. The other refers to a family of non-parametric methods. The latter is the topic of this post; the main idea is how to use localized information to build a model.

Unlike a linear model, which constructs a single global function over the whole sample space, kernel methods work by constructing a localized function for each new sample point. We can see how this idea applies to different tasks.

When this method is applied to a regression task, for each new sample point it constructs a weight matrix based on some kernel function. The kernel function assigns higher weights to training points that are closer to the new point under some norm. A weighted regression is then performed, which yields a brand new prediction function for that point, and the predicted value is returned.
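
The weighted regression described above can be sketched roughly as follows. This is only a minimal illustration, not a reference implementation: the Gaussian kernel, the Euclidean norm, the `bandwidth` parameter, and the function names are all my own assumptions, since the post does not fix any of them.

```python
import numpy as np

def gaussian_kernel(dist, bandwidth):
    """Weight that decays with distance (assumed kernel choice)."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def local_linear_predict(X_train, y_train, x0, bandwidth=0.5):
    """Fit a kernel-weighted least squares line around x0 and evaluate it at x0."""
    y_train = np.asarray(y_train, dtype=float)
    X_train = np.asarray(X_train, dtype=float).reshape(len(y_train), -1)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    # Euclidean distance from every training point to the query point
    dist = np.linalg.norm(X_train - x0, axis=1)
    w = gaussian_kernel(dist, bandwidth)
    # Design matrix with an intercept column
    B = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
    # Weighted least squares: scale each row by sqrt(weight), then solve
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(B * sw[:, None], y_train * sw, rcond=None)
    # The fitted coefficients are local to x0 and are discarded after use
    return np.concatenate([[1.0], x0]) @ beta

# Hypothetical usage on noisy samples of a sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 100))
y = np.sin(x) + 0.1 * rng.normal(size=100)
print(local_linear_predict(x, y, 1.0, bandwidth=0.3))
```

Note that a new set of coefficients is solved for every query point, which is why nothing like a single global model is stored.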

For regression there is also another kernel method, which weights the samples within the neighborhood of the new sample point and returns a weighted average of their response values.
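
A small sketch of this weighted-average idea, in the style of what is usually called the Nadaraya-Watson estimator. The Epanechnikov kernel, the one-dimensional input, and the `bandwidth` parameter are assumptions made for illustration only.

```python
import numpy as np

def kernel_average(x_train, y_train, x0, bandwidth=0.5):
    """Kernel-weighted average of the responses near x0 (1-D inputs)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Epanechnikov kernel: nonzero only inside the window |x - x0| <= bandwidth
    t = np.abs(x_train - x0) / bandwidth
    w = np.where(t <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    if w.sum() == 0.0:
        raise ValueError("no training points inside the neighborhood of x0")
    return np.sum(w * y_train) / np.sum(w)

# Hypothetical usage: three points, the middle one carries the most weight
print(kernel_average([-1.0, 0.0, 1.0], [0.0, 3.0, 0.0], 0.0, bandwidth=2.0))
```

Compared with the local regression above, no model is fitted at all here; the prediction is just an average, so it tends to be more biased near the boundary of the data.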

When this method is applied to density estimation, it also constructs a weighting kernel that decays with distance from the point. Classification can then be performed according to Bayes' rule, using a separate density estimate for each class. We can also use a mixture of Gaussians to get a better density estimate for each class.
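
As a rough sketch of classification by kernel density estimation plus Bayes' rule: estimate a density for each class, multiply by the class prior, and normalize. The Gaussian kernel, the one-dimensional features, the shared `bandwidth`, and the function names are assumptions for illustration.

```python
import numpy as np

def kde(x_train, x0, bandwidth):
    """Gaussian kernel density estimate at x0 from 1-D samples."""
    x_train = np.asarray(x_train, dtype=float)
    z = (x0 - x_train) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi))

def kde_classify(x_train, labels, x0, bandwidth=0.5):
    """Return the most probable class at x0 and the posterior over classes."""
    x_train = np.asarray(x_train, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Bayes' rule numerator: class prior times class-conditional density
    scores = np.array([
        np.mean(labels == c) * kde(x_train[labels == c], x0, bandwidth)
        for c in classes
    ])
    # Assumes x0 is close enough to the data that the scores do not all underflow
    posterior = scores / scores.sum()
    return classes[np.argmax(posterior)], posterior

# Hypothetical usage with two well-separated classes
x = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
lab = [0, 0, 0, 1, 1, 1]
print(kde_classify(x, lab, -1.5))
```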

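For all the methods mentioned above, there is a bias-variance trade-off, typically controlled by the width of the neighborhood: a wider kernel averages over more points (lower variance, higher bias), while a narrower one does the opposite.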


Friday, April 1, 2016

Notes on Linear Regression

Having read the linear regression chapter of The Elements of Statistical Learning (ESL), I found its approach different from that of Pattern Recognition and Machine Learning. After introducing the least squares method, ESL discusses the variance of the estimator. This is something quite new to me.
The first question is why we need to do this. What are the benefits of this kind of inference? The more interesting point is that, under the assumption that the true underlying model is linear, ESL gives hypothesis tests and interval estimates for the parameters. This is also new to me, but the question is what happens if the real underlying model isn't linear, which I think is the most common scenario.
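
To make the inference step concrete, here is a minimal sketch of the standard least squares results: the estimated covariance of the coefficients, their standard errors, z-scores, and approximate 95% intervals. The function name, the synthetic data, and the use of the normal quantile 1.96 are my own choices for illustration, and all of it relies on the linear-model assumption questioned above.

```python
import numpy as np

def ols_inference(X, y):
    """Least squares fit with standard errors, z-scores and ~95% intervals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    B = np.hstack([np.ones((n, 1)), X])          # add an intercept column
    XtX_inv = np.linalg.inv(B.T @ B)
    beta = XtX_inv @ B.T @ y                      # least squares estimate
    resid = y - B @ beta
    sigma2 = resid @ resid / (n - B.shape[1])     # unbiased noise variance estimate
    cov = sigma2 * XtX_inv                        # estimated covariance of beta
    se = np.sqrt(np.diag(cov))
    z = beta / se                                 # tests H0: coefficient is zero
    # Approximate 95% interval from the normal quantile 1.96
    ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
    return beta, se, z, ci

# Hypothetical usage on synthetic data where the true model really is linear
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * rng.normal(size=200)
beta, se, z, ci = ols_inference(X, y)
print(beta, se)
```

If the true model is not linear, the code still runs, but the intervals no longer have their nominal meaning, which is exactly the concern raised above.
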
On another point, ESL gives a detailed analysis and comparison of different shrinkage methods, which is a clear illustration of the “bias-variance decomposition”. It also covers more advanced methods such as the lasso path and the LAR algorithm.
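
As one small example of shrinkage, here is a sketch of ridge regression in closed form, showing how the coefficients shrink as the penalty grows. Ridge is only one of the shrinkage methods mentioned; the function name `ridge_fit`, the penalty values, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center inputs and response so the intercept drops out of the penalty
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical usage: a larger penalty gives a smaller coefficient norm
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(size=50)
for lam in (0.0, 10.0, 1000.0):
    _, b = ridge_fit(X, y, lam)
    print(lam, np.linalg.norm(b))
```

With `lam = 0` this reduces to ordinary least squares; increasing `lam` trades some bias for lower variance.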