Tuesday, December 31, 2019

Machine Learning the future

This is a note from the course, Machine Learning the future.

Key Points for the Future

There are several important pieces that need to be solved to build better machine learning models in the future:

  1. Online Learning
  2. Representation
  3. Exploration
  4. Reinforcement

There are other topics covered, but I don’t think they are as important in this context. All four of the topics mentioned above have a huge impact on real-world applications.

Online Learning

Questions that need to be solved for a stable online optimization algorithm:

Sample Imbalance

Sample imbalance is a widely known issue in real applications, and there are multiple ways to handle it:

  1. Downsample the negative examples to create a balanced dataset.
  2. Give rare samples higher weights.

Using weighted samples creates a challenging problem for the gradient update rule: how should the weight information be used?
Naively multiplying the gradient by the weight can overshoot the update and possibly produce a worse result.
Ideally, a weight of k should act like k runs over the same example, where each run updates the model (and therefore the gradient) before the next, as sketched below.
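Below is a minimal sketch (my own, not from the course) contrasting the two update rules on a single logistic-regression example; the function names and the toy numbers are assumptions made only for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def naive_weighted_update(w, x, y, weight, lr):
        # Naive rule: multiply the gradient by the sample weight once.
        # A large weight can overshoot and make the loss worse.
        p = sigmoid(w @ x)
        grad = (p - y) * x               # log-loss gradient for one example
        return w - lr * weight * grad

    def repeated_presentation_update(w, x, y, weight, lr):
        # Treat a weight of k as presenting the same example k times,
        # recomputing the gradient after every step.
        for _ in range(int(weight)):
            p = sigmoid(w @ x)
            grad = (p - y) * x
            w = w - lr * grad            # the gradient shrinks as the model adapts
        return w

    # toy comparison: a heavily weighted positive example
    x, w0 = np.array([1.0, 3.0]), np.zeros(2)
    print(naive_weighted_update(w0, x, 1.0, weight=50, lr=0.5))
    print(repeated_presentation_update(w0, x, 1.0, weight=50, lr=0.5))

The naive rule takes one huge step, while the repeated-presentation rule stops moving once the example is already well classified.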

Learning Rate

Online learning is sensitive to the learning rate. This can mostly be handled with adaptive algorithms such as AdaGrad.
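As a rough illustration, here is a per-coordinate AdaGrad-style step (my own sketch; the function name and hyperparameters are assumptions): each coordinate's effective learning rate shrinks with the history of its squared gradients, so frequently updated coordinates take smaller steps.

    import numpy as np

    def adagrad_step(w, grad, g_sq, lr=0.1, eps=1e-8):
        # Accumulate squared gradients per coordinate, then scale the step
        # by 1 / sqrt(accumulated squared gradient) for that coordinate.
        g_sq = g_sq + grad ** 2
        w = w - lr * grad / (np.sqrt(g_sq) + eps)
        return w, g_sq

    # usage: keep g_sq across updates
    w, g_sq = np.zeros(3), np.zeros(3)
    w, g_sq = adagrad_step(w, np.array([0.5, -0.1, 0.0]), g_sq)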

Scale of Feature Values

Different features have different units and scales, and without handling this, features with larger values will dominate the weight update.
We can try mean-variance normalization as a preprocessing trick, but this destroys the sparsity pattern present in the training data.
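A tiny illustration of why this hurts sparsity (my own example, with made-up statistics): subtracting a non-zero mean turns every zero entry into a non-zero one, so a mostly-empty feature vector becomes dense.

    import numpy as np

    x = np.zeros(10)
    x[3] = 5.0                       # sparse vector: one non-zero out of ten
    mean, std = 0.5, 1.5             # assumed per-feature statistics from training data
    x_norm = (x - mean) / std        # every former zero becomes -0.33
    print(np.count_nonzero(x), np.count_nonzero(x_norm))   # 1 vs 10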

Explore/Exploit Tradeoff

For interactive services, the exploration/exploitation tradeoff is a common topic for service providers. Before solving this problem, we need to be able to evaluate different policies.

Uniformly Randomized Logging for Data Collection

In the usual case, each action is taken based on the predicted probability. But for a small percentage of traffic, we uniformly select one of K actions and log the result accordingly.
In this way, we have data to evaluate models developed offline. Otherwise, the production model will always dominate the training samples, and an offline-developed model would never have the opportunity to get selected.
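A hedged sketch of what such a logging loop could look like (the names choose_and_log, serve, prod_policy, and the 5% exploration fraction are my own assumptions, not the course's design): most traffic follows the production policy, while a small slice picks one of K actions uniformly and records the propensity so offline models can be evaluated later.

    import random

    EXPLORE_FRACTION = 0.05    # small slice of traffic used for uniform exploration
    K = 4                      # number of candidate actions

    def serve(context, action):
        # Stand-in for showing the action to a user and observing a reward.
        return 1.0 if random.random() < 0.1 * (action + 1) else 0.0

    def choose_and_log(context, prod_policy, log):
        if random.random() < EXPLORE_FRACTION:
            action, propensity = random.randrange(K), 1.0 / K   # uniform exploration
        else:
            action, propensity = prod_policy(context), 1.0      # deterministic prod choice
        reward = serve(context, action)
        log.append({"context": context, "action": action,
                    "propensity": propensity, "reward": reward})
        return action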

UCB/LinUCB/Bandits

Contextual bandit algorithms are widely used to solve the explore/exploit problem. LinUCB is a promising algorithm here, under the assumption that the expected reward (or loss) is a linear function of the context features.
LinUCB has reportedly been applied in news recommendation systems to improve CTR.
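For reference, here is a compact sketch of the disjoint LinUCB idea (my own implementation of the published description, not code from the course): each arm keeps a ridge-regression estimate of its reward, and the arm with the highest upper confidence bound is selected.

    import numpy as np

    class LinUCB:
        # Disjoint LinUCB: one ridge-regression model per arm; pick the arm
        # with the highest upper confidence bound on its predicted reward.
        def __init__(self, n_arms, dim, alpha=1.0):
            self.alpha = alpha
            self.A = [np.eye(dim) for _ in range(n_arms)]    # X^T X + I per arm
            self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T r per arm

        def select(self, x):
            scores = []
            for A, b in zip(self.A, self.b):
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b                            # ridge estimate
                scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
            return int(np.argmax(scores))

        def update(self, arm, x, reward):
            self.A[arm] += np.outer(x, x)
            self.b[arm] += reward * x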

Offline Policy Evaluation

Evaluation is the most important step for both products and machine learning problems. There are two aspects of evaluation:

  1. What’s the evaluation metric?
  2. What’s the evaluation data?

How the data was collected decides the validity of the evaluation metric. For systems that use a bandit/policy to do online learning, a good offline evaluation setup determines the iteration speed of offline modeling.
There are multiple ways to do offline evaluation of reinforcement learning algorithms (off-policy evaluation).
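One standard estimator is inverse propensity scoring (IPS). The sketch below (my own, reusing the hypothetical log format from the exploration section above) estimates a target policy's average reward by reweighting logged rewards on the records where the target policy would have chosen the logged action.

    def ips_value(logged, target_policy):
        # logged: list of dicts with "context", "action", "propensity", "reward"
        total = 0.0
        for rec in logged:
            if target_policy(rec["context"]) == rec["action"]:
                total += rec["reward"] / rec["propensity"]
        return total / len(logged)

Because uniform logging gives every action a non-zero propensity, this estimate is unbiased even for policies that disagree with the production model.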

Learning to search is a framework for jointly optimizing complex, structured loss functions, a problem that also has several other approaches:

  1. Optimize each prediction independently.
  2. Multi-task learning
  3. Graphical models

The general idea of learning to search is to treat the problem as a reinforcement learning problem,
where the true labels act as the best available (reference) policy to guide the learning, as in the sketch below.
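A toy sketch of that idea (my own simplification, not the course's algorithm): tag a sequence left to right, roll in with the learner's own predictions as state, and at every step use the true tag as the reference action to imitate.

    import numpy as np

    def features(word_id, prev_tag, n_words, n_tags):
        # State features: one-hot current word, one-hot previous tag
        # (index n_tags means "start of sentence"), plus a bias term.
        x = np.zeros(n_words + n_tags + 2)
        x[word_id] = 1.0
        x[n_words + prev_tag] = 1.0
        x[-1] = 1.0
        return x

    def train(sentences, n_words, n_tags, epochs=5):
        # Multiclass perceptron over states produced by rolling in with the
        # learned policy; the true tag acts as the reference action at each step.
        W = np.zeros((n_tags, n_words + n_tags + 2))
        for _ in range(epochs):
            for words, tags in sentences:
                prev = n_tags                      # "start" marker
                for w, y in zip(words, tags):
                    x = features(w, prev, n_words, n_tags)
                    pred = int(np.argmax(W @ x))
                    if pred != y:                  # update toward the reference action
                        W[y] += x
                        W[pred] -= x
                    prev = pred                    # roll in with the learner's prediction
        return W

    # usage on a tiny made-up corpus: words and tags are integer ids
    data = [([0, 1, 2], [0, 1, 1]), ([2, 1, 0], [1, 1, 0])]
    W = train(data, n_words=3, n_tags=2)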
