Tuesday, December 31, 2019

2019 Summary

At my wife's request, here is my year-end summary for 2019. I hope to learn from the past year's lessons and have a wonderful 2020.

Reflections on Life

I can't help but note that I turn 30 in 2020. I didn't expect to reach this age so soon, and compared with people who are doing well I still lag far behind in both career and life. I hope to catch up as much as I can in the time ahead.

My 2019

The Baby's Arrival

For the sake of her health during the pregnancy, my wife returned from the UK to her hometown, hoping that resting at home would be good for both her and the baby. We were also worried that our unfamiliarity with the NHS might affect the delivery.
The baby finally came into the world in June, and fortunately both mother and baby are healthy. I'm deeply grateful to my wife, who, to bring the baby safely into the world, drank soy milk like crazy at home :(.
To welcome the baby, I first took one month of parental leave, and then took another two months starting in December to go home and be with my wife and the baby. It turns out that raising a newborn is exhausting and nerve-racking.

Work

I'm still at the same company, though there were some small ups and downs at work this year. In the first half of the year the team I was on was disbanded outright over questions about its direction, and I moved to a new team. Fortunately my performance review wasn't affected much.
Even more happened in the second half: my line manager changed several times, and what I was working on kept being reshuffled as well. Luckily the work itself went fairly smoothly. I don't know the final outcome yet, but I expect it to turn out reasonably well.
I also learned a lot from these changes, most importantly that individual ability still matters most: only with strong personal skills can you weather the risks that come with constant change.

Learning

After my wife returned to China, I lived alone in London. Living alone gets a bit boring, so I spent some of that idle time studying.
What I learned, though, wasn't particularly systematic or deep; it felt scattered, a bit here and a bit there, and I doubt it will do much for my career. I hope to improve on this going forward.

My 2020

I have too many ideas for 2020, but broadly speaking:

Family

Because of my own poor planning, my wife does not currently have a job she is happy with. I hope that in the new year she can find a next step she is satisfied with.
I still worry a lot about the family's income, since all of it comes from my salary. I hope to find new sources of income in the new year and gradually reduce the long-term risk.
In the new year I also hope to do things that increase the chances of the family being together, especially since my wife still has to choose her next step.

Work

On the work front, I have to state my long-standing wish: I would still like to get promoted :(. Beyond that, I want to systematically improve my planning and coordination skills; I'm sorely lacking there and feel I can only act as a small executor, which is not good enough.
Another goal is to systematically deepen my understanding of the whole ads system and strengthen my ability to drive large projects independently.

Written with StackEdit.

Machine Learning the future

These are my notes from the course Machine Learning the future.

Key Points for the Future

Several important pieces need to be solved to build better machine learning systems in the future:

  1. Online Learning
  2. Representation
  3. Exploration
  4. Reinforcement

Other topics are covered as well, but I don't think they are as important here. All four topics listed above have a huge impact on real-world applications.

Online Learning

Questions that need to be solved for a stable online optimization algorithm:

Sample Imbalance

Sample imbalance is a widely known issue in real applications. There are multiple ways to handle it:

  1. Down-sample the negative class, which creates a balanced dataset.
  2. Give the rare samples higher weights.

Using weighted samples creates a challenging problem for the gradient update rule: how should the weight information be used?
Naively multiplying the gradient by the weight can overshoot the update and possibly produce a worse result.
Ideally, a weight of w should act like running the same example w times, with the prediction being recomputed after each run; see the sketch below.
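Here is a minimal sketch of that difference for online logistic regression. The function names and numbers are mine, not from the course; splitting the weight into sub-steps is just one simple approximation of the "repeat the example" behaviour.

```python
# Illustrative comparison of two ways to apply an importance weight in an
# online logistic regression update.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_update(w, x, y, weight, lr):
    """Multiply the gradient by the full weight in one step.
    With a large weight this can badly overshoot."""
    p = sigmoid(w @ x)
    grad = (p - y) * x              # gradient of the logistic loss
    return w - lr * weight * grad

def repeated_update(w, x, y, weight, lr):
    """Treat the weight as running the same example `weight` times,
    recomputing the prediction before each sub-step."""
    n = max(1, int(round(weight)))
    sub_weight = weight / n
    w = w.copy()
    for _ in range(n):
        p = sigmoid(w @ x)          # prediction changes after every sub-step
        grad = (p - y) * x
        w -= lr * sub_weight * grad
    return w

w0 = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])
print(naive_update(w0, x, 1.0, weight=50.0, lr=0.5))     # one huge jump
print(repeated_update(w0, x, 1.0, weight=50.0, lr=0.5))  # saturates gently
```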

Learning Rate

Online learning is sensitive to the learning rate. Most of the time this can be handled with per-coordinate adaptive algorithms such as AdaGrad.
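A minimal sketch of a per-coordinate AdaGrad update for a linear model with squared loss (the class name and constants are mine):

```python
# Each coordinate's effective learning rate shrinks as that coordinate
# accumulates squared-gradient mass, so frequently active features take
# smaller steps over time.
import numpy as np

class AdaGradRegressor:
    def __init__(self, dim, lr=0.1, eps=1e-8):
        self.w = np.zeros(dim)
        self.g2 = np.zeros(dim)        # running sum of squared gradients
        self.lr = lr
        self.eps = eps

    def update(self, x, y):
        grad = (self.w @ x - y) * x    # gradient of 0.5 * (w.x - y)^2
        self.g2 += grad ** 2
        self.w -= self.lr * grad / (np.sqrt(self.g2) + self.eps)

model = AdaGradRegressor(dim=2)
for _ in range(100):
    model.update(np.array([1.0, 0.0]), 1.0)
    model.update(np.array([0.0, 5.0]), 1.0)
print(model.w)   # learned weights after interleaved updates on two differently scaled features
```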

Scale of feature value

Different features have different units and scales; without normalization, features with larger values will dominate the weight updates.
We can try mean-variance normalization as a preprocessing trick, but subtracting the per-feature mean turns zero entries into non-zeros and destroys the sparsity pattern in the training data.
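One common workaround is to rescale without centering, so that zero entries stay zero. A minimal sketch of an online, scale-only normalizer (the class name is mine; this is a simplification of the normalized-update tricks used in online learners):

```python
# Divide each feature by a running estimate of its magnitude. Because nothing
# is subtracted, zero entries stay exactly zero and sparsity is preserved.
import numpy as np

class OnlineScaler:
    def __init__(self, dim):
        self.max_abs = np.full(dim, 1e-12)    # running per-feature max |value|

    def transform(self, x):
        nz = x != 0                            # only touch non-zero entries
        self.max_abs[nz] = np.maximum(self.max_abs[nz], np.abs(x[nz]))
        out = np.zeros_like(x)
        out[nz] = x[nz] / self.max_abs[nz]
        return out

scaler = OnlineScaler(dim=3)
print(scaler.transform(np.array([0.0, 200.0, 0.5])))   # zeros stay zero
print(scaler.transform(np.array([3.0, 50.0, 0.0])))
```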

Explore/Exploit Tradeoff

For interactive services, the exploration/exploitation tradeoff is a common topic for service providers. Before solving this problem, we need to be able to evaluate different policies.

Uniform Randomization Logging for Data Collection

In the usual case, each action is taken based on the predicted probability. But for a small percentage of traffic, we uniformly select one of the K actions and log the result, together with the probability with which that action was chosen.
This gives us data to evaluate models developed offline. Otherwise, the production model will always dominate the training samples, and a model developed offline will never get the opportunity to be selected; see the logging sketch below.
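A minimal sketch of epsilon-uniform exploration logging. `prod_policy` is assumed to be a deterministic mapping from context to an action id, and `simulate_reward` is a stand-in for real user feedback; both names are illustrative:

```python
import random

def simulate_reward(context, action):
    return 1.0 if action == context % 2 else 0.0     # fake click signal

def serve_and_log(context, prod_policy, num_actions, epsilon=0.05):
    prod_action = prod_policy(context)
    if random.random() < epsilon:
        action = random.randrange(num_actions)        # uniform exploration slice
    else:
        action = prod_action                          # normal production traffic
    # Probability that this action was chosen under the logging scheme;
    # recording it is what makes unbiased offline evaluation possible later.
    propensity = epsilon / num_actions + (1.0 - epsilon) * (action == prod_action)
    reward = simulate_reward(context, action)
    return {"context": context, "action": action,
            "propensity": propensity, "reward": reward}

log = [serve_and_log(ctx, prod_policy=lambda c: 1, num_actions=4)
       for ctx in range(1000)]
```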

UCB/LinUCB/Bandits

Contextual bandit algorithms are widely used to address the explore/exploit problem. LinUCB is a promising algorithm here, under the assumption that the expected reward is a linear function of the context features.
LinUCB has reportedly been applied in news recommendation systems to improve CTR; a compact sketch is given below.
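A compact sketch of disjoint LinUCB (one linear model per action); the class and variable names are mine:

```python
import numpy as np

class LinUCB:
    def __init__(self, num_actions, dim, alpha=1.0):
        self.alpha = alpha                                    # exploration strength
        self.A = [np.eye(dim) for _ in range(num_actions)]    # per-action X^T X + I
        self.b = [np.zeros(dim) for _ in range(num_actions)]  # per-action X^T rewards

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # mean reward estimate plus an optimistic confidence bonus
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, action, x, reward):
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

bandit = LinUCB(num_actions=3, dim=5)
x = np.random.rand(5)
a = bandit.choose(x)
bandit.update(a, x, reward=1.0)
```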

Offline Policy Evaluation

Evaluation is the most important step for both products and machine learning problems. There are two aspects of evaluation:

  1. What’s the evaluation metric?
  2. What’s the evaluation data?

The choice of data determines the validity of the evaluation metric. For a system that uses a bandit/policy to do online learning, a good offline evaluation setup determines the iteration speed of offline modeling.
There are multiple ways to do offline evaluation of a reinforcement learning algorithm (off-policy evaluation); one standard approach is sketched below.
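A minimal sketch of the inverse-propensity-scoring (IPS) estimator, which reweights logged events (such as those produced by the logging sketch above) by the probability the logging policy gave to the chosen action; the tiny hand-made log is purely illustrative:

```python
def ips_value(logged_events, new_policy):
    """Estimate the average reward the new policy would have collected."""
    total = 0.0
    for e in logged_events:
        if new_policy(e["context"]) == e["action"]:
            total += e["reward"] / e["propensity"]   # reweight matching events
        # events where the new policy disagrees contribute 0
    return total / len(logged_events)

# Tiny hand-made log: context, chosen action, observed reward, logging propensity.
events = [
    {"context": 0, "action": 1, "reward": 1.0, "propensity": 0.9},
    {"context": 1, "action": 0, "reward": 0.0, "propensity": 0.1},
    {"context": 2, "action": 1, "reward": 1.0, "propensity": 0.9},
]
print(ips_value(events, new_policy=lambda c: 1))
```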

Learning to Search

Many prediction problems require jointly optimizing a complex loss over several interdependent decisions. There are multiple approaches to this problem:

  1. Optimize each decision independently.
  2. Multi-task learning.
  3. Graphical models.

Learning to search is another framework for this problem. The general idea is to treat the problem as a reinforcement learning problem,
while the true labels act as the best available (oracle) policy to guide the learning; a small sketch follows.
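A very small sketch of the learning-to-search idea for sequence tagging: decisions are made left to right, the state includes the learner's previous prediction, and the true label at each visited state plays the role of the oracle policy (a DAgger-style roll-in with the learned policy). All names and the toy data are mine:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_state(x_t, prev_label):
    return np.append(x_t, prev_label)            # features + previous decision

def collect_examples(sequences, policy):
    """Roll in with `policy`; label each visited state with the oracle action."""
    states, oracle_actions = [], []
    for xs, ys in sequences:                      # xs: T x d features, ys: true labels
        prev = 0
        for t in range(len(ys)):
            s = make_state(xs[t], prev)
            states.append(s)
            oracle_actions.append(ys[t])          # oracle = the true label
            prev = policy(s)                      # learner's own decision drives roll-in
    return np.array(states), np.array(oracle_actions)

# Toy data: the correct tag is 1 exactly when the single feature is positive.
rng = np.random.default_rng(0)
data = [(xs, (xs[:, 0] > 0).astype(int))
        for xs in (rng.normal(size=(5, 1)) for _ in range(50))]

policy = lambda s: 0                              # start from a trivial policy
for _ in range(3):                                # a few roll-in / re-train rounds
    X, y = collect_examples(data, policy)
    clf = LogisticRegression().fit(X, y)
    policy = lambda s, clf=clf: int(clf.predict(s.reshape(1, -1))[0])
```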