pydata: Huiming's learning notes

Keep Looking, Don't Settle

Dynamic Programming

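I learned dynamic programming a long time ago when I studied operations research. I recently came across the name again while working through some problems, so I am reviewing it with the help of an article on Zhihu.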

Data Engineering and Modeling 01: predict defaults with imbalanced data

This is a real question with sample data from the internet. We want to predict defaults from imbalanced data (the default rate is about 0.08%). All the variable names are hidden, so we need to explore the variables' attributes to find relationships among them. Because the data is imbalanced, we also need to apply sampling methods to balance it.
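As a rough illustration of what "sampling to balance the data" can mean, here is a minimal sketch of random oversampling in numpy. It is only one of several possible sampling methods and is not necessarily the one used in the post; the function name `random_oversample` and the toy arrays are assumptions made up for this example.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate rows of the smaller classes at random
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement from this class only.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    rng.shuffle(idx)  # avoid leaving the classes in sorted blocks
    return X[idx], y[idx]

# Toy data (made up): 9 non-defaults and 1 default.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 9 + [1])
X_bal, y_bal = random_oversample(X, y)
```

Oversampling should normally be applied to the training split only, so that duplicated rows do not leak into the validation data.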

Build Recurrent Neural Network from Scratch

The previous post shows how to build a neural network manually from scratch in numpy using matrix/vector multiplication and addition. Although many packages can do this easily and quickly in a few lines of code, it is still a good idea to understand the logic behind them. This part follows a good blog post that uses the example of predicting the words in a sentence to explain how to build an RNN manually. An RNN is a little more complicated than the neural network in the previous post because the state and output at the current time step depend on the state at the previous time step, so the backpropagation part is more complicated. I try to give the details, in mathematical formulas, of how to get the gradients recursively through the partial derivatives.
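The dependence of the current state on the previous one can be sketched in a few lines of numpy. This is a minimal forward pass only, assuming a vanilla RNN with a tanh hidden state and a softmax output over a vocabulary of word indices; the names `U`, `V`, `W`, and `rnn_forward` are assumptions for illustration and may not match the post's notation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_forward(x_ids, U, V, W):
    """Assumed vanilla RNN: s_t = tanh(U[:, x_t] + W s_{t-1}), o_t = softmax(V s_t).

    x_ids is a sequence of word indices, so U[:, x_t] equals U times a one-hot vector.
    """
    hidden_dim = U.shape[0]
    T = len(x_ids)
    # One extra row: s[-1] stays zero and serves as the initial state.
    s = np.zeros((T + 1, hidden_dim))
    o = np.zeros((T, V.shape[0]))
    for t in range(T):
        # The state at time t depends on the state at time t - 1.
        s[t] = np.tanh(U[:, x_ids[t]] + W @ s[t - 1])
        o[t] = softmax(V @ s[t])
    return o, s

# Toy sizes (made up): vocabulary of 5 words, 3 hidden units.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 5))
V = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 3))
o, s = rnn_forward([0, 2, 4, 1], U, V, W)
```

Because `s[t]` feeds into `s[t + 1]`, the gradient of the loss at one time step with respect to `W` has to be accumulated backward through all earlier states, which is why the backpropagation is recursive.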