Data Engineering and Modeling 01: predict defaults with imbalanced data

This is a real problem using sample data from the internet. We want to predict defaults from imbalanced data (the default rate is about 0.08%). All the variables are hidden, so we need to explore each variable's attributes to find relationships with the target. Because the data is imbalanced, we also need to apply sampling methods to balance it.
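The post does not list its exact sampling methods here, so the following is only a minimal sketch of two common options in NumPy: random oversampling of the minority (default) class and random undersampling of the majority class. The function names and the toy data (10,000 rows with 8 defaults, matching the 0.08% rate) are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows (drawn with replacement) until classes are equal."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    # Keep every original row plus the extra minority copies, then shuffle the order
    idx = np.concatenate([np.arange(len(y)), extra_idx])
    rng.shuffle(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=0):
    """Keep only as many rows of each class as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Sample n_min rows without replacement from each class
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Hypothetical toy data: 10,000 rows, 8 of them defaults (label 1)
rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 5))
y = np.zeros(10_000, dtype=int)
y[rng.choice(10_000, size=8, replace=False)] = 1

X_bal, y_bal = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(np.bincount(y), np.bincount(y_bal), np.bincount(y_under))
```

With only 8 defaults, undersampling keeps just 16 rows and throws away almost all of the data, while oversampling keeps every row but duplicates the defaults. Either way, resampling should be applied only to the training split, after the train/test split, so that copies of the same row do not leak into evaluation.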

Build Recurrent Neural Network from Scratch

The previous blog post shows how to build a neural network manually from scratch in numpy, using only matrix/vector multiplication and addition. Although many packages can do this easily and quickly in a few lines of code, it is still worth understanding the logic behind them. This part follows a good blog post that uses the example of predicting the words in a sentence to explain how to build an RNN manually. An RNN is a little more complicated than the neural network in the previous post because the current state and output of an RNN depend on the state at the previous time step, which makes backpropagation more involved. I try to give the mathematical details of how the gradients are obtained recursively through the partial derivatives.
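To make the recursion concrete, here is a minimal NumPy sketch of a vanilla RNN language model and its backpropagation through time (BPTT). It assumes one-hot word inputs, a tanh hidden state s_t = tanh(U x_t + W s_{t-1}), a softmax output o_t = softmax(V s_t), and a cross-entropy loss summed over time steps. The matrix names U, V, W, the function names, and the toy sequence are my own choices and not necessarily those of the referenced blog. The key step is that the gradient reaching s_t has two parts: one from the output at step t and one passed back from s_{t+1} through W.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_forward(x_idx, U, V, W):
    """Forward pass over a sequence of word indices (one-hot inputs)."""
    T, H = len(x_idx), W.shape[0]
    s = np.zeros((T + 1, H))  # s[-1] is never written, so it acts as the zero initial state
    o = np.zeros((T, V.shape[0]))
    for t in range(T):
        # U[:, x_idx[t]] equals U @ one_hot(x_t)
        s[t] = np.tanh(U[:, x_idx[t]] + W @ s[t - 1])
        o[t] = softmax(V @ s[t])
    return s, o

def rnn_loss(x_idx, y_idx, U, V, W):
    """Cross-entropy summed over time: -sum_t log o_t[y_t]."""
    _, o = rnn_forward(x_idx, U, V, W)
    return -np.sum(np.log(o[np.arange(len(y_idx)), y_idx]))

def rnn_bptt(x_idx, y_idx, U, V, W):
    """Gradients of rnn_loss w.r.t. U, V, W via backpropagation through time."""
    T = len(y_idx)
    s, o = rnn_forward(x_idx, U, V, W)
    dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    delta_o = o.copy()
    delta_o[np.arange(T), y_idx] -= 1.0  # dL/d(V s_t) for softmax + cross-entropy
    ds_next = np.zeros(W.shape[0])       # gradient flowing into s_t from step t+1
    for t in reversed(range(T)):
        dV += np.outer(delta_o[t], s[t])
        ds = V.T @ delta_o[t] + ds_next  # output at t plus the recursive term from t+1
        dz = ds * (1.0 - s[t] ** 2)      # back through tanh
        dW += np.outer(dz, s[t - 1])
        dU[:, x_idx[t]] += dz
        ds_next = W.T @ dz               # pass the gradient on to s_{t-1}
    return dU, dV, dW

# Hypothetical toy setup: vocabulary of 8 words, 4 hidden units
rng = np.random.default_rng(0)
vocab, hidden = 8, 4
U = rng.normal(scale=0.1, size=(hidden, vocab))
V = rng.normal(scale=0.1, size=(vocab, hidden))
W = rng.normal(scale=0.1, size=(hidden, hidden))
x_idx = np.array([0, 3, 5, 2])  # input word at each step
y_idx = np.array([3, 5, 2, 7])  # next word to predict at each step
dU, dV, dW = rnn_bptt(x_idx, y_idx, U, V, W)

# Central-difference check on one entry of W
eps = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (rnn_loss(x_idx, y_idx, U, V, W_plus)
           - rnn_loss(x_idx, y_idx, U, V, W_minus)) / (2 * eps)
print(abs(numeric - dW[1, 2]))
```

The last lines compare one analytic entry of dW with a finite-difference estimate; a difference close to zero suggests the recursive gradients are right. Stopping the backward loop after a fixed number of steps would give truncated BPTT, a common way to bound the cost on long sequences.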

Build Neural Network from Scratch

Two examples show how to build a neural network from scratch: define the activation function of each layer, define the loss function, and calculate the partial derivatives using the chain rule.
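The post's two examples are not reproduced here. As an illustration only, the following hypothetical NumPy sketch trains a one-hidden-layer network on XOR. It assumes a tanh hidden layer, a sigmoid output, and a mean binary cross-entropy loss, and it derives each gradient with the chain rule, working from the output back to the first layer. The function names, layer sizes, learning rate, and step count are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)   # hidden layer activation
    p = sigmoid(h @ W2 + b2)   # output probability
    return h, p

def loss(X, y, params):
    """Mean binary cross-entropy."""
    _, p = forward(X, params)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(X, y, params):
    W1, b1, W2, b2 = params
    h, p = forward(X, params)
    n = X.shape[0]
    dz2 = (p - y) / n          # dL/dz2: sigmoid + cross-entropy simplify to (p - y) / n
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T            # chain rule back into the hidden layer
    dz1 = dh * (1.0 - h ** 2)  # derivative of tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return [dW1, db1, dW2, db2]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

rng = np.random.default_rng(1)
params = [rng.normal(scale=0.5, size=(2, 4)), np.zeros(4),
          rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)]

initial_loss = loss(X, y, params)
lr = 0.5
for step in range(5000):
    # Plain gradient descent on all weights and biases
    grads = gradients(X, y, params)
    params = [w - lr * g for w, g in zip(params, grads)]
final_loss = loss(X, y, params)
print(initial_loss, final_loss)
print(forward(X, params)[1].round(3))
```

Pairing the sigmoid output with cross-entropy keeps the backward pass short, because the derivative with respect to the output pre-activation collapses to (p - y) / n. How well XOR is fit depends on the random initialization, so it is worth checking the analytic gradients against finite differences before trusting the training loop.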