Visualizing gradient descent in action: An interesting project showing how the learning rate (LR) affects convergence. If the LR is too small, gradient descent converges very slowly. If the LR is too big (like 1.01), it diverges.
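A minimal sketch of the same effect, assuming the objective is f(x) = x**2 (my choice for illustration, not necessarily the function the project uses). Its gradient is 2*x, so each step multiplies x by (1 - 2*lr), and the iterate grows whenever lr is above 1. The function name and defaults are hypothetical.

```python
def gradient_descent(lr, x0=1.0, steps=50):
    """Run plain gradient descent on f(x) = x**2 and return every visited x."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x**2
        x = x - lr * grad     # standard update: x <- x - lr * grad
        xs.append(x)
    return xs


# Compare a tiny, a moderate, and a too-large learning rate.
for lr in (0.001, 0.1, 1.01):
    final = gradient_descent(lr)[-1]
    print(f"lr={lr}: |x| after 50 steps = {abs(final):.4g}")
```

With lr=0.001 the iterate has barely moved from its start at 1.0, with lr=0.1 it is close to the minimum at 0, and with lr=1.01 its magnitude is larger than where it started and keeps growing.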
How to Develop a Word Embedding Model for Predicting Movie Review Sentiment: The loaded embedding may not contain every word in our chosen vocabulary. So, when creating the Embedding weight matrix, we need to skip words that have no corresponding vector in the loaded GloVe data. If the test data contains out-of-vocabulary words, you can add an "UNKNOWN" word to the vocabulary, give it a unique index, and randomly initialize its weights.
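A rough sketch of both ideas with NumPy, under a few assumptions of mine: the pretrained vectors are already loaded into a plain dict (`vectors`, a toy stand-in for the GloVe data), the "UNKNOWN" token gets index 0, and its row is drawn from a small normal distribution. The helper names `build_embedding_matrix` and `encode` are hypothetical, not from the tutorial.

```python
import numpy as np

UNK = "UNKNOWN"  # special token for out-of-vocabulary words


def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Return (word_index, matrix); words missing from `vectors` keep a zero row."""
    rng = np.random.default_rng(seed)
    word_index = {UNK: 0}  # assumption: UNKNOWN takes index 0
    for word in vocab:
        word_index.setdefault(word, len(word_index))

    matrix = np.zeros((len(word_index), dim), dtype=np.float32)
    # Randomly initialize the UNKNOWN row (scale 0.1 is an arbitrary choice).
    matrix[word_index[UNK]] = rng.normal(scale=0.1, size=dim)

    for word, idx in word_index.items():
        if word == UNK:
            continue
        vec = vectors.get(word)
        if vec is not None:  # skip words with no pretrained vector
            matrix[idx] = vec
    return word_index, matrix


def encode(tokens, word_index):
    """Map tokens to indices, sending unseen words to the UNKNOWN index."""
    return [word_index.get(tok, word_index[UNK]) for tok in tokens]


# Toy stand-in for loaded GloVe vectors; "meh" has no pretrained vector.
vectors = {"good": np.array([0.1, 0.2, 0.3]), "bad": np.array([-0.1, -0.2, -0.3])}
word_index, matrix = build_embedding_matrix(["good", "bad", "meh"], vectors, dim=3)
print(matrix.shape)
print(encode(["good", "zzz"], word_index))
```

The resulting matrix could then be passed as the initial weights of an embedding layer. Whether to keep that layer frozen or let the UNKNOWN row train is a separate design choice.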