Neural networks have become a cornerstone of machine learning in the last decade. Created in the late 1940s with the intention of building computer programs that mimic the way neurons process information, these algorithms were long believed to be only an academic curiosity, of little practical use since they required a lot of processing power and were outperformed by other machine learning algorithms. However, since the mid-2000s, the creation of new neural network types and techniques, coupled with the increased availability of fast computers, has made the neural network a powerful tool that every data analyst or programmer should know.
In this series of articles, we’ll see how to fit a neural network with R, learn the core concepts we need to apply those algorithms well, and see how to evaluate whether our model is appropriate for use in production. In the last exercise sets, we saw how to implement a feed-forward neural network in R. That kind of neural network is quite useful for matching a single input value to a specific output value, either a dependent variable in regression problems or a class in classification problems. Sometimes, however, a sequence of inputs can give the network a lot more information than a single value. For example, if you want to train a neural network to predict which letter will come next in a word based on the letters that have already been typed, making predictions based only on the last letter entered can give good results, but using all the previous letters should give better results, since the arrangement of the previous letters carries important information about the rest of the word.
In today’s exercise set, we will see a type of neural network that is designed to make use of the information carried by sequences of inputs. These “recurrent neural networks” do so by using the hidden state at time t-1 to influence the calculation of the hidden state, and therefore the output, at time t. For more information about this type of neural network, you can read this article, which is a good introduction to the subject.
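To make the role of the hidden state concrete, here is a minimal sketch of the forward pass of a recurrent network with a single hidden neuron. The function rnn_forward and the weights w_in, w_rec and w_out are made up for illustration and are not taken from any package.

```r
# Minimal forward pass of a one-neuron recurrent network (illustrative only).
# w_in, w_rec and w_out are hypothetical weights chosen by hand.
rnn_forward <- function(x, w_in, w_rec, w_out) {
  h <- 0                          # hidden state before the first input
  out <- numeric(length(x))
  for (t in seq_along(x)) {
    # The new hidden state mixes the current input with the previous state
    h <- tanh(w_in * x[t] + w_rec * h)
    out[t] <- w_out * h
  }
  out
}

x <- c(0.1, 0.5, 0.9)
with_memory    <- rnn_forward(x, w_in = 1, w_rec = 0.8, w_out = 1)
without_memory <- rnn_forward(x, w_in = 1, w_rec = 0,   w_out = 1)
```

Setting w_rec to zero removes the link to the previous time step, so the network falls back to treating each input on its own, much like a feed-forward network would.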
Answers to the exercises are available here.
We will start by using a recurrent neural network to predict the values of a time series. Load the EuStockMarkets dataset from the datasets package and save the first 1400 observations from the “DAX” time series as your working dataset.
Process the dataset so it can be used in a neural network.
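One possible way to load and prepare the data is sketched below. It assumes that "processing" means rescaling the series to the range from 0 to 1, a common choice when a network's output passes through a logistic function. The variable names and the unscale helper are illustrative.

```r
# EuStockMarkets ships with base R (datasets package); DAX is one of its columns
dax <- as.numeric(datasets::EuStockMarkets[1:1400, "DAX"])

# Min-max scaling (an assumed preprocessing choice): map the values into [0, 1]
dax_min <- min(dax)
dax_max <- max(dax)
dax_scaled <- (dax - dax_min) / (dax_max - dax_min)

# Hypothetical helper to map scaled predictions back to index points later on
unscale <- function(z) z * (dax_max - dax_min) + dax_min

range(dax_scaled)
```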
Create two matrices containing 10 sequences of 140 observations from the previous dataset. The first one must be made of the original observations and will be the input of our neural network. The second one will be the output: since we want to predict the value of the stock market at time t+1 based on the value at time t, this matrix will be the same as the first one, except that all the elements are shifted by one position. Make sure that each sequence is coded as a row of each matrix.
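A sketch of how the two matrices could be built follows. The text does not say what to do with the very last target, which would need observation 1401; here it is simply padded with the last known value, which is an assumption. The data are rescaled to the range from 0 to 1, also as an assumption.

```r
dax <- as.numeric(datasets::EuStockMarkets[1:1400, "DAX"])
dax_scaled <- (dax - min(dax)) / (max(dax) - min(dax))

n_seq   <- 10
n_steps <- 140

# Inputs: fill row by row so that each row is one contiguous sequence
X <- matrix(dax_scaled, nrow = n_seq, ncol = n_steps, byrow = TRUE)

# Targets: the same series shifted one step ahead. The very last target would
# need observation 1401, so it is padded with the last known value (assumption).
shifted <- c(dax_scaled[-1], dax_scaled[length(dax_scaled)])
Y <- matrix(shifted, nrow = n_seq, ncol = n_steps, byrow = TRUE)

dim(X)
```

Using byrow = TRUE matters here: without it, R fills the matrix column by column and each row would no longer hold consecutive observations.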
Set the seed to 42 and randomly choose eight sequences to train your model and two sequences that will be used for validation later. Once that’s done, load the rnn package and use the trainr() function to train a recurrent neural network on the training dataset. For now, use a learning rate of 0.01, one hidden layer of one neuron and 500 epochs.
Use the predictr() function to make predictions on all 10 sequences of your original data matrix, then plot the real values and the predicted values on the same graph. Also plot the predictions on the test set against the real values of your dataset.
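The split, training and prediction steps described above can be sketched as follows. The wrappers as_3d, fit_and_predict and plot_fit are hypothetical helpers written for this example. The data preparation (scaling to the range from 0 to 1, padding the last target) is an assumption, and the calls to trainr() and predictr() use only the arguments named in the text plus the input and output arrays.

```r
# Rebuild the scaled 10 x 140 input and target matrices (assumed preprocessing)
dax <- as.numeric(datasets::EuStockMarkets[1:1400, "DAX"])
dax_scaled <- (dax - min(dax)) / (max(dax) - min(dax))
X <- matrix(dax_scaled, nrow = 10, byrow = TRUE)
Y <- matrix(c(dax_scaled[-1], dax_scaled[1400]), nrow = 10, byrow = TRUE)

# Reproducible split: eight training sequences, two held out
set.seed(42)
train_idx <- sample(nrow(X), 8)
test_idx  <- setdiff(seq_len(nrow(X)), train_idx)

# The rnn functions work on arrays shaped (sequences, time steps, variables)
as_3d <- function(m) array(m, dim = c(nrow(m), ncol(m), 1))

# Hypothetical wrapper: train on the training rows, then predict every sequence
fit_and_predict <- function(X, Y, train_idx, hidden_dim = 1, numepochs = 500) {
  model <- rnn::trainr(
    Y = as_3d(Y[train_idx, , drop = FALSE]),
    X = as_3d(X[train_idx, , drop = FALSE]),
    learningrate = 0.01,
    hidden_dim   = hidden_dim,
    numepochs    = numepochs
  )
  pred <- rnn::predictr(model, as_3d(X))
  # Collapse to a sequences x time steps matrix whatever shape is returned
  matrix(as.vector(pred), nrow = nrow(X))
}

# Hypothetical helper: overlay real (black) and predicted (red) values
plot_fit <- function(actual, predicted, main = "") {
  plot(as.vector(t(actual)), type = "l", main = main,
       xlab = "Time step", ylab = "Scaled DAX")
  lines(as.vector(t(predicted)), col = "red")
}
```

With the rnn package installed, calling pred <- fit_and_predict(X, Y, train_idx) and then plot_fit(Y, pred) would draw the first graph, and plot_fit(Y[test_idx, ], pred[test_idx, ]) the second. Training for 500 epochs can take a while, which is why the block only defines the functions.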
The last model seems to underestimate the stock values that are higher than 0.5. Repeat the steps of exercises 3 and 4, but this time use 10 hidden layers. Once that’s done, calculate the RMSE of your predictions. This will be the baseline model for the rest of this exercise set.
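The RMSE (root mean squared error) is the square root of the average squared difference between real and predicted values. A minimal sketch, with made-up numbers standing in for real predictions:

```r
# Root mean squared error; works on vectors and on matrices of equal shape
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up values for illustration, not model output
actual    <- c(0.2, 0.4, 0.6, 0.8)
predicted <- c(0.2, 0.5, 0.5, 0.8)
rmse(actual, predicted)
```

Because the function averages over every element, it can be applied directly to the target matrix and the matrix of predictions, or to the test rows only.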
One interesting method often used to accelerate the training of a neural network is the “Nesterov momentum”. This procedure is based on the fact that, while trying to find the weights that minimize the cost function of your neural network, optimization algorithms like gradient descent “zigzag” around the straight path to the minimum value. By adding to the gradient a momentum matrix, which keeps track of the general direction of the gradient, we can minimize the deviation from this optimal path and speed up the convergence of the algorithm. You can see this video for more information about this concept.
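The effect can be illustrated on a toy problem. The sketch below uses classical momentum, a simpler relative of the Nesterov variant, on a hand-picked quadratic cost function; it is not the update rule of any particular package, and the function names and settings are chosen for illustration only.

```r
# Gradient of an elongated bowl f(w) = 0.5 * (w1^2 + 25 * w2^2), minimum at (0, 0)
grad <- function(w) c(1, 25) * w

descend <- function(w, learning_rate, momentum, steps) {
  velocity <- c(0, 0)
  for (i in seq_len(steps)) {
    # The velocity is a decaying running sum of the past gradient steps
    velocity <- momentum * velocity - learning_rate * grad(w)
    w <- w + velocity
  }
  w
}

start <- c(1, 1)
w_plain    <- descend(start, learning_rate = 0.06, momentum = 0,   steps = 50)
w_momentum <- descend(start, learning_rate = 0.06, momentum = 0.7, steps = 50)

# Distance to the minimum after the same number of steps
dist_plain    <- sqrt(sum(w_plain^2))
dist_momentum <- sqrt(sum(w_momentum^2))
c(plain = dist_plain, momentum = dist_momentum)
```

With momentum set to 0 the function is plain gradient descent, which overshoots back and forth along the steep direction while creeping along the flat one. On this toy problem the run with momentum ends up closer to the minimum after the same number of steps.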
Repeat the last exercise, but this time use 250 epochs and a momentum of 0.7.
A special type of recurrent neural network trained by backpropagation through time is the Long Short-Term Memory (LSTM) network. This type of recurrent neural network is quite useful in a deep learning context, since it is robust against the vanishing gradient problem. We will see both of those concepts in more detail in a future exercise set, but for now you can read about them here.
The trainr() function gives us the ability to train an LSTM network by setting the network_type parameter to “lstm”. Use this algorithm with 500 epochs and 20 neurons in the hidden layer to predict the values of your time series.
When working with a recurrent neural network, it is important to choose an input sequence length that gives the algorithm the maximum information possible without adding useless noise to the input. Until now we have used 10 sequences of 140 observations. Train a recurrent neural network on 28 sequences of 50 observations, make predictions and compute the RMSE to see if this encoding has an effect on your predictions.
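Changing the sequence length only changes how the same 1400 observations are cut into rows. A small helper, hypothetical and written for this example, makes it easy to try different lengths. It assumes the series has been rescaled to the range from 0 to 1 and pads the last target with the last known value.

```r
dax <- as.numeric(datasets::EuStockMarkets[1:1400, "DAX"])
dax_scaled <- (dax - min(dax)) / (max(dax) - min(dax))

# Hypothetical helper: cut a series into rows of n_steps observations and
# build the one-step-ahead targets (last target padded, as an assumption)
make_sequences <- function(series, n_steps) {
  stopifnot(length(series) %% n_steps == 0)
  shifted <- c(series[-1], series[length(series)])
  list(
    X = matrix(series,  ncol = n_steps, byrow = TRUE),
    Y = matrix(shifted, ncol = n_steps, byrow = TRUE)
  )
}

long_seqs  <- make_sequences(dax_scaled, 140)  # 10 sequences of 140
short_seqs <- make_sequences(dax_scaled, 50)   # 28 sequences of 50
dim(short_seqs$X)
```

The stopifnot() check rejects sequence lengths that do not divide the series evenly, which avoids silently recycling values when the matrix is filled.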
Try to use all of the 1860 observations in the “DAX” time series to train and test a recurrent neural network. Then post the settings you used for your model and why you chose them in the comments.