net.trainFcn = 'trainrp'
[net,tr] = train(net,...)
trainrp is a network training function that updates weight and bias values according to the resilient backpropagation algorithm (Rprop).
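Training occurs according to trainrp's training parameters, which are set through net.trainParam:
net.trainParam.epochs            Maximum number of epochs to train
net.trainParam.show              Epochs between displays (NaN for no displays)
net.trainParam.showCommandLine   Generate command-line output
net.trainParam.showWindow        Show training GUI
net.trainParam.goal              Performance goal
net.trainParam.time              Maximum time to train in seconds
net.trainParam.min_grad          Minimum performance gradient
net.trainParam.max_fail          Maximum validation failures
net.trainParam.delt_inc          Increment to weight change
net.trainParam.delt_dec          Decrement to weight change
net.trainParam.delta0            Initial weight change
net.trainParam.deltamax          Maximum weight change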
You can create a standard network that uses trainrp with feedforwardnet or cascadeforwardnet.
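To prepare a custom network to be trained with trainrp, set net.trainFcn to 'trainrp' (this sets net.trainParam to trainrp's default parameters), then set the net.trainParam properties to the desired values.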
In either case, calling train with the resulting network trains the network with trainrp.
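Preparing a network for trainrp might look like the following minimal sketch. It assumes Deep Learning Toolbox is installed, and the parameter values are illustrative choices, not recommendations.

```matlab
% Sketch: prepare a network to be trained with trainrp.
net = feedforwardnet(2);          % created with the default training function
net.trainFcn = 'trainrp';         % switch to Rprop; net.trainParam now holds trainrp's parameters
net.trainParam.epochs   = 200;    % illustrative value
net.trainParam.delta0   = 0.05;   % illustrative initial weight change
net.trainParam.deltamax = 10;     % illustrative cap on the weight change
disp(net.trainParam)              % list the trainrp parameters currently in effect
```

Setting net.trainFcn before editing net.trainParam matters, because changing the training function replaces the parameter set.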
Here is a problem consisting of inputs p and targets t to be solved with a network.
p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];
Create a two-layer feed-forward network with two hidden neurons that uses this training function.
net = feedforwardnet(2,'trainrp');
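Train the network, then simulate it on the training inputs.
net.trainParam.epochs = 50;    % stop after at most 50 epochs
net.trainParam.show = 10;      % display progress every 10 epochs
net.trainParam.goal = 0.1;     % stop once performance reaches 0.1
net = train(net,p,t);
a = net(p)                     % outputs of the trained network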
See help feedforwardnet and help cascadeforwardnet for other examples.
Multilayer networks typically use sigmoid transfer functions in the hidden layers. These functions are often called "squashing" functions, because they compress an infinite input range into a finite output range. Sigmoid functions are characterized by the fact that their slopes must approach zero as the input gets large. This causes a problem when you use steepest descent to train a multilayer network with sigmoid functions, because the gradient can have a very small magnitude and, therefore, cause small changes in the weights and biases, even though the weights and biases are far from their optimal values.
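To make the saturation effect concrete, the sketch below evaluates the slope of the hyperbolic tangent sigmoid at a few inputs. It uses only the built-in tanh; the sample inputs are arbitrary.

```matlab
% Slope of the tanh sigmoid at increasingly large inputs.
n = [0 2 5 10];                  % arbitrary sample net inputs
slope = 1 - tanh(n).^2;          % derivative of tanh(n) with respect to n
fprintf('n = %4.1f   slope = %.3g\n', [n; slope]);
```

The slope is 1 at n = 0 and drops below 1e-8 by n = 10, so a steepest descent step scaled by this derivative becomes negligible once a neuron saturates.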
The purpose of the resilient backpropagation (Rprop) training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives. Only the sign of the derivative is used to determine the direction of the weight update; the magnitude of the derivative has no effect on the weight update. The size of the weight change is determined by a separate update value. The update value for each weight and bias is increased by a factor delt_inc whenever the derivative of the performance function with respect to that weight has the same sign for two successive iterations. The update value is decreased by a factor delt_dec whenever the derivative with respect to that weight changes sign from the previous iteration. If the derivative is zero, the update value remains the same. As a result, the weight change is reduced whenever the weights are oscillating, and its magnitude increases when a weight continues to change in the same direction for several iterations. A complete description of the Rprop algorithm is given in [ReBr93].
The following code recreates the previous network and trains it using the Rprop algorithm. The training parameters for trainrp are epochs, show, goal, time, min_grad, max_fail, delt_inc, delt_dec, delta0, and deltamax. The first eight parameters have been previously discussed. The last two are the initial step size and the maximum step size, respectively. The performance of Rprop is not very sensitive to the settings of the training parameters. For the example below, the training parameters are left at the default values:
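p = [-1 -1 2 2; 0 5 0 5];
t = [-1 -1 1 1];
net = feedforwardnet(3,'trainrp');   % three hidden neurons, trained with Rprop
net = train(net,p,t);                % training parameters left at their defaults
y = net(p)                           % simulate the trained network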
Rprop is generally much faster than the standard steepest descent algorithm. It also requires only a modest increase in memory: you need to store the update value for each weight and bias, which takes the same amount of storage as the gradient.
trainrp can train any network as long as its weight, net input, and transfer functions have derivative functions.
Backpropagation is used to calculate derivatives of performance perf with respect to the weight and bias variables X. Each variable is adjusted according to the following:
dX = deltaX.*sign(gX);
where the elements of deltaX are all initialized to delta0, and gX is the gradient. At each iteration the elements of deltaX are modified. If an element of gX changes sign from one iteration to the next, then the corresponding element of deltaX is decreased by the factor delt_dec. If an element of gX maintains the same sign from one iteration to the next, then the corresponding element of deltaX is increased by the factor delt_inc. See Riedmiller and Braun, Proceedings of the IEEE International Conference on Neural Networks (ICNN), San Francisco, 1993, pp. 586–591.
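The update rule can be illustrated with a minimal sketch on a toy quadratic performance function. This is not the trainrp source: the target vector, the parameter values, and the epoch count are hypothetical, and the sketch assumes gX points in the descent direction (the negative of the derivative of perf with respect to X), so that adding dX reduces perf.

```matlab
% Minimal Rprop sketch on perf(X) = sum((X - target).^2); not the trainrp source.
target   = [3; -2];                  % hypothetical minimizer
X        = [0; 0];                   % starting values of the variables
delt_inc = 1.2;   delt_dec = 0.5;    % illustrative increase and decrease factors
delta0   = 0.07;  deltamax = 50;     % illustrative initial and maximum step size
deltaX   = delta0 * ones(size(X));   % one update value per variable
gXprev   = zeros(size(X));
for epoch = 1:100
    gX = -2 * (X - target);              % assumed convention: descent direction
    same    = (gX .* gXprev) > 0;        % sign unchanged since the last iteration
    flipped = (gX .* gXprev) < 0;        % sign changed since the last iteration
    deltaX(same)    = min(deltaX(same) * delt_inc, deltamax);
    deltaX(flipped) = deltaX(flipped) * delt_dec;
    dX = deltaX .* sign(gX);             % only the sign of gX is used
    X  = X + dX;
    gXprev = gX;
end
perf = sum((X - target).^2);
fprintf('X = [%.4f %.4f], perf = %.3g\n', X, perf);
```

Each variable accelerates while its gradient sign is stable, overshoots the minimum, and then has its step halved, so the iterates close in on target even though the gradient magnitude is never used.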
Training stops when any of these conditions occurs:
The maximum number of epochs (repetitions) is reached.
The maximum amount of time is exceeded.
Performance is minimized to the goal.
The performance gradient falls below min_grad.
Validation performance has increased more than max_fail times since the last time it decreased (when using validation).
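Which of these conditions ended a particular run can be read from the training record returned by train. The sketch below assumes Deep Learning Toolbox is installed and reuses the small example problem from this page. The stop and num_epochs fields are part of the training record.

```matlab
% Sketch: inspect why training stopped.
p = [0 1 2 3 4 5];
t = [0 0 0 1 1 1];
net = feedforwardnet(2,'trainrp');
net.trainParam.showWindow = false;   % suppress the training GUI
[net,tr] = train(net,p,t);
disp(tr.stop)                        % text naming the stopping condition that fired
disp(tr.num_epochs)                  % number of epochs actually run
```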
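[ReBr93] Riedmiller, M., and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," Proceedings of the IEEE International Conference on Neural Networks (ICNN), San Francisco, 1993, pp. 586–591.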