NIPS 2010 - Investigating Convergence of Restricted Boltzmann Machine Learning
My colleague Hannes Schulz and I also had a paper in this year's NIPS deep learning workshop: “Investigating Convergence of Restricted Boltzmann Machine Learning”. It is about evaluating RBM training.
One problem in training an RBM is that it is usually not possible to evaluate the actual objective function. This makes it hard to evaluate the training and to find the right hyperparameters. The problem is all the more severe because contrastive divergence and persistent contrastive divergence, the most popular learning algorithms for RBMs, are known to diverge if the hyperparameters are not tuned well.
In our work we train a small RBM for which we can compute the partition function and evaluate the objective function exactly.
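To illustrate why a small RBM makes exact evaluation possible, here is a minimal sketch in Python with NumPy. It assumes a binary RBM with energy E(v, h) = -b·v - c·h - v·W·h and takes the data log-likelihood as the objective. The function names and the parameter layout (W of shape visible x hidden, visible bias b, hidden bias c) are my choices for this illustration, not the code used in the paper. The hidden units are summed out analytically, so only the 2^n visible states have to be enumerated.

```python
import itertools

import numpy as np


def free_energy(v, W, b, c):
    """Free energy F(v) = -log sum_h exp(-E(v, h)) for each row of v."""
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    return -(v @ b) - np.logaddexp(0.0, v @ W + c).sum(axis=1)


def exact_log_partition(W, b, c):
    """Exact log Z by enumerating all 2**n_visible binary visible states."""
    n_visible = W.shape[0]
    states = np.array(list(itertools.product([0.0, 1.0], repeat=n_visible)))
    neg_f = -free_energy(states, W, b, c)
    # log-sum-exp with the maximum subtracted for numerical stability
    m = neg_f.max()
    return m + np.log(np.exp(neg_f - m).sum())


def exact_mean_log_likelihood(data, W, b, c):
    """Average exact log-likelihood of binary data rows under the RBM."""
    return (-free_energy(data, W, b, c)).mean() - exact_log_partition(W, b, c)
```

Calling exact_mean_log_likelihood after every training epoch gives an exact learning curve. The cost grows as 2^n in the number of visible units, so this is only feasible for small models.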
We trained RBMs with a minimal number of hyperparameters and computed exact learning curves. We confirm that the algorithms diverge in some cases, and we also confirm that the reconstruction error is not a good measure of performance.
We explore annealed importance sampling (AIS), as suggested by Ruslan Salakhutdinov, as a method to approximate the partition function in order to evaluate the model. We find that it often works remarkably well but breaks down completely in some instances.
This means that AIS is not a reliable tool for detecting divergence or for finding good parameters.
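For readers who have not seen AIS, the following is a rough sketch of the idea for a binary RBM, again in Python with NumPy. It is a simplified variant and not necessarily the setup from our experiments: I assume a uniform base distribution (all parameters zero), intermediate distributions obtained by scaling all parameters by an inverse temperature beta, and one Gibbs step per temperature. The function name, the default chain count and the number of temperatures are hypothetical choices for this illustration.

```python
import numpy as np


def neg_free_energy(v, W, b, c, beta):
    """log p*_beta(v): unnormalized log-marginal of the RBM scaled by beta."""
    return beta * (v @ b) + np.logaddexp(0.0, beta * (v @ W + c)).sum(axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ais_log_partition(W, b, c, n_chains=500, n_betas=1000, seed=0):
    """AIS estimate of log Z for a binary RBM (sketch, uniform base model)."""
    rng = np.random.default_rng(seed)
    n_visible, n_hidden = W.shape
    betas = np.linspace(0.0, 1.0, n_betas)

    # At beta = 0 all parameters vanish, so visible units are uniform.
    v = (rng.random((n_chains, n_visible)) < 0.5).astype(float)
    log_w = np.zeros(n_chains)

    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Importance weight update: ratio of neighbouring unnormalized densities.
        log_w += (neg_free_energy(v, W, b, c, beta)
                  - neg_free_energy(v, W, b, c, beta_prev))
        # One Gibbs step that leaves the distribution at this beta invariant.
        p_h = sigmoid(beta * (v @ W + c))
        h = (rng.random((n_chains, n_hidden)) < p_h).astype(float)
        p_v = sigmoid(beta * (h @ W.T + b))
        v = (rng.random((n_chains, n_visible)) < p_v).astype(float)

    # Base model partition function: 2**(n_visible + n_hidden).
    log_z0 = (n_visible + n_hidden) * np.log(2.0)
    m = log_w.max()
    return log_z0 + m + np.log(np.mean(np.exp(log_w - m)))
```

On a small RBM the estimate can be compared directly with the partition function obtained by enumerating all states, which is how a breakdown of AIS becomes visible.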
We cannot currently provide an alternative to AIS, but we are working on explaining the breakdown and hope to find a useful measure of RBM performance.