When I was at NIPS last year, I overheard Ruslan Salakhutdinov being asked whether training RBMs is more of an art than a science.
I think this question is answered (at least for the moment) by a great paper by Asja Fischer and Christian Igel that will be presented tomorrow at ICANN.
They evaluate different training methods for RBMs on toy problems, where the partition function can be evaluated explicitly.
What they find is that after an initial increase, the log-likelihood of all models diverges, unless the learning rate schedule or the weight decay parameter is chosen just right.
Since it is impossible to evaluate the true log-likelihood on a "real-world" dataset (see my older post), it seems impossible to detect whether divergence occurs and to choose the parameters accordingly.
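To make the setup concrete, here is a minimal sketch of why toy problems allow this kind of evaluation: for a small binary RBM you can enumerate all visible states and compute the partition function, and hence the exact log-likelihood, by brute force. The sizes, parameters, and function names below are my own illustrative choices, not taken from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes chosen so that all 2^n_visible states can be enumerated (assumption)
n_visible, n_hidden = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weights
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases

def free_energy(v):
    # F(v) = -b.v - sum_j log(1 + exp(c_j + v.W_j)), hiddens summed out
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

# Exact partition function: Z = sum over all visible states of exp(-F(v))
states = np.array(list(itertools.product([0, 1], repeat=n_visible)), dtype=float)
logZ = np.logaddexp.reduce([-free_energy(v) for v in states])

def log_likelihood(data):
    # Exact mean log-probability of the data under the current model
    return np.mean([-free_energy(v) - logZ for v in data])

# Illustrative "dataset": random visible configurations
data = states[rng.integers(0, len(states), size=10)]
print(log_likelihood(data))
```

On a real dataset the sum over `2^n_visible` states is intractable, which is exactly why the divergence is invisible in practice.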
This paper evaluates CD, PCD and fast PCD, but does not use parallel tempering (yet).
It would be very interesting to see whether parallel tempering might solve this problem.
But on the other hand, parallel tempering also has many hyper-parameters.
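For reference, the CD-1 update that these methods build on can be sketched roughly as below (variable names and the learning rate are my own illustrative choices). PCD differs only in that the negative-phase chain persists across updates instead of being restarted at the data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, lr=0.01, rng=rng):
    """One CD-1 parameter update for a binary RBM (illustrative sketch)."""
    # Positive phase: hidden probabilities and samples given the data batch
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visibles, then hiddens
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient: data statistics minus (one-step) model statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return v1  # PCD would feed this back in as the next chain state
```

The bias toward short chains in this approximation is one suspected cause of the divergence, which is why better samplers such as parallel tempering are an obvious thing to try.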
Maybe I can tell you more tomorrow, after I have talked to Asja.