It's now available online! (pdf)
It follows a very simple idea: compare "shallow" unsupervised feature extraction methods on image data by how well the resulting features perform in classification.
The datasets used are NORB and CIFAR-10, two of the most widely used datasets in the deep learning community.
Filters are learned on image patches and then features are computed in a very simple pyramid over the image.
These are then classified using a linear SVM. The approaches that are compared are:
- soft K-Means
- Sparse Autoencoder
- Sparse RBM
k nearest neighbors of a point.
Cross-validation was used to find the best patch size, number of features and stride (the distance between the points at which features are computed).
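For reference, the "soft" variant is what the paper calls K-means (triangle). Centroids are learned with ordinary K-means on image patches, and a patch x is then encoded as f_k(x) = max(0, mu(z) - z_k), where z_k is the Euclidean distance from x to centroid k and mu(z) is the mean of all those distances, so every centroid that is farther away than average contributes a zero. The following is only a minimal pure-Python sketch, assuming patches are already flattened into vectors; the function names are hypothetical, and the whitening step and the SVM are left out.

```python
import math
import random
from typing import List

Vector = List[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(patches: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain hard-assignment K-means (hypothetical helper); returns k centroids."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct randomly chosen patches.
    centroids = [list(p) for p in rng.sample(patches, k)]
    for _ in range(iters):
        # Assign every patch to its nearest centroid.
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in patches:
            nearest = min(range(k), key=lambda j: distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def triangle_encode(patch: Vector, centroids: List[Vector]) -> Vector:
    """'Soft' (triangle) K-means features: f_k = max(0, mean(z) - z_k)."""
    z = [distance(patch, c) for c in centroids]
    mu = sum(z) / len(z)
    return [max(0.0, mu - zk) for zk in z]
```

With two well-separated groups of toy 2-D "patches", encoding a point from one group gives a positive activation for the nearby centroid and exactly zero for the distant one, which is the sparsity the triangle encoding is meant to produce.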
So far this does not sound very exciting. What is exciting are the results:
Not only does K-Means beat the other two feature extraction techniques, it also advances well beyond the state of the art on both datasets.
Results are reported with 1600 features for all of the algorithms above, except for "Soft K-Means 4000", which uses 4000 features.
For example, the results on CIFAR-10 are as follows:
- K-Means 68.6%
- Mean-Covariance RBM 71.0%
- Sparse RBM 72.4%
- Sparse Autoencoder 73.4%
- Improved LCC 74.5%
- Soft K-Means 77.9%
- Convolutional RBM 78.9% (previous state of the art)
- Soft K-Means 4000 79.6%
These results are very surprising, to say the least, since a lot of effort went into designing the LCC, the convolutional RBM and the mc-RBM. The latter two are deep architectures that were quite probably optimized for this dataset, and the convolutional RBM in particular comes from the same group as this work.
Other findings: denser sampling is better, with features computed at every position (a stride of one pixel) performing best. Performance also increases with bigger filter sizes.
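To make "denser sampling" and the simple pyramid concrete, here is a minimal pure-Python sketch, assuming a single-channel image stored as a list of rows and any per-patch encoder (such as a triangle K-means encoder). Patches are taken every `stride` pixels, and their features are summed over the four quadrants of the grid of patch positions, which is how the paper pools; the helper names are hypothetical. On a 32×32 CIFAR-10 image with, say, 6×6 patches, a stride of 1 gives 27 × 27 = 729 encoded patches, while a stride of 2 gives only 14 × 14 = 196.

```python
from typing import Callable, List

Image = List[List[float]]
Encoder = Callable[[List[float]], List[float]]


def grid_positions(size: int, patch: int, stride: int) -> List[int]:
    """Top-left coordinates of the patches along one image axis."""
    return list(range(0, size - patch + 1, stride))


def extract_and_pool(image: Image, patch: int, stride: int, encode: Encoder) -> List[float]:
    """Encode every patch on a stride grid, then sum-pool over four quadrants.

    Returns a vector of length 4 * K, where K is the encoder's output size.
    Quadrant order: top-left, top-right, bottom-left, bottom-right.
    """
    rows = grid_positions(len(image), patch, stride)
    cols = grid_positions(len(image[0]), patch, stride)
    pooled: List[List[float]] = []
    for ri, r in enumerate(rows):
        for ci, c in enumerate(cols):
            # Flatten the patch whose top-left corner is (r, c).
            flat = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            feats = encode(flat)
            if not pooled:
                # Create one zero accumulator per quadrant once K is known.
                pooled = [[0.0] * len(feats) for _ in range(4)]
            # Quadrant index from the patch's place in the grid of positions.
            q = 2 * int(2 * ri >= len(rows)) + int(2 * ci >= len(cols))
            for k, v in enumerate(feats):
                pooled[q][k] += v
    return [v for quad in pooled for v in quad]
```

Halving the stride roughly quadruples the number of patches that have to be encoded, so the denser sampling that works best also comes with a clear computational cost.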
I talked about these results with Honglak Lee, who was at the poster. He agreed that they are a blow to the head for the deep learning community: carefully designed and trained deep architectures are outperformed by simple, shallow ones.
When I asked him about future directions for deep learning, Lee said that the field should focus more on larger images and more complicated datasets.
I am not quite sure how deep architectures will cope with larger images, but I am quite sure that deep learning has to change its target applications if it wants to compete with other methods. On the other hand, there is a lot more competition on realistic image data than on these datasets, which were specifically designed for deep learning methods.
I would like to thank the authors for this great contribution. This is the sanity check for deep methods that has been missing for too long. Sadly, they did not pass.