ICML 2012 Deep Learning and Unsupervised Feature Extraction Reading List
The ICML 2012 accepted papers are officially online.
On Twitter, Andrej Karpathy complained that the list is a bit hard to browse through. I agree, and even though this is probably not the nice visualization he had in mind, I felt that topical reading lists would somewhat mitigate the problem.
Here is my reading list on deep learning and unsupervised feature extraction:
A Generative Process for Contractive Auto-Encoders
– Accepted
Abstract: The contractive
auto-encoder learns a representation of the input data that
captures the local manifold structure around each data point,
through the leading singular vectors of the Jacobian of the
transformation from input to representation. The corresponding
singular values specify how much local variation is plausible
in directions associated with the corresponding singular
vectors, while remaining in a high-density region of the input
space. This paper proposes a procedure for generating samples
that are consistent with the local structure captured by a
contractive auto-encoder. The associated stochastic process
defines a distribution from which one can sample, and which
experimentally appears to converge quickly and mix well
between modes, compared to Restricted Boltzmann Machines and
Deep Belief Networks. The intuitions behind this procedure can
also be used to train the second layer of contraction that
pools lower-level features and learns to be invariant to the
local directions of variation discovered in the first layer.
We show that this can help learn and represent invariances
present in the data and improve classification error.
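To make the picture concrete, here is a minimal numpy sketch (sizes and initialization are purely illustrative) of a sigmoid encoder, its Jacobian, and the singular value decomposition whose leading right singular vectors span the locally plausible directions of variation that the sampling procedure follows:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid = 20, 10                          # toy dimensions
W = rng.normal(scale=0.1, size=(n_hid, n_in))
b = np.zeros(n_hid)

def encode(x):
    """Sigmoid encoder h(x) = sigmoid(Wx + b)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def jacobian(x):
    """Jacobian dh/dx of the encoder: diag(h * (1 - h)) @ W."""
    h = encode(x)
    return (h * (1.0 - h))[:, None] * W

x = rng.normal(size=n_in)
J = jacobian(x)

# The contractive penalty is the squared Frobenius norm of J; the
# singular values of J say how much local variation is plausible
# along each right singular vector while staying near the manifold.
penalty = np.sum(J ** 2)
U, s, Vt = np.linalg.svd(J, full_matrices=False)
print("contractive penalty:", penalty)
print("leading local direction in input space:", Vt[0])
```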
Building high-level features using large scale unsupervised learning
– Accepted
Abstract: We consider the
challenge of building feature detectors for high-level concepts
from only unlabeled data. For example, we would like to
understand if it is possible to learn a face detector using only
unlabeled images downloaded from the Internet. To answer this
question, we trained a 9-layered locally connected sparse
autoencoder with pooling and local contrast normalization on a
large dataset of images (10 million images, each
200x200 pixels). Contrary to what appears to be a
widely held negative belief, our experimental results reveal
that it is possible to learn a face detector from only
unlabeled data. Control experiments show that the feature
detector is robust not only to translation but also to scaling
and 3D rotation. Also, via recognition and visualization, we find
that the same network is sensitive to other high-level concepts
such as cat faces and human bodies.
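As background for the architecture above, local contrast normalization amounts to subtractive and then divisive normalization over a local neighborhood. Here is a minimal sketch using a plain box filter; the window size is arbitrary and the paper's exact weighting (typically Gaussian) may differ:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_normalize(img, size=9, eps=1e-4):
    """Subtractive then divisive normalization over a local window."""
    local_mean = uniform_filter(img, size=size)   # local average
    centered = img - local_mean                   # subtract local mean
    local_var = uniform_filter(centered ** 2, size=size)
    return centered / np.maximum(np.sqrt(local_var), eps)

# One 200x200 "image" of random values, matching the input size above.
img = np.random.default_rng(0).random((200, 200))
out = local_contrast_normalize(img)
```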
Evaluating Bayesian and L1 Approaches for Sparse Unsupervised Learning
– Accepted
Abstract: The use of L_1
regularisation for sparse learning has generated immense
research interest, with many successful applications in diverse
areas such as signal acquisition, image coding, genomics and
collaborative filtering. While existing work highlights the many
advantages of L_1 methods, in this paper we find that L_1
regularisation often dramatically under-performs in terms of
predictive performance when compared with other methods for
inferring sparsity. We focus on unsupervised latent variable
models, and develop L_1 minimising factor models,
Bayesian variants of “L_1”, and Bayesian models with a
stronger L_0-like sparsity induced through
spike-and-slab distributions. These spike-and-slab Bayesian
factor models encourage sparsity while accounting for
uncertainty in a principled manner, and avoid unnecessary
shrinkage of non-zero values. We demonstrate on a number of data
sets that in practice spike-and-slab Bayesian methods outperform
L_1 minimisation, even on a computational budget. We thus
highlight the need to re-assess the wide use of L_1
methods in sparsity-reliant applications, particularly when we
care about generalising to previously unseen data, and provide
an alternative that, over many varying conditions, provides
improved generalisation performance.
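The contrast between the two priors is easy to see generatively. In this quick sketch (dimension and hyperparameters are arbitrary), a spike-and-slab draw produces exact zeros without shrinking the surviving coefficients, while a Laplace draw, the prior implicitly assumed by L_1 regularisation, almost surely produces no exact zeros:

```python
import numpy as np

rng = np.random.default_rng(0)
pi, slab_std, dim = 0.1, 1.0, 1000

# Spike-and-slab: a Bernoulli "spike" gates a Gaussian "slab".
z = rng.random(dim) < pi                        # which coefficients are active
w_ss = z * rng.normal(0.0, slab_std, size=dim)  # values of the active ones

# Laplace (double-exponential): the prior corresponding to L_1.
w_laplace = rng.laplace(0.0, 1.0, size=dim)

print("spike-and-slab fraction of exact zeros:", np.mean(w_ss == 0.0))
print("Laplace fraction of exact zeros:       ", np.mean(w_laplace == 0.0))
```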
On multi-view feature learning
– Accepted
Abstract: Sparse coding
is a common approach to learning local features for object
recognition. Recently, there has been an increasing interest
in learning features from spatio-temporal, binocular, or other
multi-observation data, where the goal is to encode the
relationship between images rather than the content of a
single image. We discuss the role of multiplicative
interactions and of squaring non-linearities in learning such
relations. In particular, we show that training a sparse
coding model whose filter responses are multiplied or squared
amounts to jointly diagonalizing a set of matrices that encode
image transformations. Inference amounts to detecting
rotations in the shared eigenspaces. Our analysis helps
explain recent experimental results showing that Fourier
features and circular Fourier features emerge when training
complex cell models on translating or rotating images. It also
shows how learning about transformations makes it possible to
learn invariant features.
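A toy illustration of the multiplicative interactions discussed above: relational features are products of filter responses on the two views, so they respond to how one image maps onto the other rather than to the content of either image alone. All sizes and the random orthogonal "transformation" below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_factors = 8, 16
U = rng.normal(size=(n_factors, dim))  # filters applied to view x
V = rng.normal(size=(n_factors, dim))  # filters applied to view y

def relational_code(x, y):
    """Multiplicative-interaction features: f_k = (u_k . x) * (v_k . y)."""
    return (U @ x) * (V @ y)

x = rng.normal(size=dim)
R = np.linalg.qr(rng.normal(size=(dim, dim)))[0]  # stand-in transformation
y = R @ x                                         # second view: transformed x
print(relational_code(x, y))
```

Squaring non-linearities correspond, roughly, to tying the two filter banks and applying them to the concatenated views.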
Deep Mixtures of Factor Analysers
– Accepted
Abstract: An efficient
way to learn deep density models that have many layers of
latent variables is to learn one layer at a time using a
model that has only one layer of latent variables. After
learning each layer, samples from the posterior
distributions for that layer are used as training data for
learning the next layer. This approach is commonly used with
Restricted Boltzmann Machines, which are undirected
graphical models with a single hidden layer, but it can also
be used with Mixtures of Factor Analysers (MFAs) which are directed
graphical models. In this paper, we present a greedy
layer-wise learning algorithm for Deep Mixtures of Factor
Analysers (DMFAs). Even though a DMFA can be converted to an
equivalent shallow MFA by multiplying together the factor
loading matrices at different levels, learning and inference
are much more efficient in a DMFA and the sharing of each
lower-level factor loading matrix by many different higher
level MFAs prevents overfitting. We demonstrate empirically
that DMFAs learn better density models than both MFAs and
two types of Restricted Boltzmann Machines on a wide variety
of datasets.
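The collapsing argument in the abstract is just matrix multiplication for the linear part of the model. A tiny sketch, ignoring the mixture indicators, component means, and noise terms for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
d_visible, d_h1, d_h2 = 50, 10, 3
W1 = rng.normal(size=(d_visible, d_h1))  # first-layer factor loadings
W2 = rng.normal(size=(d_h1, d_h2))       # second-layer factor loadings
z = rng.normal(size=d_h2)                # top-level factors

x_deep = W1 @ (W2 @ z)     # generate through the two-layer model
x_shallow = (W1 @ W2) @ z  # equivalent shallow model with loadings W1 @ W2
assert np.allclose(x_deep, x_shallow)
```

The paper's point is that despite this equivalence, the factored form is much cheaper to learn and do inference in, and sharing W1 across many higher-level components helps prevent overfitting.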
Learning Local Transformation Invariance with Restricted Boltzmann Machines
– Accepted
Abstract: Developing
feature learning algorithms that are robust
to novel transformations (e.g., scale, rotation, or
translation) has been a challenge in many applications
(e.g., object recognition). In this paper, we
address this important problem of transformation invariant
feature learning by introducing the transformation
matrices into the energy function of the restricted
Boltzmann machines. Specifically, the proposed
transformation-invariant restricted Boltzmann machines not
only learn diverse patterns by explicitly transforming
the weight matrix, but also achieve invariance of
the feature representation via probabilistic max pooling
of hidden units over the set of transformations.
Furthermore, we show that our transformation-invariant
feature learning framework is not limited to this specific
algorithm, but can be also extended to many unsupervised
learning methods, such as an autoencoder or sparse coding.
To validate, we evaluate our algorithm on several
benchmark image databases, such as the MNIST variations,
CIFAR-10, and STL-10, as well as customized digit
datasets with significant transformations, and show
classification performance very competitive with the
state of the art. Beyond image data, we apply the
method to phone classification on the TIMIT database to
show the wide applicability of our approach to
other domains, again achieving state-of-the-art performance.
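One way to picture the pooling over transformations: apply a set of transformed copies of a filter and let them compete through a softmax, so a single output unit responds to the pattern regardless of which transformation matched. The sketch below uses circular shifts of a 1-D filter as a stand-in transformation set; in the actual model the transformation matrices act on the weight matrix inside the RBM energy function:

```python
import numpy as np

rng = np.random.default_rng(0)

def transformed_filters(w, shifts):
    """Stand-in transformation set: circular shifts of a 1-D filter."""
    return np.stack([np.roll(w, s) for s in shifts])

w = rng.normal(size=32)   # one filter
x = rng.normal(size=32)   # one input patch
Ws = transformed_filters(w, shifts=range(-3, 4))

# Softmax competition over the per-transformation activations gives
# one (approximately) transformation-invariant pooled response.
acts = Ws @ x
probs = np.exp(acts - acts.max())
probs /= probs.sum()
pooled = probs @ acts     # expected activation under the pooling distribution
print(pooled)
```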
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
– Accepted
Abstract: We consider the problem of object recognition with a
large number of classes. In order to scale existing feature
learning algorithms to this setting, we introduce a new feature
learning and extraction procedure based on a factor model we call
spike-and-slab sparse coding (S3C). Prior work on this model has
not prioritized the ability to exploit parallel architectures and
scale to the enormous problem sizes needed for object recognition.
We present an inference procedure appropriate for use with GPUs
which allows us to dramatically increase both the training set
size and the number of latent factors. We demonstrate that this
approach improves upon the supervised learning capabilities of
both sparse coding and the ssRBM on the CIFAR-10 dataset. We use
the CIFAR-100 dataset to demonstrate that our method scales to
large numbers of classes better than previous methods. Finally,
we use our method to win the NIPS 2011 Workshop on Challenges In
Learning Hierarchical Models' Transfer Learning Challenge.
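Generatively, S3C is a small modification of sparse coding: each latent factor is a Bernoulli "spike" times a Gaussian "slab", and the visible vector is a noisy linear combination of the active dictionary columns. A minimal sketch of one sample, with all sizes and hyperparameters chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_latent = 64, 128
W = rng.normal(scale=0.1, size=(n_visible, n_latent))  # dictionary
pi, slab_std, noise_std = 0.05, 1.0, 0.1

h = (rng.random(n_latent) < pi).astype(float)  # spikes: which factors fire
s = rng.normal(0.0, slab_std, size=n_latent)   # slabs: their values
v = W @ (h * s) + rng.normal(0.0, noise_std, size=n_visible)  # visible sample
```

The paper's contribution is not this generative story but a GPU-friendly inference procedure for recovering h and s at very large scale.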
Deep Lambertian Networks
– Accepted
Abstract: Visual
perception is a challenging problem in part due to
illumination variations. A possible solution is to
first estimate an illumination invariant
representation before using it for recognition. The
object albedo and surface normals are examples of such
representations. In this paper, we introduce a
multilayer generative model where the latent variables
include the albedo, surface normals, and the light
source. Combining Deep Belief Nets with the Lambertian
reflectance assumption, our model can learn good
priors over the albedo from 2D images. Illumination
variations can be explained by changing only the
lighting latent variable in our model. By transferring
knowledge learned from similar objects, our model can
estimate albedo and surface normals from a single
image. Experiments demonstrate that
our model is able to generalize as well as improve
over standard baselines in one-shot face
recognition.
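The Lambertian assumption the model builds on reduces to one line per pixel: observed intensity is albedo times the clamped dot product of the surface normal with the light direction. A toy rendering of a few pixels (the values here are random placeholders; in the model albedo, normals, and lighting are latent variables with learned priors):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 6
albedo = rng.random(n_pixels)             # per-pixel reflectance
normals = rng.normal(size=(n_pixels, 3))  # per-pixel surface normals
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
light = np.array([0.0, 0.0, 1.0])         # light-source direction

# Lambertian reflectance: intensity = albedo * max(0, n . l)
image = albedo * np.maximum(normals @ light, 0.0)
print(image)
```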
Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers
– Accepted
Abstract: Scene parsing
consists in labeling each pixel in an image with the category of the
object it belongs to. We propose a method that uses a multiscale
convolutional network trained from raw pixels to extract dense feature
vectors that encode regions of multiple sizes centered on each pixel.
The method alleviates the need for engineered features. In parallel to
feature extraction, a tree of segments is computed from a graph of pixel
dissimilarities. The feature vectors associated with the segments
covered by each node in the tree are aggregated and fed to a classifier
which produces an estimate of the distribution of object categories
contained in the segment. A subset of tree nodes that cover the image
are then selected so as to maximize the average 'purity' of the class
distributions, hence maximizing the overall likelihood that each segment
will contain a single object. The system yields record accuracies on
the SIFT Flow dataset (33 classes) and the Barcelona dataset (170
classes), and near-record accuracy on the Stanford Background dataset (8
classes), while being an order of magnitude faster than competing
approaches, producing a 320x240 image labeling in less than 1 second,
including feature extraction.
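A sketch of the cover-selection step, under two stated simplifications: purity is taken to be the maximum class probability, and the objective is size-weighted rather than average purity (weighting by pixel count gives the tree recursion optimal substructure; the paper's exact score differs):

```python
import numpy as np

def purity(hist):
    """Purity of a segment: here, its maximum class probability."""
    p = hist / hist.sum()
    return p.max()

def best_cover(node):
    """Best cover below `node`, maximizing size-weighted purity.

    Each node carries a per-class pixel histogram `hist` and a list
    `children`; a cover either keeps a node or replaces it with
    covers of its children.
    """
    keep_score = purity(node["hist"]) * node["hist"].sum()
    if not node["children"]:
        return keep_score, [node]
    parts = [best_cover(c) for c in node["children"]]
    split_score = sum(s for s, _ in parts)
    if keep_score >= split_score:
        return keep_score, [node]
    return split_score, [n for _, ns in parts for n in ns]

# Tiny hypothetical tree: the root mixes two classes that its
# children separate cleanly, so the selected cover splits.
leaf1 = {"hist": np.array([9.0, 1.0]), "children": []}
leaf2 = {"hist": np.array([1.0, 9.0]), "children": []}
root = {"hist": np.array([10.0, 10.0]), "children": [leaf1, leaf2]}
score, cover = best_cover(root)
print(score, [n["hist"] for n in cover])
```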