Showing posts from December, 2010

"Single Layer Networks in Unsupervised Feature Learning" online!

The paper "Single Layer Networks in Unsupervised Feature Learning" by Coates, Lee and Ng, which I talked about in this post, is now available online (pdf)! Thanks to Olivier Grisel from metasploit for pointing that out.

NIPS 2010 - Transfer learning workshop

Ok, this is probably my last post about NIPS 2010. First of all, I became a big fan of Zoubin Ghahramani. He is a great speaker and quite funny. There are quite a few video lectures by him linked on his personal page: here and here. They are mostly about graphical models and nonparametric methods. He gave an invited talk at the transfer learning workshop about the cascading Indian buffet process, where he illustrated the idea behind the method: "Every dish is a customer in another restaurant. Somebody pointed out that this is kind of cannibalistic. We didn't realize that the IBP analogy goes really deep.... dark ... and wrong." This work is about learning the structure of directed graphical models using IBP priors on the graph structure (pdf). When asked about three-way interactions, which this model does not feature (in contrast to many deep graphical models studied at the moment), he argued that latent variables induce covariances by marginalization on the lay
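To make the "every dish is a customer in another restaurant" analogy a bit more concrete, here is a minimal Python sketch. It samples a plain one-parameter Indian buffet process and then cascades it, feeding the dishes of one layer in as the customers of the next; each 1 in a layer's matrix can be read as a directed edge from a dish (parent unit) to a customer (child unit). This is a simplified assumption, not the model from the talk: the actual cascading IBP has more parameters and puts units and weights on top of this structure, none of which is modeled here. The names `sample_ibp`, `sample_cascade`, `alpha` and the `max_layers` safety cap are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_customers, alpha, rng):
    """One-parameter Indian buffet process.

    Returns a binary matrix (list of rows) Z with Z[i][k] = 1 if customer i
    took dish k. Customer i (1-based) takes an existing dish k with
    probability m_k / i, where m_k counts earlier takers, and then tries
    Poisson(alpha / i) brand-new dishes.
    """
    dish_counts = []  # m_k for every dish seen so far
    rows = []
    for i in range(1, n_customers + 1):
        row = []
        for k in range(len(dish_counts)):
            take = rng.random() < dish_counts[k] / i
            row.append(1 if take else 0)
            if take:
                dish_counts[k] += 1
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    width = len(dish_counts)
    # Pad earlier rows with zeros for dishes introduced after them.
    return [row + [0] * (width - len(row)) for row in rows]


def sample_cascade(n_observed, alpha, rng, max_layers=20):
    """Cascade: the dishes of one layer become the customers of the next layer.

    Stops when a layer introduces no dishes or after max_layers layers.
    """
    layers = []
    n = n_observed
    while n > 0 and len(layers) < max_layers:
        z = sample_ibp(n, alpha, rng)
        layers.append(z)
        n = len(z[0])  # number of dishes = customers of the next layer
    return layers


if __name__ == "__main__":
    rng = random.Random(0)
    for depth, z in enumerate(sample_cascade(6, alpha=1.5, rng=rng)):
        print(f"layer {depth}: {len(z)} customers -> {len(z[0])} dishes")
```

Because each customer only samples Poisson(alpha / i) new dishes, later layers tend to shrink, which is what lets the cascade stop; the `max_layers` cap is just a safeguard added for this sketch.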

NIPS 2010 - More Highlights

This post is about all the other papers at NIPS that I found particularly interesting but don't have the time to write much about. There is a similar post on Yaroslav Bulatov's blog.
Multiple Kernel Learning and the SMO Algorithm by Vishwanathan, Sun, Ampornput and Varma (pdf). Code here. Efficient training of p-norm MKL using Sequential Minimal Optimization.
Kernel Descriptors for Visual Recognition by Liefeng Bo, Xiaofeng Ren, Dieter Fox (pdf). A general framework for designing image patch descriptors using kernels. The proposed kernel descriptors are shown to outperform SIFT.
A Theory of Multiclass Boosting by Indraneel Mukherjee, Robert Schapire (pdf). The title says it all.
Deep Coding Network by Yuanqing Lin, Zhang Tong, Shenghuo Zhu, Kai Yu (pdf). This is a continuation of the work on Local Coordinate Coding (pdf), which won the ImageNet challenge.
Tree-Structured Stick Breaking for Hierarchical Data by Ryan Adams, Zoubin Ghahramani, Michael Jordan (pdf

NIPS 2010 - Thinking dynamically

Apart from the presentations and posters, there is another great thing about NIPS: you can discuss machine learning with great researchers in person. One of the people I talked to quite a lot is Jascha Sohl-Dickstein. We discussed some deep methods and training procedures at some length; he is an amazing person with a lot of energy and new ideas. He recently wrote two papers that I quite liked: Minimum Probability Flow Learning and An Unsupervised Algorithm For Learning Lie Group Transformations. I like both of them for their rather unusual point of view. Jascha has a background in physics, and his point of view focuses a lot on understanding the dynamics of learning and transformations. I think "Minimum Probability Flow Learning" gives new insights into training probabilistic models, and as far as I know it has been used quite successfully for training Ising models. Neither work is published yet, but I find both well worth reading, and so I'd like to draw a lit

NIPS 2010 - Deep Learning Workshop

There was an interesting talk by Jitendra Malik about "Rich Representations for Learning Visual Recognition", followed by a panel discussion with Jitendra Malik, Yann LeCun, Geoff Hinton, Tomaso Poggio, Kai Yu, Yoshua Bengio and Andrew Ng. Many "deep" topics were touched upon, but there were one or two ideas that I found most noteworthy. The first is Malik's idea of "hyper supervision". This is the exact opposite of weak supervision: the training examples are labeled very precisely and with lots of extra information. This makes it possible to find more interesting intermediate representations. It also gives the learning algorithm more to work with. In his introduction he said: "Learning object recognition from bounding boxes is like learning language from a list of sentences." If I understand his ideas correctly, he thinks that it is necessary to have additional clues - like 3D information, tracking and time consist

NIPS 2010 - Investigating Convergence of Restricted Boltzmann Machine Learning

My colleague Hannes Schulz and I also had a paper in this year's NIPS deep learning workshop: "Investigating Convergence of Restricted Boltzmann Machine Learning". It is about the evaluation of RBM training. One problem one faces when training an RBM is that it is usually not possible to evaluate the actual objective function, since it depends on the intractable partition function. This makes it hard to evaluate the training and find the right hyperparameters. The problem is even more severe since contrastive divergence and persistent contrastive divergence learning, the most popular learning algorithms for RBMs, are known to diverge if the hyperparameters are not tuned well. In our work we train a small RBM for which we can compute the partition function and thus evaluate the objective function exactly. We trained RBMs with a minimal number of hyperparameters and computed exact learning curves. We confirm the divergence of the algorithms in some cases, and we also confirm that the reconstruction error is not a good measure of performa
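For an RBM with only a handful of units, the partition function can be computed by brute force. Here is a minimal sketch of such an exact evaluation in plain Python, assuming a binary RBM with energy E(v, h) = -b.v - c.h - v^T W h (our notation, not necessarily the paper's). The hidden units are summed out analytically via the free energy F(v), so log p(v) = -F(v) - log Z, and log Z is obtained by enumerating all 2^n visible configurations. The function names are hypothetical and this is not the code used for the paper.

```python
import itertools
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j softplus(c_j + sum_i v_i W[i][j]), hidden units summed out."""
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(
        softplus(c_j + sum(v[i] * W[i][j] for i in range(len(v))))
        for j, c_j in enumerate(c)
    )
    return -visible_term - hidden_term


def log_sum_exp(values):
    """log(sum(exp(x))) computed without overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))


def log_partition(W, b, c):
    """Exact log Z: enumerate all 2**n_visible binary visible vectors."""
    return log_sum_exp(
        [-free_energy(v, W, b, c) for v in itertools.product((0, 1), repeat=len(b))]
    )


def mean_log_likelihood(data, W, b, c):
    """Exact average log p(v) = -F(v) - log Z over a list of binary vectors."""
    log_z = log_partition(W, b, c)
    return sum(-free_energy(v, W, b, c) - log_z for v in data) / len(data)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    n_visible, n_hidden = 6, 4
    W = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    data = [(1, 1, 0, 0, 1, 0), (0, 1, 1, 0, 0, 1)]
    print("exact mean log-likelihood:", mean_log_likelihood(data, W, b, c))
```

The cost grows as 2^n_visible (one could enumerate the hidden layer instead if it is smaller), so this only works for tiny models. Evaluating it after every CD or PCD update yields an exact learning curve, which is the kind of measurement that lets one see divergence that the reconstruction error can hide.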

NIPS 2010 - Single Layer Networks in Unsupervised Feature Learning: The Deep Learning Killer [Edit: now available online!]

The paper "Single Layer Networks in Unsupervised Feature Learning" by Coates, Lee and Ng is, in my opinion, one of the most interesting at this year's NIPS. [edit] It's now available online! (pdf) [/edit] It follows a very simple idea: compare "shallow" unsupervised feature extraction methods on image data by their classification performance. The datasets used are NORB and CIFAR-10, two of the most widely used datasets in the deep learning community. Filters are learned on image patches, and features are then computed in a very simple pyramid over the image. These are then classified using a linear SVM. The approaches compared are K-Means, soft K-Means, sparse autoencoders and sparse RBMs. Here, soft K-Means is an ad-hoc method that the authors came up with as a natural extension of K-Means. It is a local coding based on the k nearest neighbors of a point. Cross-validation was performed to find the best patch size, number of features and distance between sample points for f
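To make the encoding step concrete, here is a minimal sketch in plain Python. `hard_kmeans_code` is the standard 1-of-K assignment to the nearest centroid. For the soft variant we assume the "triangle" activation f_k(x) = max(0, mean(z) - z_k), where z_k is the distance to centroid k; this is one plausible reading of "soft K-Means" and may not match the exact k-nearest-neighbor coding described above. We also assume the "simple pyramid" is sum pooling over the four image quadrants. All function names are hypothetical.

```python
import math


def distances(x, centroids):
    """Euclidean distance from patch vector x to every centroid."""
    return [math.dist(x, c) for c in centroids]


def hard_kmeans_code(x, centroids):
    """1-of-K code: 1 for the nearest centroid, 0 for all others."""
    z = distances(x, centroids)
    nearest = min(range(len(z)), key=z.__getitem__)
    return [1.0 if k == nearest else 0.0 for k in range(len(z))]


def triangle_kmeans_code(x, centroids):
    """Soft code f_k = max(0, mean(z) - z_k): only closer-than-average centroids stay active."""
    z = distances(x, centroids)
    mu = sum(z) / len(z)
    return [max(0.0, mu - z_k) for z_k in z]


def quadrant_sum_pool(feature_map):
    """Sum per-location codes over the 4 quadrants of a grid and concatenate (4*K values)."""
    rows, cols = len(feature_map), len(feature_map[0])
    k = len(feature_map[0][0])
    pooled = []
    for r0, r1 in ((0, rows // 2), (rows // 2, rows)):
        for c0, c1 in ((0, cols // 2), (cols // 2, cols)):
            acc = [0.0] * k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    for i, value in enumerate(feature_map[r][c]):
                        acc[i] += value
            pooled.extend(acc)
    return pooled


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(hard_kmeans_code((0.1, 0.0), centroids))  # [1.0, 0.0, 0.0]
    print(triangle_kmeans_code((0.1, 0.0), centroids))
```

In a full pipeline the codes would be computed for patches sampled on a grid over the image, pooled as above, and the pooled vector fed to the linear SVM. The soft code keeps some information about the second-closest centroids, which the hard code throws away.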

NIPS 2010 - My favourite quotes [updated]

Here are some of this conference's quotes that were more on the lighter side and that made me at least chuckle:
Josh Tenenbaum about 1970s linguists: "They didn't have any computers these days. At least none that we would now recognize as computer."
Geoff Hinton on Josh Tenenbaum's talk "How to grow a mind": "You made the right generalization of Deep Learning from one example." (Hinting at Josh Tenenbaum's interest in one-shot learning.)
Some more from Geoff Hinton: On optimizing a non-convex likelihood function: "In good neural network fashion we add noise and momentum and hope for the best." On the same topic: "Now we allow the probabilities to add up to 4. I accidentally forgot to normalize them and they were 2. When I normalized them to one, the result got worse. So I went in the other direction."
David W. Hogg in the Sam Roweis symposium: "In astronomy, we work at the photon level. We don't have big b

NIPS 2010 - How to Grow a Mind: Statistics, Structure and Abstraction [updated]

Two days ago, Josh Tenenbaum gave a very inspiring talk about how to "learn learning" and about abstraction and structure in learning. In my opinion, it has been one of the best talks of this conference. It gave a broad overview of some new conceptual ideas and of experiments building upon them. Josh Tenenbaum is a great speaker and I am sure my summary cannot do his talk justice. Nevertheless, I will try to give a short summary of his ideas and some pointers to his work. I very much recommend watching this talk as a video lecture, and I am sure I will watch it again. The basic question that Tenenbaum asked is "How does the mind get so much from so little?" In my words, he is asking: how can we generalize from the little and very noisy data that we get? His main interest in this comes from a cognitive science perspective, but of course his ideas are also very applicable to machine learning. He formulates his approach as reverse engineering hu

NIPS 2010 - Perceptual Bases for Rules of Thumb in Photography

The first invited talk today was Martin Banks on "Perceptual Bases for Rules of Thumb in Photography". It was a psychology talk; it does not directly have anything to do with machine learning or computer vision, but it was nevertheless very interesting, all the more so since I am a hobby photographer. The overall theme was how people perceive photographs and pictures on screens, and what the geometric and psychological sources of some effects are. In particular, he focused on three topics: wide-angle distortion, depth compression and expansion, and depth-of-field effects. The first part, about wide-angle distortion, is about the effect that images captured with wide angles seem distorted. This is the reason why portrait and beauty photography usually uses a long focal length lens. From a projective geometry point of view, it is easy to see why this happens: to fill the frame with a wide-angle lens, the camera has to be close to the subject, so the parts of a face nearest to the camera are magnified noticeably more than those further back - and this is geometrically correct. But we still perceive it as "wrong". The question is: Why
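The projective-geometry argument can be checked with a few lines of arithmetic. Here is a minimal sketch assuming an ideal pinhole camera, where an object at distance Z is imaged with magnification f/Z, and a made-up depth of 10 cm between nose and ears. For the same framing, the relative enlargement of the nose depends only on the camera distance, not on the focal length. The function names and numbers are hypothetical.

```python
def magnification(focal_length_m, distance_m):
    """Pinhole camera: image size / object size = f / Z (valid for Z much larger than f)."""
    return focal_length_m / distance_m


def nose_to_ear_scale(focal_length_m, camera_distance_m, face_depth_m=0.1):
    """How much larger the nose is rendered relative to the ears.

    The ears are assumed to sit face_depth_m behind the nose (a made-up value).
    The focal length cancels out: the result is (d + depth) / d.
    """
    nose = magnification(focal_length_m, camera_distance_m)
    ears = magnification(focal_length_m, camera_distance_m + face_depth_m)
    return nose / ears


if __name__ == "__main__":
    # Same framing: a 6x longer lens used from 6x further away.
    wide = nose_to_ear_scale(0.024, 0.5)  # 24 mm lens at 0.5 m
    tele = nose_to_ear_scale(0.144, 3.0)  # 144 mm lens at 3 m
    print(f"wide angle: nose rendered {100 * (wide - 1):.0f}% larger")
    print(f"telephoto: nose rendered {100 * (tele - 1):.1f}% larger")
```

With these assumed numbers the nose comes out about 20% larger at 0.5 m but only about 3% larger at 3 m, which is why stepping back with a longer lens flattens the face. This covers only the geometry; why we perceive the close-up version as "wrong" is the perceptual question the talk addressed.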

NIPS 2010 - Label Embedding Trees for Large Multi-Class Tasks [edit]

Label Embedding Trees for Large Multi-Class Tasks by Samy Bengio, Jason Weston and David Grangier is a paper about coping with classification in the presence of huge amounts of data and very many classes. Of course the ImageNet challenge comes to mind. But what they actually did is classification on all of ImageNet, which is about 16,000 classes and over a million instances. This work focuses on very fast recall but is very expensive to train - apparently the model was trained on a Google cluster. The paper explores two ideas: hierarchical classification and low-dimensional embeddings. For the hierarchical classification part, a tree of label sets is built where the leaves are single classes. Starting from all classes at the root, the set of classes is split into subsets of similar classes, which are represented by new nodes. This is done by training one-vs-rest classifiers and inspecting the confusion matrix. Classes that are easy to confuse are put into the same node. At test-
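Here is a rough sketch of the confusion-based grouping step in plain Python. It is a swapped-in, simplified procedure, not necessarily the one from the paper: we symmetrize the confusion matrix, greedily merge the groups with the highest average mutual confusion until the desired number of child nodes remains, and apply this recursively to obtain a tree whose leaves are single classes. The function names and the `branching` parameter are hypothetical.

```python
def symmetrize(conf):
    """A[i][j] = (C[i][j] + C[j][i]) / 2: how often classes i and j are mixed up."""
    n = len(conf)
    return [[(conf[i][j] + conf[j][i]) / 2.0 for j in range(n)] for i in range(n)]


def group_confusable_classes(conf, n_groups):
    """Greedily merge the two most confusable groups until n_groups remain."""
    a = symmetrize(conf)
    groups = [[i] for i in range(len(conf))]

    def affinity(g, h):
        # Average confusion between the members of two disjoint groups.
        return sum(a[i][j] for i in g for j in h) / (len(g) * len(h))

    while len(groups) > n_groups:
        p, q = max(
            ((p, q) for p in range(len(groups)) for q in range(p + 1, len(groups))),
            key=lambda pq: affinity(groups[pq[0]], groups[pq[1]]),
        )
        groups[p] = groups[p] + groups[q]
        del groups[q]  # q > p, so index p is unaffected
    return groups


def build_label_tree(conf, classes=None, branching=2):
    """Recursively split the class set; leaves are single class ids."""
    if branching < 2:
        raise ValueError("branching must be at least 2")
    if classes is None:
        classes = list(range(len(conf)))
    if len(classes) == 1:
        return classes[0]
    sub = [[conf[i][j] for j in classes] for i in classes]
    groups = group_confusable_classes(sub, min(branching, len(classes)))
    return [build_label_tree(conf, [classes[i] for i in g], branching) for g in groups]


if __name__ == "__main__":
    # Rows: true class, columns: predicted class (made-up counts).
    conf = [[50, 10, 0, 1], [12, 45, 1, 0], [0, 1, 40, 15], [1, 0, 14, 44]]
    print(build_label_tree(conf))
```

With the made-up matrix above, classes 0/1 and 2/3 are mostly confused with each other and end up under the same nodes. A greedy merge like this can produce unbalanced trees, which matters if fast recall (a short path from root to leaf) is the goal.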

NIPS 2010 - How to get your paper accepted to NIPS

NIPS 2010 has been going on for half a week already and finally I have the time to write about it. So many inspiring posters and talks and so many great people to talk to. I'll start by giving you this year's top 9 tips to get your paper accepted to NIPS:
Write a strong abstract. A paper with the following abstract got rejected: "This paper is great."
Hide the fact that the paper is recycled. One paper was submitted that was obviously resubmitted from ICML. How do we know? In addition to the line numbers of the new NIPS layout, it also had the (overlapping) ICML line numbers on both sides.
Carefully maintain anonymity. One of the reviewers remarked that he found the author's name in the pdf properties of a submission.
Avoid certain reviewers - apparently some are quite harsh. Sadly I don't have the review in question, but since you can't be sure to avoid the reviewer, it's better if you don't know it.
Get your bid in early. Someone asked for a t