Saturday, December 11, 2010

NIPS 2010 - How to Grow a Mind: Statistics, Structure and Abstraction [updated]

Two days ago, Josh Tenenbaum gave a very inspiring talk about how to "learn learning", about abstraction and structure in learning.
In my opinion, it was one of the best talks of this conference. It gave a broad overview of some new conceptual ideas and of some experiments building upon these ideas.
Josh Tenenbaum is a great speaker and I am sure I cannot do his talk justice. Nevertheless I will try to give a short summary of his ideas and some pointers to his work.
I very much recommend you watch this talk as a video lecture and I am sure I will watch it again.

The basic question that Tenenbaum asked is "How does the mind get so much from so little?"
In my words he is asking: How can we generalize from the little and very noisy data that we get?
His main interest in this is from a cognitive science perspective but of course his ideas are also very applicable to machine learning. He formulates his approach as reverse engineering human perception: If we want a model of the human mind, we have to build a model that can solve all problems that humans can solve.

As a motivating example he showed an excerpt from this video of the Heider-Simmel demonstration from 1944.
It is a flatland-style scenario consisting of moving triangles and a circle. But clearly these objects seem to be alive and interacting. One even perceives emotions in these very simple geometric forms.
Where does that come from?
In particular Tenenbaum formulated the following questions:

  • How does abstract knowledge guide learning and inference from sparse data?
  • What form does abstract knowledge take across different domains and tasks?
  • How is abstract knowledge itself acquired while balancing complexity versus fit?

Tenenbaum showed some experiments in which people were given certain tasks and their performance was compared qualitatively and quantitatively with predictions made by Bayesian models.
One of the tasks was to judge a quantity from a quite uninformative sentence like:
"A woman is 87 years old. How long will she live?"
In these tasks, the Bayesian model performed nearly exactly like the human subjects.
I understand this as a sort of justification for using Bayesian methods as a model of human reasoning.
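
To make the idea of such a Bayesian prediction a bit more concrete, here is a minimal sketch of how the age question could be modelled. It is only an illustration, not the model from the talk: the Gaussian prior over total lifespans, its parameters (`prior_mean`, `prior_sd`), the cutoff `t_max` and the function name `predict_total` are all assumptions. The likelihood assumes that the moment at which we meet the person is uniformly distributed over her lifespan, and the prediction is the posterior median.

```python
import numpy as np

def predict_total(t_observed, prior_mean=75.0, prior_sd=16.0, t_max=130):
    """Posterior median of the total lifespan, given the current age.

    The Gaussian prior and its parameters are illustrative assumptions.
    """
    # Candidate total lifespans in whole years.
    t_total = np.arange(1, t_max + 1, dtype=float)

    # Assumed prior over total lifespans (unnormalized Gaussian).
    prior = np.exp(-0.5 * ((t_total - prior_mean) / prior_sd) ** 2)

    # Likelihood of observing age t_observed if the observation time is
    # uniform over the lifespan: 1 / t_total, and 0 if t_total < t_observed.
    likelihood = np.where(t_total >= t_observed, 1.0 / t_total, 0.0)

    posterior = prior * likelihood
    posterior /= posterior.sum()

    # The posterior median is the first lifespan where the CDF reaches 0.5.
    cdf = np.cumsum(posterior)
    return t_total[np.searchsorted(cdf, 0.5)]

prediction_for_87 = predict_total(87)
prediction_for_30 = predict_total(30)
```

Under these assumptions the prediction for an 87-year-old is only a few years above 87, while the prediction for a 30-year-old stays close to the prior mean. The prior does most of the work when the observation is uninformative, which is the kind of behaviour the experiments compared against human judgments.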

As a possible hint to how abstract representations are learned, Tenenbaum quoted the following facts:
In children, the ability to discriminate between solid objects using shape is present at about 2 years. The ability to discriminate non-solids using texture arises at about 3 years. And at about 4 years, children are able to understand tree-structured categories - like a dog and a cat both being animals.
This points in several directions. One is that different measures of similarity are used for different classes. Another is that learning tree structures is somewhat hard but helps a lot in understanding the world.
But not only tree structures are important. Tenenbaum points out that humans have the ability to infer the underlying form of a domain just from the data. As examples he named the periodic table of elements, the taxonomy of animal species and others.
He pointed out that there are currently several ways to infer the structure of data, for example by hierarchical clustering, mixture models or manifolds. But in all of these the form of the structure (clusters, a tree, a manifold) has to be chosen in advance; there is currently no way to discern it from the data.
This leaves us in need of an unsupervised learner that can infer this form, not only the structure.
As a possible solution he mentions his work on a graph grammar. Together with Kemp he showed in a 2008 PNAS paper that it is possible to recover ring, tree or line forms from a given dataset.
This leads to the idea of "Learning the big picture first" or maybe "Learning the high level first" by starting with a coarse global structure and refining it later.
This is in contrast to the bottom-up approach that is usually used in deep learning architectures.
Tenenbaum sees this kind of form learning as a limit of graphical models, where the form is fixed. He sees the need to create some form of "Probabilistic Programming" that infers not only the statistics but also the underlying data structures.
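
The difference between learning a structure and learning its form can be illustrated with a toy model comparison. The following is a minimal sketch and not the algorithm from the 2008 paper: instead of searching over structures with a graph grammar, it only scores two fixed candidate graphs (a chain and a ring over the same 8 objects) by the likelihood they assign to some features. The Gaussian model with a graph-Laplacian precision matrix, the parameter `sigma`, the synthetic data and all function names are assumptions made for this example.

```python
import numpy as np

def graph_precision(n, edges, sigma=3.0):
    """Precision matrix: graph Laplacian plus a small diagonal term."""
    lap = np.zeros((n, n))
    for i, j in edges:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    return lap + np.eye(n) / sigma**2

def log_likelihood(features, precision):
    """Sum of zero-mean Gaussian log-densities over all feature columns."""
    n, m = features.shape
    _, logdet = np.linalg.slogdet(precision)
    quad = np.einsum("im,ij,jm->", features, precision, features)
    return 0.5 * (m * logdet - quad - m * n * np.log(2 * np.pi))

n = 8
chain = [(i, i + 1) for i in range(n - 1)]  # line form
ring = chain + [(n - 1, 0)]                 # ring form: close the loop

# Synthetic features that vary smoothly over the ring (one column each).
rng = np.random.default_rng(0)
cov = np.linalg.inv(graph_precision(n, ring))
features = np.linalg.cholesky(cov) @ rng.standard_normal((n, 200))

# Score both candidate forms on the same data and pick the better one.
scores = {
    name: log_likelihood(features, graph_precision(n, edges))
    for name, edges in [("chain", chain), ("ring", ring)]
}
best = max(scores, key=scores.get)
```

Since the data was generated from the ring, the ring graph should receive the higher score here. A real form learner would of course also have to search over the structures within each form and penalize complexity, which this sketch leaves out entirely.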
The talk ended here, but afterwards there was a discussion between Geoff Hinton and Josh Tenenbaum that I found very interesting.

Hinton asked why Tenenbaum models structure explicitly in trees and other graphs instead of letting a deep network learn these things by itself.
Tenenbaum's reply was that this might be possible, but for the moment it is necessary to use more explicit structures since we do not know how such a model could be learned.

This was by far not all that Tenenbaum talked about, but these were the points that interested me the most. I have written this down from some notes and from memory, but I hope I have not distorted Tenenbaum's ideas too much.

Oh and there was one more thing: Tenenbaum announced a new dataset called "MNIST++", which consists of handwritten characters from 50 alphabets with 10 examples per character.
The idea behind this is to have a "one shot learning" database with many different classes which all share a common structure. The dataset was collected by showing "clean" versions of the characters to people (I think turkers) who had to reproduce them with the mouse. Tenenbaum and Ruslan Salakhutdinov have already started working on this dataset. Some of the methods work on the pixel level, others use stroke features to have a more meaningful representation of this kind of input.
I guess the idea behind this is to not think about features but to find a way to learn structure. This is definitely an interesting direction and I am excited to see what will come of this.
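
To illustrate what "one shot learning" on the pixel level means, here is a minimal sketch of a nearest-neighbour baseline that sees exactly one example per class. It does not use the actual dataset: the random binary images, the image size, the noise model in `redraw` and the function names are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, side = 20, 16

# One "clean" binary image per character class (random stand-ins,
# not real characters).
support = rng.integers(0, 2, size=(n_classes, side * side))

def redraw(image, flip_prob=0.1):
    """Simulate a noisy reproduction by flipping a fraction of the pixels."""
    flips = rng.random(image.shape) < flip_prob
    return np.where(flips, 1 - image, image)

def classify(query, support):
    """Return the index of the support image with the fewest differing pixels."""
    return int(np.argmin(np.sum(support != query, axis=1)))

# One noisy query per class, classified against the single clean examples.
queries = np.array([redraw(img) for img in support])
predictions = np.array([classify(q, support) for q in queries])
accuracy = float(np.mean(predictions == np.arange(n_classes)))
```

On random images with independent pixel noise this baseline is nearly perfect, so the number says nothing about the real data. Real handwritten characters are shifted, scaled and drawn with different strokes, and that is exactly where pixel distances should break down and where a learned, shared structure would have to help.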
