Showing posts from October, 2011

Random Ramblings on ImageNet

After looking through ImageNet for a little while now, I found some things that I did not really expect. So here are some properties of ImageNet that I found interesting (even though some of them might be obvious). But first, a quick recap on what ImageNet is: It's a hand-annotated dataset, consisting of 10 million images with 10 thousand object classes. The images were collected using search engines and Flickr. Classes correspond to "synsets" in WordNet. A synset is a collection of semantically equivalent nouns. For example, there is a synset called 'n04037443' (this is the IMID, the ImageNet ID), which corresponds to the nouns 'racer, race car, racing car' and is described as 'a fast car that competes in races'. The synsets in WordNet have an additional hierarchical structure, given by a directed graph. Going down the graph leads from more general concepts to more specific ones. For example 'mammal' is above 'canine' whi

Exploring Image Net in Python

I guess it is a bit late now to start working on ImageNet, since the ILSVRC competition just ended. Still, I think this is a very interesting - if not the most interesting - dataset available today in computer vision. So I thought I'd finally take a look. As usual, I put together some Python scripts to do the work for me. You can find my little script on GitHub. At the moment, it provides a class to parse the metadata and XML annotations. It can produce bounding box images, and you can search for synsets by keyword and browse the tree somewhat. You have to download the dataset and annotations yourself, though. I am still working on the code, and additional functionality is likely to be added soon.
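
To give a rough idea of what parsing the annotations and searching synsets by keyword can look like, here is a minimal sketch. It is not the script from the repository. It assumes the bounding box annotations follow a PASCAL VOC-style XML layout (`object`, `name`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`), and the sample annotation, its file name, and its coordinates are made up for illustration. The function names `parse_bounding_boxes` and `search_synsets` are hypothetical as well. Only the 'n04037443' synset and its description come from the ImageNet example in the other post.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in an assumed PASCAL VOC-style layout.
SAMPLE_ANNOTATION = """<annotation>
  <folder>n04037443</folder>
  <filename>n04037443_example</filename>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>n04037443</name>
    <bndbox>
      <xmin>48</xmin><ymin>120</ymin><xmax>430</xmax><ymax>310</ymax>
    </bndbox>
  </object>
</annotation>"""

# Tiny stand-in for the synset metadata: ID -> (words, description).
SYNSETS = {
    "n04037443": ("racer, race car, racing car",
                  "a fast car that competes in races"),
}


def parse_bounding_boxes(xml_text):
    """Return a list of (synset_id, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        synset_id = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((synset_id, coords))
    return boxes


def search_synsets(keyword, synsets):
    """Return the IDs of all synsets whose words contain the keyword."""
    keyword = keyword.lower()
    return [
        synset_id
        for synset_id, (words, _description) in synsets.items()
        if keyword in words.lower()
    ]


if __name__ == "__main__":
    for synset_id, box in parse_bounding_boxes(SAMPLE_ANNOTATION):
        words, description = SYNSETS[synset_id]
        print(synset_id, words, box)
    print(search_synsets("race car", SYNSETS))
```

With the real dataset you would read each annotation file from disk with `ET.parse(path)` instead of parsing a string, and fill the synset dictionary from the downloaded metadata.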