Posts

Showing posts from July, 2011

[CVML] Martial Hebert: Using Geometric information in recognition and scene analysis

The last talk of the CVML summer school (yes, I'm starting from the back ;) was by Martial Hebert about using scene geometry for recognition and segmentation. His work focuses mainly on pictures of man-made environments but is not exclusive to them. Hebert's talk was two hours long and spanned much of his past and recent work, which I cannot repeat in full here. I will focus on some main messages and things I got from his presentation. One of the first works Hebert talked about was classifying regions of an image into "surface orientation" categories. These categories are roughly "ground plane", "sky", "vertical facing right", "vertical facing left", "vertical facing camera". I had seen several works in this direction but never found them to be very interesting. "Why this task?" is what I was always asking myself - I never quite understood the motivation. In his talk, Hebert made the motivation very clear: This i

[CVML] The Ikea Problem

In one of the lectures at the CVML summer school, Josef Sivic proposed the IKEA problem: Finding all IKEA furniture in all of YouTube. Seems like a pretty cool task to me. And not completely unrealistic. It is a bit more challenging than it seems, though: There is the obvious large-scale aspect of trying to analyze all YouTube videos - and with it come restrictions such as the need for linear classifiers and efficient features. But another challenge is the objects themselves: Much of the IKEA furniture has no real texture - at least not on an interesting scale. A kitchen table from IKEA might have some small wood structure. But basically it's a rectangle with 4 little straight legs, seen from widely varying viewpoints. This calls for a shape descriptor. The only one I am aware of that is currently used is HOG. But by nature HOG is far from viewpoint invariant - it is not supposed to be. Fulkerson uses a mixture of HOGs for cars on streets - a task with arguably less variation in vi

[CVML] Random Facts and Advice

Some tips and facts that I took from the summer school. They are pretty random but may be useful for practitioners of vision. Many may seem obvious - but I just didn't see them before ... Gist doesn't work on cropped or rotated images. Since it does a kind of whole image template matching, this is pretty clear. And maybe it shouldn't - the scene layout is changed after all. For doing BoW Cordelia Schmid suggests (and I guess uses) 6x6 patches and a pyramid with scale factor 1.2. Scaling is done by Gaussian convolution. Ponce uses 10x10 patches to do sparse coding. If you combine multiple features using MKL or by just adding up kernels (which is the same as concatenating features), normalize each feature by its variance and then search for a joint gamma. This heuristic gets you out of doing grid search over a huge space! Cordelia Schmid thinks that "clever clusters" don't help much in doing BoW. She thinks it's more important to work
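A small sketch of the kernel-combination tip above, in case it helps to see it spelled out. This is my own illustration, not code from the lectures: the helper names (`normalize_by_variance`, `linear_kernel`), the random toy features, and the reading of "normalize by its variance" as "divide each feature block by the square root of its total variance" are all assumptions. For linear kernels, adding up the per-feature kernels is exactly the kernel of the concatenated features, which the last lines check numerically.

```python
import numpy as np


def normalize_by_variance(X):
    """Scale one feature block so that its total variance
    (summed over dimensions) is 1. Assumed interpretation of the tip."""
    total_var = X.var(axis=0).sum()
    return X / np.sqrt(total_var)


def linear_kernel(A, B):
    """Plain linear kernel: matrix of inner products."""
    return A @ B.T


# Toy data: two hypothetical feature types on very different scales.
rng = np.random.default_rng(0)
n_samples = 20
feat_a = rng.normal(scale=5.0, size=(n_samples, 8))
feat_b = rng.normal(scale=0.1, size=(n_samples, 3))

# Normalize each block separately so neither dominates the sum.
blocks = [normalize_by_variance(f) for f in (feat_a, feat_b)]

# Adding up the per-feature kernels ...
K_sum = sum(linear_kernel(b, b) for b in blocks)

# ... matches the kernel on the concatenated, normalized features.
stacked = np.hstack(blocks)
K_concat = linear_kernel(stacked, stacked)

print(np.allclose(K_sum, K_concat))
```

After this normalization every feature block lives on a comparable scale, so a single kernel width can plausibly be shared across them, which is what makes searching for one joint gamma instead of one gamma per feature reasonable.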

CVML 2011 Posters

There were many posters at the CVML summer school and I won't talk about all of them. Actually 8 of them got prizes (in the form of hand-signed CV and ML books). I knew some of the work from NIPS 2010 but there were some things that were new to me: Alexander Vezhnevets presented work on Multi Image Model for Semantic Segmentation with Different Levels of Supervision. I don't know how I could have missed that before. This is amazing work on weakly supervised semantic scene segmentation on MSRC. It makes use of CRFs, boosted texton forests and superpixels. The CRF connects not only neighbouring superpixels but also superpixels in different images that look similar. Superpixel labels are treated as latent variables and only a very simple constraint between image label and superpixel label is enforced. Since I am looking at a very similar task at the moment, even though the other posters were very good, this one was definitely the best for me. Yang Hua presented work on Contextual

CVML summer school 2011

Thanks to my institute and the B-IT, I can attend the CVML summer school organised by ENS and INRIA in Paris. It started on Monday and features many great speakers, for example Jitendra Malik, Cordelia Schmid, Andrew Zisserman, Jean Ponce and many others. I have been pretty busy with the program until now but I hope I'll find some time to write about all the great lectures here. The lectures at the beginning of the week were about standard topics, like Francis Bach's tutorial on SVMs and kernel methods and Cordelia Schmid's and Josef Sivic's introduction to interest points, features and visual words. During the week, talks ranged from details about the current state of the art in object recognition and large-scale learning to bigger-picture talks and directions for future research. Many of the professors put their slides online. I definitely suggest having a look at those. Hopefully I will have the time to write about all of the talks but I doubt it a little.