[CVML] The Ikea Problem
In one of the lectures at the CVML summer school, Josef Sivic proposed the IKEA problem: Finding all Ikea furniture in all of YouTube.
Seems like a pretty cool task to me. And not completely unrealistic.
It is a bit more challenging than it seems, though:
There is the obvious large-scale aspect of trying to analyze all YouTube videos -
and with it come restrictions such as the need for linear classifiers and
efficient features. But another challenge is the objects themselves:
Much of the Ikea furniture has no real texture - at least not on an interesting scale.
A kitchen table from Ikea might have some small wood structure. But basically it's
a rectangle with 4 little straight legs, seen from widely varying viewpoints.
This calls for a shape descriptor. The only one I am aware of that is currently used is HOG. But by nature HOG is far from viewpoint invariant - it is not supposed to be.
Fulkerson uses a mixture of HOGs for cars on streets - a task with arguably less variation in viewpoint than tables in YouTube.
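To make the viewpoint issue concrete, here is a rough sketch in Python. It is not a real HOG implementation: it computes a single magnitude-weighted histogram of unsigned gradient orientations over the whole image, with no cells and no block normalization. The toy "table top", the function names and the 90 degree in-plane rotation standing in for a viewpoint change are all my own assumptions for illustration. The `mixture_score` helper only illustrates the general idea of keeping one linear template per viewpoint and taking the best response; it is not meant to reproduce the method of any particular paper.

```python
import numpy as np


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    A single-cell simplification of HOG (no cells, no block normalization).
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist


def mixture_score(hist, templates):
    """Best linear response over a set of per-viewpoint templates."""
    return max(float(np.dot(t, hist)) for t in templates)


# Toy "table top": a bright horizontal bar on a dark background.
table = np.zeros((32, 32))
table[14:18, 4:28] = 1.0

h_front = orientation_histogram(table)
# Same object, rotated by 90 degrees in the image plane.
h_rotated = orientation_histogram(np.rot90(table))

# L1 distance between the two descriptors of the *same* object.
distance = np.abs(h_front - h_rotated).sum()

# A single template learned from the front view responds weakly to the
# rotated view; a two-component mixture recovers a strong response.
single = float(np.dot(h_front, h_rotated))
mixture = mixture_score(h_rotated, [h_front, h_rotated])

print("L1 distance:", distance)
print("single template:", single, "mixture:", mixture)
```

The dominant orientation bin moves when the object rotates, so one linear template cannot cover both views. A mixture handles this by adding one component per view, which is exactly what gets expensive when the viewpoint varies as much as it does for tables in YouTube.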
So any suggestions?
Oh, and there is also Billy: no texture in itself, and usually nearly completely occluded.
There are "bookshelf" classes in CV datasets. But this is completely different.
Not all Billys contain books. And not all bookshelves are Billys.
If anyone has any idea how to detect Billys, please let me know ;)
Another alternative to HOGs: http://cvlab.epfl.ch/~lepetit/papers/hinterstoisser_cvpr10.pdf
It is more like template matching, so it does not need as much texture as HOG, but it is not at all robust to occlusions.