
Showing posts with the label visualization

Another look at MNIST

I'm a bit obsessed with MNIST. Mainly because I think it should not be used in any papers any more: it is weird for a lot of reasons. When preparing the workshop we held yesterday, I noticed one that I wasn't aware of yet: most of the 1-vs-1 subproblems are really easy! Basically all pairs of digits can be separated perfectly using a linear classifier! And even if you just do a PCA to two dimensions, they can pretty much still be linearly separated! It doesn't get much easier than that. This makes me even more sceptical about "feature learning" results on this dataset. To illustrate my point, here are all pairwise PCA projections. The image is pretty huge. Otherwise you wouldn't be able to make out individual data points. You can generate it using this very simple gist. There are some classes that are not obviously separated: 3 vs 5, 4 vs 9, 5 vs 8 and 7 vs 9. But keep in mind, this is just a PCA to two dimensions. It doesn't mean that ...
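The pairwise-PCA idea can be sketched in a few lines. This is not the original gist, just a minimal illustration: it uses scikit-learn's small built-in digits dataset as a lightweight stand-in for MNIST, and the `pairwise_pca` helper is a name I made up for this sketch.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Lightweight stand-in for MNIST (8x8 digits instead of 28x28).
digits = load_digits()
X, y = digits.data, digits.target


def pairwise_pca(X, y, a, b):
    """Fit PCA on the samples of classes a and b only and return
    the 2D projection together with the matching labels."""
    mask = (y == a) | (y == b)
    X_pair, y_pair = X[mask], y[mask]
    proj = PCA(n_components=2).fit_transform(X_pair)
    return proj, y_pair


# Project one of the "hard" pairs mentioned above, 3 vs 5.
proj, labels = pairwise_pca(X, y, 3, 5)
print(proj.shape)  # one 2D point per sample of class 3 or 5
```

Scatter-plotting `proj` colored by `labels` for each of the 45 digit pairs gives the big image described above.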

Animating Random Projections of High Dimensional Data

Recently Jake showed some pretty cool videos in his blog. This inspired me to go back to an idea I had some time ago about visualizing high-dimensional data via random projections. I love to do exploratory data analysis with scikit-learn, using the manifold, decomposition and clustering modules. But in the end, I can only look at two (or three) dimensions. And I really like to see what I am doing.
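The basic building block of such an animation is a single random 2D projection: draw a random matrix, orthonormalize it, and project the data onto it. A minimal sketch (toy data, not the actual animation code; moving smoothly between such projections frame by frame gives the video effect):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))  # toy high-dimensional data


def random_projection_2d(X, rng):
    """Project X onto a random orthonormal 2D basis."""
    d = X.shape[1]
    # QR of a random d x 2 Gaussian matrix gives an orthonormal basis
    # of a uniformly random 2D subspace.
    basis, _ = np.linalg.qr(rng.normal(size=(d, 2)))
    return X @ basis


proj = random_projection_2d(X, rng)
print(proj.shape)  # (100, 2): one 2D point per sample
```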