Comments on Peekaboo: NIPS 2010 - Single Layer Networks in Unsupervised Feature Learning: The Deep Learning Killer [Edit: now available online!]
Blog by Andreas Mueller (http://www.blogger.com/profile/10177962095394942563)

Anonymous (2012-11-09 08:45):

The Walsh-Hadamard transform maps a point to a sequency pattern. It is self-inverse, so it also maps a sequency pattern back to a point. The WHT is computed using patterns of additions and subtractions, which makes it very fast. A paper by Wallace shows that the central limit theorem applies to the output of a WHT, a fact I independently rediscovered around 2002/2003. I also showed that the WHT can be combined with random permutations to convert arbitrary numerical data into data with a Gaussian distribution, and I further created a purely linear-algebra neural net based on that. Its exact learning capacity is one memory per weight vector. However, in higher-dimensional space any similarity at all between two vectors is extremely unusual and basically cannot happen by chance. Hence even when the one-memory-per-weight-vector limit is exceeded, the output of the neural net I created is still very much closer to the target vector than could possibly happen by chance. On that basis I think the neural net I have created should be investigated further.

Anonymous (2012-11-07 05:05):

A couple of years ago I was doing work on the Walsh-Hadamard transform, combining it with random permutations to convert data into the 'Gaussian state'. I formulated a type of neural net based on the concept.
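The WHT-plus-random-permutation trick described above can be sketched as follows. This is a minimal illustration, not the commenter's actual code; the data, dimension, and number of rounds are arbitrary choices for the sketch:

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        # Butterfly of additions and subtractions, O(n log n) overall.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # this scaling makes the transform self-inverse

rng = np.random.default_rng(0)
v = rng.uniform(size=1024)  # arbitrary, decidedly non-Gaussian data

# Repeat (random permutation, then WHT) a few times. By a CLT-style
# argument, each output coordinate is a +/-1-weighted sum of many inputs,
# so the result tends toward a Gaussian-looking distribution.
out = v
for _ in range(3):
    out = fwht(out[rng.permutation(len(out))])
```

The self-inverse property the comment mentions follows from the orthonormal scaling: applying `fwht` twice returns the original vector.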
I got really bored with it and dropped it for a while. Now maybe I will look at that whole area again.

http://www.mediafire.com/file/q4o3clxu1r18ufs/MetalCrystalNN.zip

I should say that the code contains some loose ideas I am playing with; I might decide for or against some of those ideas later.

Andrej Karpathy (2011-01-08 23:43, http://karpathy.ca):

I think Honglak's take-away message makes a lot of sense. It is definitely my impression as well that a deep architecture will become more important when dealing with problems on a more real-world scale, the kind of problems our brain has to deal with.

I'm curious what you think better datasets would look like, ones where deep methods, done right, should clearly have an advantage. I thought about it for a while, and I have an opinion on where things should go, but maybe it's too crazy, and I'm not sure it would be well received :D

This is a really great discussion, though, and I would love to discuss more of this kind of stuff on the website I linked. If we can continue it there, then others can choose to chip in too. It would also be nice to have a more central location for this kind of discussion, instead of having it spread around the internet :( I'll re-pose the question there.

Brian (2011-01-08 10:55):

I gave this some more thought, and I'm thinking it might be interesting to try another experiment that uses the latent representations from all layers instead of just the shallow or deep ones.

Each layer represents an abstraction of the original information.
I think it's also safe to say that each deeper layer is a further level of abstraction.

Therefore, each layer provides a different insight into the raw data that, while captured in the other latent representations, is uniquely expressed in that layer.

-Brian

Andreas Mueller (2011-01-07 15:11):

@Andrej:
The title definitely mirrors my reception of the work, not the authors' intention. Since they have very popular ongoing work on deep systems, they certainly don't want to attack it. But still, this is rather surprising and also a little disappointing for many deep learning people. I also find the beauty and simplicity of this approach intriguing.

The lesson that Honglak Lee took from this work (at least the one he shared with me) is that the standard problems in the deep learning community are not that well suited to deep approaches. I think that is also what Brian has been arguing. It is definitely promising to combine deep architectures with this new feature extraction approach, but it's quite uncertain whether they will add much on this kind of data and task.

Thanks for sharing your site!

Andy

Brian (2011-01-07 08:37):

I didn't see it as an attack; I'm more or less trying to argue that all hope is not lost. I'd much rather have shorter training times with improved success rates.

I guess my not-long-winded version is: just because classification [seems to be] better performed with a shallow representation doesn't mean deep learning isn't useful.

-Brian

Andrej Karpathy (2011-01-06 21:06, http://karpathy.ca):

I did not interpret this work as an attack on the idea of deep learning. It showed that you can do very well with this kind of architecture even if you don't go deep.
However, that doesn't mean you can't do even better if you find a way to extend it to a deep architecture.

I was personally more impressed by the fact that these useful features came out of such a beautiful, simple-to-implement, intuitive, biologically plausible algorithm that requires almost no parameters, and that the training is extremely fast too, relatively speaking. These facts alone show that this kind of approach has merit.

By the way, I would like to point out that I have started a Google site and group for people who want to discuss this work and the surrounding issues further. Some of us have also conducted a few experiments on potential ways of extending this work, and shared some of the code. You can find it here:

https://sites.google.com/site/kmeanslearning

Brian (2011-01-06 08:45):

I don't know that it's quite as big a blow as it sounds.

Hinton has stated on numerous occasions that adding a new [appropriately sized] layer is guaranteed to improve a lower bound on the /reconstruction/ error, so I'd have expected more information to be available deeper in the network. Perhaps, though, much of that information is stored in the weights and has been abstracted out, which may be the key to why they encountered these results.

In the real world, I don't look at a car and consciously or subconsciously examine every visible part to verify that it truly is a car. I've learned to recognize the general shape of a car, so if I'm given a vague outline of a car I'll still recognize it as such in spite of the lack of wheels, driver, or other details.
In spite of the missing details, I'm still able to perform the classification task.

Now, I've wondered for a while why a deep versus shallow (or first-layer) representation would make a difference on a classification task. We're not talking about extracting a boatload of information and then performing a complex task with it, but rather funneling the extracted information into a far simpler task: which bin does this piece of data belong in?

The 'how' is obviously not a simple task; comparatively speaking, I'm thinking of much more complex tasks.

Using the human brain as an example: given the visual and audio inputs I've received, generate the impulses that cause my arms to catch a ball that's been hit to the outfield (or some other complex task).

A deep network allows the system to learn representations that build on each other; e.g., instead of neurons that just say "I can recognize blotches in these locations," deeper-layer neurons combine those blotches to form or recognize more complex representations of the world as the model sees it.

That isn't a model for "Which bin should I put this picture in?"; it's a model for far more complex tasks. It just so happens you can also use it for classification tasks.

In his Google tech talk (http://www.youtube.com/watch?v=UyPrL0cmJRs&feature=channel), Michael Merzenich gives an intriguing description of how babies learn, and it isn't at all unlike the process deep networks go through to be trained, though the human brain is likely more appropriately modeled as an RNN than an FFNN.

A baby spends a significant amount of time just learning representations for the world it lives in, and then begins correlating the pieces of information it has learned to parse in order to understand how the things in its world interact and relate to each other (e.g., when mommy puts a spoon in my mouth, I will taste something I like).

As I understand them, deeper layers give the network the ability to abstract away from the details and learn more complex combinations. I can see why the classification task worked well using a shallow representation; in a very real sense there's more information available at that layer, for the simple fact that the information hasn't been abstracted away by that point yet.

-Brian
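For readers who have not seen it, the paper under discussion learns features with plain k-means plus a "triangle" encoding. Below is a rough sketch of that idea on toy random data, not the paper's code: the patch size, number of centroids K, and iteration count are placeholders, and the paper's ZCA whitening and spatial pooling steps are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's small image patches (e.g. 6x6 from CIFAR-10).
patches = rng.normal(size=(2000, 36))

# Per-patch brightness/contrast normalization (the paper also ZCA-whitens).
patches -= patches.mean(axis=1, keepdims=True)
patches /= patches.std(axis=1, keepdims=True) + 1e-8

# Lloyd's algorithm: learn a dictionary of K centroids.
K, iters = 50, 10
centroids = patches[rng.choice(len(patches), size=K, replace=False)]
for _ in range(iters):
    dist2 = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = dist2.argmin(axis=1)
    for k in range(K):
        members = patches[assign == k]
        if len(members) > 0:
            centroids[k] = members.mean(axis=0)

# "Triangle" encoding: f_k(x) = max(0, mean_j z_j(x) - z_k(x)), where z_k(x)
# is the distance from x to centroid k. Features for centroids farther than
# the mean distance are zeroed, giving a sparse, nonnegative representation.
z = np.sqrt(((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1))
features = np.maximum(0.0, z.mean(axis=1, keepdims=True) - z)
```

In the paper these per-patch features are pooled over image regions and fed to a linear classifier; the striking result discussed above is how competitive this single layer is despite having almost no tunable parameters.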