Update for structured SVM in Python
I just pushed an update for my structured SVM in Python.
This contains a bugfix in the dual formulation and a subgradient descent version of the structured SVM.
The new version has options to be really verbose, to track the slacks and constraints of all examples, and to measure the primal objective.
Also, quite handy for approximate inference: it complains when the slack of the "most violated constraint" is smaller than the slack of a previously found constraint.
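That sanity check can be sketched roughly like this (a minimal sketch; the function name and tolerance are my illustration here, not the actual code):

```python
def check_most_violated(new_slack, previous_slacks, tol=1e-8):
    """Sanity check for approximate inference: the newly found
    "most violated constraint" should be violated at least as much
    as any constraint found earlier. If it is not, the inference
    procedure was probably inexact."""
    max_prev = max(previous_slacks, default=0.0)
    if new_slack < max_prev - tol:
        print("Warning: new constraint has slack %.4f, but an "
              "earlier constraint has slack %.4f." % (new_slack, max_prev))
        return False
    return True
```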
The code is not as pretty as it could be but I'm still hacking at it all day.
I hope it is still usable.
A quick word about the bug that I fixed:
In the usual SVM, each dual variable alpha is constrained to lie within
0 ≤ alpha ≤ C. In the structural SVM, several dual variables correspond to the same example and share a common slack variable. Therefore, the sum of all alphas that correspond to a single example is bounded by C.
Took me 3 days to find :-/
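In code, the corrected constraint can be checked with a small sketch like this (the grouping by example id is my own illustration, not the actual implementation):

```python
import numpy as np

def dual_feasible(alphas, example_ids, C, tol=1e-8):
    """Check the structural SVM dual constraints: every alpha must be
    non-negative, and the alphas that correspond to the same example
    (and hence share one slack variable) must sum to at most C."""
    alphas = np.asarray(alphas, dtype=float)
    example_ids = np.asarray(example_ids)
    if np.any(alphas < -tol):
        return False
    for ex in np.unique(example_ids):
        if alphas[example_ids == ex].sum() > C + tol:
            return False
    return True
```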
A good read on subgradient methods for structured prediction is "(Online) Subgradient Methods for Structured Prediction" by Ratliff et al.
I found the learning rate a bit hard to tune, though. I guess this method really pays off when there are many examples. Please note that I implemented a batch version, not an online version, though that is trivial to change.
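A batch subgradient step for the structured SVM primal, in the spirit of Ratliff et al., looks roughly like this (a sketch, not my actual implementation; `joint_feature` and `loss_augmented_inference` stand in for the problem-specific parts):

```python
import numpy as np

def batch_subgradient_ssvm(X, Y, joint_feature, loss_augmented_inference,
                           n_dims, C=1.0, learning_rate=0.1, n_iter=200):
    """Batch subgradient descent on the structured SVM primal
    1/2 ||w||^2 + C * sum_i max_y [loss_i(y) + <w, psi_i(y) - psi_i(y_i)>].

    joint_feature(x, y) returns the joint feature vector psi(x, y);
    loss_augmented_inference(w, x, y) returns the most violating output
    for x under the current weights w."""
    w = np.zeros(n_dims)
    for it in range(n_iter):
        grad = w.copy()  # subgradient of the regularizer 1/2 ||w||^2
        for x, y in zip(X, Y):
            # most violating output under the current weights
            y_hat = loss_augmented_inference(w, x, y)
            # subgradient of the hinge term for this example
            grad += C * (joint_feature(x, y_hat) - joint_feature(x, y))
        # decaying step size; the whole batch contributes to each step
        w -= learning_rate / (it + 1) * grad
    return w
```

Turning this into the online version just means updating `w` after every single example instead of after the full batch.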
Any comments would be very welcome :)
> Therefore, the sum of all alpha that correspond to a single example is bound by C.
Don't you implement the 1-slack variant?
No, I use one slack per example. This is the standard version from the original paper.
To elaborate: I am not very interested in the optimization problem itself. I want to learn in a setting with approximate inference, where the inference is potentially much, much more expensive than the SVM optimization. As I said in my last post, I even start the SVM optimization completely anew in each iteration. That doesn't really matter that much.
The idea was more to have a flexible, easy-to-understand system. This is in no way an approach to be faster than the established software.