That looks something like this:
So I wanted a somewhat more general data set generator.
What I ended up doing is a nonparametric mixture of Gaussians.
While Gaussians are a bit boring, combining them with a nonparametric prior makes them somewhat more general.
As I didn't find a very easy-to-use package for that (though David pointed out pymc), I went ahead and wrote the generative model down myself.
It's a mixture of Gaussians with a Chinese restaurant process as the prior over the mixture components and Wishart-Gaussian priors for mean and variance.
You can find the code here.
With this class, you can generate a dataset by:
    dpgmm = DPGMMSampler(alpha=10., deg=10, sigma=3, n_features=2)
    X = dpgmm.sample(n_samples=100)

Where alpha is the parameter of the Chinese restaurant process, deg is the degrees of freedom of the (assumed diagonal) Wishart prior and sigma is the (diagonal) standard deviation of the Gaussian prior over means.
Here are some examples of what X might look like given the above parameters. The points in X are the blue dots.
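To make the roles of alpha, deg and sigma a bit more concrete, here is a minimal sketch of the generative process described above. It is not the code linked above: the function name sample_crp_mixture, the zero-mean prior over the component means, and the treatment of the diagonal Wishart as independent scaled chi-squared draws on per-feature precisions are all assumptions made for illustration.

```python
import numpy as np


def sample_crp_mixture(n_samples=100, alpha=10., deg=10, sigma=3.,
                       n_features=2, seed=0):
    """Illustrative sketch only; a hypothetical stand-in, not DPGMMSampler."""
    rng = np.random.default_rng(seed)
    counts = []            # number of points assigned to each component
    means, stds = [], []   # per-component parameters, drawn lazily
    X = np.empty((n_samples, n_features))
    labels = np.empty(n_samples, dtype=int)

    for i in range(n_samples):
        # Chinese restaurant process: pick existing component k with
        # probability counts[k] / (i + alpha), or open a new one with
        # probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = int(rng.choice(len(probs), p=probs))

        if k == len(counts):
            # New component: draw its parameters from the priors.
            counts.append(0)
            # Assumption: Gaussian prior over means centred at zero.
            means.append(rng.normal(0., sigma, size=n_features))
            # Assumption: diagonal case handled per feature, with a
            # chi-squared(deg) / deg draw used as the precision.
            precision = rng.chisquare(deg, size=n_features) / deg
            stds.append(1. / np.sqrt(precision))

        counts[k] += 1
        labels[i] = k
        X[i] = rng.normal(means[k], stds[k])

    return X, labels


X, labels = sample_crp_mixture(n_samples=100, alpha=10., deg=10,
                               sigma=3., n_features=2)
print(X.shape, "points in", labels.max() + 1, "components")
```

In this sketch, a larger alpha tends to open more components, a larger sigma spreads the component centres further apart, and a larger deg makes the per-component variances more alike.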
The code for the sampler was pretty straightforward (although I guess it's not as fast as it could be), except for drawing from the Wishart distribution, which was a bit annoying.
I got the idea of how to do it from this. It would be great if scipy could integrate something similar in the future.
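For readers who want to see what drawing from a Wishart involves, one standard approach is the Bartlett decomposition. The sketch below is an assumption about the general technique, not necessarily what the reference above or the linked code does; the function name sample_wishart and its arguments are made up for illustration.

```python
import numpy as np


def sample_wishart(deg, scale, rng):
    """Draw one Wishart(deg, scale) sample via the Bartlett decomposition.

    Hypothetical helper for illustration; requires deg > n_features - 1.
    """
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)   # scale = L @ L.T
    A = np.zeros((p, p))
    # Diagonal: square roots of chi-squared draws with deg, deg - 1, ...,
    # deg - p + 1 degrees of freedom.
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(deg - np.arange(p)))
    # Strictly lower triangle: independent standard normal draws.
    A[np.tril_indices(p, k=-1)] = rng.normal(size=p * (p - 1) // 2)
    LA = L @ A
    return LA @ LA.T


rng = np.random.default_rng(0)
W = sample_wishart(deg=10, scale=np.eye(2), rng=rng)
print(W)
```

Averaging many such draws should come out close to deg * scale, which is a cheap sanity check. Newer SciPy releases also ship scipy.stats.wishart, which can serve as a cross-check.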
Btw, I have not really proved the correctness of my code, as the main point was to generate some nice samples. If you need to know that the model is correct, you might want to check the above reference and the code ;)