Monday, November 8, 2010

Machine Learning Toolkits

Wow. So much to read today.
While following link upon link, I found so many great toolkits that I think it is worth listing them here.
One of the greatest sources was the GNU/Linux AI & Alife HOWTO.

[edit] It's been a while since I wrote this blog post, but many people still seem to find it, so here's a quick update. After looking into many of these libraries, I started using scikit-learn and soon used it exclusively. Now I am a regular contributor. It is a fast-growing project with great documentation, many algorithms, and it is just so easy to use. Also, working with Python and the Python crowd is fun. I heartily recommend it. [/edit]

Here goes:
  • Vowpal Wabbit - project on very fast online gradient descent by Yahoo research (C++)
  • VFML (Very Fast Machine Learning) - library for very fast decision trees and Bayes networks (C++)
  • Stochastic Gradient Descent - library for SVMs with stochastic gradient descent (C++)
  • Maximum Entropy Modeling Toolkit for Python and C++ - the name says it all
  • Elefant - toolkit that includes kernel methods, optimization strategies and belief propagation. It has a GUI.
  • Milk - Python toolkit that includes SVMs, decision trees, kNN, PCA, k-means, NMF and feature selection
  • Peach - pure Python library that includes neural networks, fuzzy logic, genetic algorithms and swarm intelligence
  • Pebl - Python library and command-line application for learning the structure of a Bayesian network
  • Machine Learning: An Algorithmic Perspective - Actually a book. But with MANY MANY MANY examples online. All in Python. MOST AWESOME! - I just ordered the book
  • dbacl - a digramic Bayesian classifier - a collection of command line tools for Bayesian classification particularly for spam filtering
  • Shark - Modular library including neural networks, kernel methods, discrete and continuous optimization, fuzzy logic and control and mixtures density models (C++)
  • PyMVPA - Python module including more classifiers, regression and feature selection methods than can be listed here. Do a cross-validated classifier sweep and parameter search in fewer than 10 lines of Python.
  • Monte - gradient-based learning in Python - Python module that contains neural networks, k-means and logistic regression, with a focus on parametric models
  • scikit-learn - Python module with a good API. Includes SVMs, generalized linear models, Gaussian mixture models, mean shift, feature selection and ranking, data management and many more.
  • mlpy - Python module that includes wavelet transforms, kernel methods, FDA, PDA, LASSO, LARS, feature selection and ranking, and data management. Very clean interface.
  • Modular toolkit for Data Processing - Python toolkit for data processing. In my opinion the API takes a little getting used to. Includes PCA, k-means, RBMs, FastICA, Neural Gas, SVMs, perceptrons and many more.
  • Orange - Data mining through visual programming or Python. Large toolbox that includes great visualization features, classifiers, data management, regression and clustering. Definitely worth trying.
  • Weka - A classic tool for all things data mining. Contains tools for data pre-processing, classification, regression, clustering, association rules and visualization. Can be used via its GUI, scripting, or Java.
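Since the update above ended up recommending scikit-learn, here is a minimal sketch of the kind of short cross-validated classifier run praised in this list (this uses scikit-learn's modern API, which has changed since 2010; the dataset, estimator and fold count are just illustrative choices, not anything from the post):

```python
# Hypothetical example: 5-fold cross-validation of a classifier
# on the bundled iris dataset, in a handful of lines.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes
clf = LogisticRegression(max_iter=1000)    # plain logistic regression classifier
scores = cross_val_score(clf, X, y, cv=5)  # accuracy on each of 5 folds
print(scores.mean())
```

The same three-step pattern (load data, pick an estimator, score it) applies to most estimators in the library, which is a large part of why it is so easy to use.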


  1. There's also RapidMiner, PyBrain, Apache Mahout, LibLinear, and that's just from the first couple of pages of

  2. Thanks for your note about scikit-learn.
    I'd like to test this library for a bayesian network work but I have some difficulties to create a simple bayesian network. Do you know where I could find an example of a simple implementation using scikit-learn?
