We are pleased to introduce a new open source project today. It is another machine learning library built on Hadoop, alongside Mahout from the ASF (Apache Software Foundation). The project is named redpoll, after several small finches of northern North America and Eurasia that have a red crown and a black chin. We hope our project will grow as nimbly as these birds. We intend to parallelize some traditional classification and clustering algorithms, such as Naive Bayes, K-Means, and EM, using Apache Hadoop, and to compare the speedup on data sets of various sizes. It is licensed under the Apache License 2.0.
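To give a rough idea of what parallelizing K-Means in the MapReduce style can look like, here is a minimal sketch of a single iteration. It is plain Python with no Hadoop involved, and it is not taken from the redpoll source; the names `nearest`, `map_phase`, `reduce_phase` and `kmeans_iteration` are made up for illustration. The map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (centroid index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum points and counts per centroid index, then average."""
    sums, counts = {}, defaultdict(int)
    for key, (point, n) in pairs:
        prev = sums.get(key, [0.0] * len(point))
        sums[key] = [s + p for s, p in zip(prev, point)]
        counts[key] += n
    # A centroid that received no points keeps its old position.
    return [
        [s / counts[i] for s in sums[i]] if i in sums else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans_iteration(points, centroids):
    """One K-Means iteration expressed as a map step followed by a reduce step."""
    return reduce_phase(map_phase(points, centroids), centroids)


# Hypothetical toy data: two obvious clusters in the plane.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_centroids = kmeans_iteration(points, centroids)
```

In a real Hadoop job the map calls would run on different machines over different splits of the data, and the iteration would be repeated until the centroids stop moving. This sketch only shows how the work divides into the two steps.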
We are two students whose main interest is parallel computing. Since last year we have been paying attention to Hadoop, which we find very attractive. We have long been looking for an opportunity to work with this implementation of MapReduce, and in the meantime we implemented some traditional machine learning algorithms in C and Java. Last year we came across the article "Map-Reduce for Machine Learning on Multicore" from Stanford University, which matches our interests perfectly. Later we learnt that the ASF had begun to implement a machine learning library called Mahout, based on the theory in that article, and planned to invite students to take part in Google Summer of Code 2008 to finish the job.
But our English is rather poor, so we might not get that chance. We therefore decided to launch this new project to make our dreams come true.
For more information, please check out http://code.google.com/p/redpoll
redpoll is launched