Machine Learning for everyone – PredictionIO

PredictionIO (http://prediction.io/) is an open source machine learning (ML) server. Its goal is to make personalization and recommendation algorithms more accessible to programmers without ML knowledge. It includes a recommendation engine and a similarity engine, both of which can be instantiated, configured and evaluated via a web-based GUI.

Because the number of integrated ML methods is still limited, I do not think the product should be called a “machine learning server” yet. I was curious how the system works, so I tested it. In this post I review how to install and use the server.

1. Installation

First we need to install the server and its dependencies. I used Mac OS X Mavericks (10.9, GM):

  • We need to install MongoDB (http://www.mongodb.org). At the time of writing, version 2.4.6 was available. To run the database, we need to create a db folder and start the service:
$ mkdir -p /data/db
$ ./mongod
  • Then we need to download Apache Hadoop (http://hadoop.apache.org/) and add it to PATH.
  • The PredictionIO server and the MongoDB connector should be installed as described in the getting started guide:
git clone https://github.com/mongodb/mongo-hadoop.git
cd mongo-hadoop
git checkout r1.1.0
./sbt publish-local
 
git clone https://github.com/PredictionIO/PredictionIO.git
cd PredictionIO
bin/build.sh
bin/package.sh

2. Run the server

After packaging the distribution, we can run the server from dist/target/PredictionIO-<version>. First we run the setup script ./bin/setup.sh and then start the server with ./bin/start-all.sh.

The server is accessible only to registered users, who can be added with the ./bin/users command. After that, we can log in to the server on the default port: http://localhost:9000/.

Later, if we see the message “This feature will be available soon.”, we need to run the setup script again and restart the server.

3. Write an example application

First, we create an application. The result of this step is an App Key, which our script uses. Then we create an engine; we chose the recommendation engine. We need to define item types and some basic recommendation parameters. Afterwards we select a recommendation algorithm and set its parameters.

The main idea is to have a set of users and a set of items, and to predict new items for new or existing users.

Second, we need to populate the database from our program; then we can call functions to get new predictions. We published our sample code on GitHub (https://github.com/szitnik/prediction-io-Test). The key idea was to have 4 users and their friendships (modelled as view actions) and to predict new possible friendships.
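
To make the friendship idea more concrete, here is a minimal Python sketch. It is not the published sample code and it does not call PredictionIO at all. The friendship pairs are invented for the illustration, the function names are hypothetical, and the scoring is a plain "common friends" count that I swapped in as a stand-in; it is not one of the algorithms offered by the server.

```python
from collections import defaultdict

# Hypothetical friendships (invented for this sketch, not the data from the post).
FRIENDSHIPS = [
    ("xwoman", "johan"),
    ("xwoman", "mirco"),
    ("johan", "mirco"),
    ("johan", "jurcek"),
]


def to_view_actions(friendships):
    """Record each friendship as two 'view' actions, one per direction."""
    actions = []
    for a, b in friendships:
        actions.append((a, "view", b))
        actions.append((b, "view", a))
    return actions


def build_profiles(actions):
    """Map each user id to the set of people that user has viewed."""
    profiles = defaultdict(set)
    for uid, action, iid in actions:
        if action == "view":
            profiles[uid].add(iid)
    return dict(profiles)


def recommend(profiles, uid, n=1):
    """Rank people not yet viewed by uid by the number of common friends."""
    seen = profiles.get(uid, set())
    scores = defaultdict(int)
    for friend in seen:
        for candidate in profiles.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != uid and candidate not in seen:
                scores[candidate] += 1
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:n]]


profiles = build_profiles(to_view_actions(FRIENDSHIPS))
print(recommend(profiles, "xwoman"))
```

With the invented pairs above, xwoman already knows johan and mirco, so the only remaining candidate is jurcek, reached through johan. A user who already knows everyone, such as johan here, gets an empty list.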

After we inserted the data, the system calculated all possibilities and stored them in the MongoDB database:

13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_XWoman" , "iid" : "1_xwoman" , "score" : 0.8630746441103142 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_XWoman" , "iid" : "1_mirco" , "score" : 0.7472373332042158 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
...
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_jurcek" , "score" : 0.8126793796567733 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_mirco" , "score" : 0.6338884061675459 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_xwoman" , "score" : 0.5107068611450168 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
...
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Johan" , "iid" : "1_mirco" , "score" : 0.9135908855406835 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}
13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Johan" , "iid" : "1_johan" , "score" : 0.7490173941424351 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}

After that we ran some queries to get new friend recommendations:

Retrieve top 1 recommendations for user Mirco
Recommendations: jurcek
Retrieve top 1 recommendations for user XWoman
Recommendations: mirco
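
The stored scores and the query results can be connected with a short sketch. The Python snippet below (again not part of the published sample code) parses the JSON document at the end of each MongoDbCollector log line and keeps the highest-scoring items per user. Two things are assumptions on my side: that the leading "1_" in uid and iid is an application prefix that can be dropped, and that a user should never be recommended to themselves (compared case-insensitively). The function names are made up for the illustration.

```python
import json
from collections import defaultdict

# Five lines copied from the log excerpt in this post (XWoman and Mirco only).
LOG_LINES = [
    '13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_XWoman" , "iid" : "1_xwoman" , "score" : 0.8630746441103142 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}',
    '13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_XWoman" , "iid" : "1_mirco" , "score" : 0.7472373332042158 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}',
    '13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_jurcek" , "score" : 0.8126793796567733 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}',
    '13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_mirco" , "score" : 0.6338884061675459 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}',
    '13/10/21 16:22:50 INFO mongodb.MongoDbCollector: Putting key for output: null { "uid" : "1_Mirco" , "iid" : "1_xwoman" , "score" : 0.5107068611450168 , "itypes" : [ "person"] , "algoid" : 2 , "modelset" : true}',
]


def parse_prediction(line):
    """Return the JSON document that ends a MongoDbCollector log line."""
    return json.loads(line[line.index("{"):])


def strip_app_prefix(identifier):
    """Drop the leading '<n>_' part (assumed here to be an application prefix)."""
    return identifier.split("_", 1)[1]


def top_n(lines, n=1):
    """Return the n best-scored items per user, skipping the user themselves."""
    by_user = defaultdict(list)
    for line in lines:
        doc = parse_prediction(line)
        user = strip_app_prefix(doc["uid"])
        item = strip_app_prefix(doc["iid"])
        # Assumption: a user is never recommended to themselves.
        if item.lower() == user.lower():
            continue
        by_user[user].append((doc["score"], item))
    return {
        user: [item for _, item in sorted(pairs, reverse=True)[:n]]
        for user, pairs in by_user.items()
    }


print(top_n(LOG_LINES))
```

On these five lines, top_n(LOG_LINES) gives mirco for XWoman and jurcek for Mirco, which agrees with the query results above. The log excerpt is truncated, so this only shows the idea; the server may filter and rank differently.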

The web-based GUI also supports parameter tuning and evaluation methods for recommendation algorithms.

To conclude, I believe the PredictionIO project is a nice start at bringing ML methods (although I do not agree with the ML naming here :)) closer to a large group of programmers.