Using PredictionIO with Salesforce Einstein to Recommend Complementary Product Purchases (Part 1)

At Slalom, as a Salesforce Platinum partner, we are always seeking to bring innovative solutions to our clients here in Phoenix. The team at Slalom is very excited about the possibilities of Salesforce’s latest offerings incorporating Artificial Intelligence (AI) into its app platform.

We recently had the opportunity to demonstrate the art of the possible with a client and wanted to share our knowledge with the Salesforce community.

The purpose of this guide is to help you get started using PredictionIO to build a prediction engine that can be used with Salesforce Einstein. You will learn how to use PredictionIO to recommend complementary items that customers frequently buy at the same time.

General information about PredictionIO can be found at: https://predictionio.incubator.apache.org/start/

This guide was adapted from a 2016 Salesforce Developer workshop that explains how to implement a PredictionIO engine using the Similar Products template. The workshop can be found at: http://sfdcworkshop.com/workshop/prediction-io/recommendation_engine/index_heroku/

Here, we will expand upon this by demonstrating how to swap different templates into the engine and how to call the engine to make predictions from Salesforce. In this example, I will use the Complementary Purchase template, but will explain how to substitute in other templates as well. A list of available engine templates can be found at: https://predictionio.incubator.apache.org/gallery/template-gallery/

I completed these steps using bash in Ubuntu, so using a different operating system may require some adjustments. It will also be helpful, but not required, for you to have a basic familiarity with Apex, Heroku, Salesforce, web services, bash, and SQL.

Before you start, you will need to have Git and the Heroku CLI installed.

The above image shows a general outline of what we will accomplish. We will run two applications on Heroku: the PredictionIO event server and the prediction engine (pio-engine). The event server will be responsible for managing the training data, and the engine will train itself on that data and use it to make predictions. Salesforce will act as the client here: it will send new training data to the event server database and request predictions from the recommendation engine.


Event Server

First, we will clone the repository that contains the event server code. Open a Unix terminal, and run the following in the directory of your choice:

git clone https://github.com/mjollySlalom/pio-eventserver-heroku.git

Navigate into the folder that was created when we cloned the repository. Create a new Heroku application and push the code to it. Replace <AppName> with a name of your choice; in this example, we will call the application “events-mj”.

cd pio-eventserver-heroku
heroku create <AppName>
git push heroku master

If you encounter an error here, make sure that you are using the latest version of the Heroku-CLI and retry the above steps.

wget -qO- http://cli-assets.heroku.com/install-ubuntu.sh | sh

Check the database URL (the DATABASE_URL config variable) for the application that was just pushed to Heroku. Make a note of this for later.

heroku config

Create a new app on the event server. This will allow us to tie events from this app into the prediction engine that we will create next. Make a note of the app name you select (“EventApp” in this example) and the access key that is generated.

heroku run console app new <AppName>

Now we are ready to populate the server with data. We will accomplish this by executing some code in Apex. Create a new Apex class in Salesforce called PopulateTestData.cls, then copy and paste the following:

public class PopulateTestData {
    // Replace with the access key generated for your event server app
    public static final String ACCESS_KEY = 'XXXXXXXXX';

    public static void populate() {
        DateTime closedTime = DateTime.now();

        // Create 5 baskets of 10 items each
        for (Integer baskets = 0; baskets < 5; baskets++) {
            for (Integer i = 0; i < 10; i++) {
                // Note: addMinutes() returns a new DateTime and the result is not
                // assigned here, so every event is sent with the same timestamp
                closedTime.addMinutes(5);
                String closedTimeStr = closedTime.format('yyyy-MM-dd\'T\'HH:mm:ss.SSSX\':00\'');
                // Send the API call for each item in this basket
                HttpRequest req = new HttpRequest();
                req.setMethod('POST');
                req.setEndpoint('http://events-mj.herokuapp.com/events.json?accessKey=' + ACCESS_KEY);
                req.setHeader('Content-Type','application/json');
                req.setBody('{'
                    + '"event" : "buy",'
                    + '"entityType" : "user",'
                    + '"entityId" : "user",'
                    + '"targetEntityType" : "item",'
                    + '"targetEntityId" : "item-' + i + '",'
                    + '"eventTime" : "' + closedTimeStr + '"'
                    + '}');
                Http http = new Http();
                HTTPResponse res = http.send(req);
            }
        }
    }
}

Be sure to change the value of ACCESS_KEY to the access key that was generated for your app earlier.

This class uses REST to make a series of POST requests to the event server that we just created. For the purpose of this example, we will make 5 different shopping baskets with 10 items each. Please note that the format of the data generated by this code is suited to PredictionIO's Complementary Purchase engine template only. Different engine templates require the data in different formats, which are usually described in the documentation provided with the template. If you plan to use a different engine template, you will need to adjust the POST request body accordingly.
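To make the event format concrete, here is a minimal Python sketch of the JSON body sent for each purchase. It is illustrative only: the helper name build_buy_event and the sample date are assumptions made for this example and are not part of the template or the workshop code. The field names are the ones used in the Apex request body above.

```python
import json
from datetime import datetime, timezone


def build_buy_event(user_id, item_id, when):
    """Return the JSON body for one "buy" event (hypothetical helper)."""
    event = {
        "event": "buy",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        # ISO 8601 timestamp with milliseconds and a UTC offset
        "eventTime": when.astimezone(timezone.utc).isoformat(timespec="milliseconds"),
    }
    return json.dumps(event)


if __name__ == "__main__":
    # Sample values only; the date is arbitrary
    sample_time = datetime(2017, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(build_buy_event("user", "item-0", sample_time))
```

Building the body from a dictionary and serializing it avoids the quoting mistakes that are easy to make when concatenating JSON strings by hand.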

Before this code can be executed, you will need to add the event server URL to the whitelist of remote sites in your Salesforce org. Click “Setup”, then search for “Remote Site Settings” in the quick search box. Then click “New Remote Site”, and add the URL for the server you created:

Use Anonymous Apex to run the populate() method in the class you just created.

In order to verify that everything worked, and your database is no longer empty, go back to the Unix Terminal, and type:

heroku pg:psql 

to connect to your event server database. Entering

\dt

will display a list of tables in this database. With this information, you can execute queries and interact with your data as shown below.

heroku pg:psql
SELECT * FROM pio_event_1;

If everything was set up correctly, you will see all of the data that you generated by running the method we created in Anonymous Apex.


Prediction Engine

Now we need to clone the Engine Template repository. For this exercise, we will be using the Complementary Purchase template, but you can substitute any other template with minimal reconfiguration, which will be covered later.

git clone https://github.com/PredictionIO/template-scala-parallel-complementarypurchase.git complementary-template

Next, we will clone the buildpack and everything else that is needed to deploy this engine to Heroku. This repository was forked from the Salesforce Developer workshop and has been modified to meet our needs here.

git clone https://github.com/mjollySlalom/pio-engine-heroku.git

Once the download is complete, look in complementary-template/src/main/scala. There are five Scala files that need to be copied into pio-engine-heroku/src/main/scala: Algorithm.scala, DataSource.scala, Engine.scala, Preparator.scala, and Serving.scala. If you are using an engine template other than Complementary Purchase, the files may have slightly different names.
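If you would rather script the copy than do it by hand, the step can be sketched in Python as follows. This is only a sketch: the function name copy_template_sources is made up for this example, and it assumes both repositories were cloned into the current directory under the folder names used above. It copies every .scala file it finds, so it also covers templates whose files are named differently.

```python
import shutil
from pathlib import Path


def copy_template_sources(template_dir, engine_dir):
    """Copy every .scala file from the template into the engine repo.

    Both arguments are repository roots; the src/main/scala layout is
    the one named in the text. Files with the same name are overwritten.
    """
    src = Path(template_dir) / "src" / "main" / "scala"
    dst = Path(engine_dir) / "src" / "main" / "scala"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for scala_file in sorted(src.glob("*.scala")):
        shutil.copy2(scala_file, dst / scala_file.name)
        copied.append(scala_file.name)
    return copied


if __name__ == "__main__":
    # Folder names as cloned earlier in this guide
    print(copy_template_sources("complementary-template", "pio-engine-heroku"))
```

The function returns the names of the files it copied, which makes it easy to confirm that all of the expected files were picked up.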

The image below shows the different parts that make up a prediction engine. The application data on the left is stored on the event server, and the prediction results on the right are processed by Salesforce. In the middle are the prediction engine components, which are contained in the files we just copied. The pio-engine-heroku git repository (which we cloned above) contains the general build tools and infrastructure to run an arbitrary prediction engine on Heroku. However, it is missing the data source, preparator, algorithms, serving layer, and evaluation metrics, which are specific to a particular template. That is why we needed to copy in those files in the previous step.

https://predictionio.incubator.apache.org/customize/

***Skip this section, if you are using the Complementary Purchase Engine***

If you are using an engine template other than Complementary Purchase, you will need to update some code. In the file Algorithm.scala, look for a class called AlgorithmParams. These parameters will differ depending on the Engine Template you are using. Take note of the names of the parameters, because we will need to ensure that each one is set correctly by the training application. The parameters for Complementary Purchase are shown below:

Next, open TrainApp.scala, and look for this section where these parameters are being instantiated:

Update this line so that each parameter required in Algorithm.scala is defined correctly here. More information about the parameters can be found in the documentation for the template you are using. Selecting good values for each of these parameters is critical to achieving the best performance, but is outside the scope of this article.

*** END SKIP ***

Before we can train our engine, we need to set an environment variable that will be used by the training application. Make sure that APP_NAME matches the app name you chose when you created the app on the event server. In this example, we used “EventApp”.

export APP_NAME=<AppName>

Commit the changes you made to your local branch.

git add .
git commit -m "Added engine template"

Then create your Heroku instance, and deploy the code to Heroku. In this example, we will name the instance “comp-engine-mj”.

heroku create <EngineName>
git push heroku master

By default, the new Heroku instance is attached to a new database. We can see the name of this database by typing

heroku addons

in the terminal. This will display the name of the database.

heroku addons:destroy <database name>

will remove this default database.

Now we can link this application to the database we created and populated earlier that contains our training data. Use the database URL that you noted earlier.

heroku config:set DATABASE_URL=<DatabaseUrl> 

Finish configuring the Heroku app by setting the ACCESS_KEY, APP_NAME, EVENT_SERVER_IP, and EVENT_SERVER_PORT. The ACCESS_KEY and APP_NAME can be copied from your notes from earlier. The EVENT_SERVER_IP will be <Heroku event server app name>.herokuapp.com; in our case, events-mj.herokuapp.com. The EVENT_SERVER_PORT should be set to 80.

heroku config:set ACCESS_KEY=<AccessKey> APP_NAME=<AppName> EVENT_SERVER_IP=events-mj.herokuapp.com EVENT_SERVER_PORT=80
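Because a missing variable here only shows up later as a training or runtime failure, it can help to assemble the command from a checked set of values. The sketch below is one way to do that in Python; the helper name config_set_command and the sample access key are assumptions for illustration, and the function only builds the command string without running it.

```python
import shlex

# The four variables named in the text above
REQUIRED_VARS = ("ACCESS_KEY", "APP_NAME", "EVENT_SERVER_IP", "EVENT_SERVER_PORT")


def config_set_command(settings):
    """Build the `heroku config:set` command, failing early if a value is missing."""
    missing = [name for name in REQUIRED_VARS if not settings.get(name)]
    if missing:
        raise ValueError("missing config vars: " + ", ".join(missing))
    # shlex.quote protects values that contain shell-special characters
    pairs = [f"{name}={shlex.quote(str(settings[name]))}" for name in REQUIRED_VARS]
    return "heroku config:set " + " ".join(pairs)


if __name__ == "__main__":
    print(config_set_command({
        "ACCESS_KEY": "abc123",  # placeholder, not a real key
        "APP_NAME": "EventApp",
        "EVENT_SERVER_IP": "events-mj.herokuapp.com",
        "EVENT_SERVER_PORT": 80,
    }))
```

Run the printed command from inside the pio-engine-heroku folder so that it applies to the engine app.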

We are almost ready to train our prediction engine, but first we need to increase the memory allocated to our Heroku app.

heroku config:set JAVA_OPTS="-Xmx512m"

If you are using the free version of Heroku, you will need to scale down the dynos, then call the training application.

heroku ps:scale web=0 train=0
heroku run train

Since we only created a small amount of data for this example, training should finish quickly. Once it completes, rescale the dynos to bring the REST server back online.

heroku ps:scale web=1 train=0

In order to verify that everything is working, open your browser and navigate to the URL for the application you just created, which is in the form <Heroku engine app name>.herokuapp.com. If the app is running correctly, you should see a screen like this:


Making Predictions

Now we are ready to send queries to our prediction engine from Salesforce. As before, we need to add this URL to the whitelist in our Salesforce org. Go back to Setup > Remote Site Settings, and click “New Remote Site”. Add the URL for the engine app you just created.

Create a new Apex Class called TestPrediction.cls, and copy/paste the following code:

public class TestPrediction {

    public static void predict(String str) {
        // Make the API call to the recommendation engine
        HttpRequest req = new HttpRequest();
        // The query is sent in the request body, so use POST
        req.setMethod('POST');
        req.setEndpoint('http://comp-engine-mj.herokuapp.com/queries.json');
        req.setHeader('Content-Type','application/json');
        req.setBody('{ "items": ["' + str + '"], "num": ' + 20 + ' }');
        Http http = new Http();
        HTTPResponse res = http.send(req);
        System.debug(res.getBody());
    }
}

This class makes a POST request to our prediction engine with a JSON body, for example:

{ "items": ["itemName"], "num": 20 }

This will return a list of no more than 20 items that are most likely to be purchased at the same time as the item “itemName”. If you are using a different Engine Template than Complementary Purchase, however, you will need to consult the template documentation to determine how you need to format this request.
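For readers who want to see the request outside of Apex, here is a minimal Python sketch that builds the same query. The helper name build_query_request is hypothetical, and the host is the example engine name used in this guide. The sketch uses POST, which is the method in PredictionIO's own query examples; treat that as an assumption for this particular Heroku build. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_query_request(engine_host, item, num=20):
    """Build (but do not send) a query for items complementary to `item`."""
    body = json.dumps({"items": [item], "num": num}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{engine_host}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request("comp-engine-mj.herokuapp.com", "item-0")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req).read()
```

Keeping the request construction separate from the network call makes it easy to inspect the exact body before pointing it at a live engine.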

Finally, we can use Anonymous Apex to call the predict() method in the class we just created to see the list of predictions. We will query for “item-0”, since it exists in the test data that we generated.

The prediction engine responds with a list of complementary items with scores in valid JSON that can be used in Salesforce.

In my next article, I’ll continue this guide with Part 2, where we will discuss how to handle the above JSON to display these recommendations in Salesforce. We will also create code to automatically load new records into the event database, allowing the AI to learn over time.

