Let me start off by saying I am a complete newb when it comes to machine learning, advanced algorithms, and things of that nature.
I’m a web developer for Pete’s sake. (Sorry for taking your name in vain Pete.) I don’t do that sort of thing. I make sure my widgets wobble and my gidgets do whatever those do.
But I was recently tasked with doing some comparisons between Google’s Prediction API and IBM’s Alchemy APIs for sentiment analysis. Since they both seemed relatively easy to play with, I jumped in head first.
First, the verdict. With Google’s API I could train my own models, which meant I could get more contextual results. With Alchemy, I was forced to use their pre-built models, which is fine if you don’t need to do any classification training of your own.
In short, because I already had labeled data, Google’s API was far more accurate. I think I had about an 86% accuracy rating, if I recall correctly.
Anyway, that said, let’s dive into what I did.
As mentioned, I had my own dataset to work with. It looked like this:
Just a simple CSV file.
The first column is the result and the second is what you’re classifying.
You’re basically training a function where classifierFunction(textColumn) === labelColumn. Given the text in the second column, the model learns to predict the label in the first.
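For a sentiment dataset, a couple of rows might look like this. (These rows are made-up placeholders to show the shape, not my actual data, and note there’s no header row.)

```csv
positive,"Loving the new release, great work!"
negative,"This update broke everything on my site."
```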
Keep in mind that the larger the dataset, the better your results. But, as it turns out, the API has limits. I tried to train it on a 1.3 million row file today and it timed out :(
C’est la vie.
Once you’ve got your dataset ready to rock, go install the CLI tool I published on npmjs.com. It’s a little tool I built to make it easier to insert and check the status of training models with Google Cloud.
Setting Up Your Model & CLI Tool
So, you’ve got your file. The next step is to upload it to Google Cloud Storage: create a bucket and upload the file into it. No special permissions are required.
One last step and you can train your model.
For the CLI tool to work, you need to set an environment variable called GOOGLE_APPLICATION_CREDENTIALS containing the path to your Google credentials .json file. You can create one of those when activating the Google Prediction API in your console.
(On Linux/OS X, add something like the following to ~/.bash_profile.)
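Something along these lines, where the path is just an example location; point it at wherever you saved your own service-account key:

```shell
# Example path only -- use the actual location of your downloaded .json key.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/credentials/google-prediction.json"
```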
Now you’re ready to rock.
Training Your Model
I’ll be frank. Google’s APIs are the pits. The docs are incomplete and figuring out what to pass to the methods in their SDK is not intuitive in the least.
So, let the CLI tool do the heavy lifting for you.
Once you’ve set everything up, just run the following to train your model.
This uses four flags:
- One tells the tool you’re wanting to — well — insert and train a new model.
- One is the project name in Google Cloud.
- One is the name you’re giving the model. Make it easy to remember.
- One is the path to the file you uploaded earlier. It’s the bucket + the filename.
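The exact flag names come from the tool’s own help output, so treat everything below as a made-up placeholder, but a hypothetical invocation would look something like:

```shell
# Hypothetical example -- the tool name and all four flag names here are
# illustrative placeholders; check the CLI's --help for the real ones.
cli-tool --insert \
  --project my-gcloud-project \
  --model tweet-sentiment \
  --file my-bucket/training-data.csv
```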
Once you’ve inserted it, you can view the status of the training at any time.
That’s it! Now you’ve got a trained model that you can use to predict classifications with.
Let’s use it for sentiment analysis now.
Let’s imagine we have a tweet we want to analyze to see if it’s actionable.
We’re going to need to make a request to the Google Prediction API and provide them with our model data and the text we want to use.
If you’re fancy like me, you’ll stuff the nasty callbacks here into promises, but I think this will give you the idea of what’s going on.
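Here’s a minimal sketch of that request, assuming the `googleapis` npm package and the v1.6 Prediction API. The project and model names are placeholders, so swap in your own. The helper just builds the request object so you can see the shape the API expects; the commented-out portion shows roughly where it plugs into the SDK:

```javascript
// Build the request object the Prediction API's trainedmodels.predict
// method expects: the text to classify goes into input.csvInstance.
function buildPredictRequest(project, model, text) {
  return {
    project: project, // your Google Cloud project name (placeholder here)
    id: model,        // the model name you trained earlier (placeholder here)
    resource: {
      input: { csvInstance: [text] }
    }
  };
}

/*
// Roughly how it plugs into the googleapis SDK -- hedged, so double-check
// the client setup against the SDK docs:
var google = require('googleapis');
var prediction = google.prediction('v1.6');
prediction.trainedmodels.predict(
  buildPredictRequest('my-gcloud-project', 'tweet-sentiment',
    'My order never arrived, please help!'),
  function (err, res) {
    if (err) { return console.error(err); }
    console.log(res.outputLabel);
  }
);
*/

console.log(JSON.stringify(
  buildPredictRequest('my-gcloud-project', 'tweet-sentiment', 'Hello!')));
```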
And that’s it! Your results will live in the ‘outputLabel’ property on the res object. The docs don’t really make that clear, but that’s where to look.
Doing machine learning with cloud services like these is pretty damn easy. Thanks Google!
One more note: I’m considering adding functionality for this last bit to the CLI tool. I just haven’t gotten around to it yet. If anyone wants to contribute to making it better, feel free to toss in a pull request and we can be on our way!
Thanks for reading and leave a comment if you’ve got any feedback or questions.