Optimize your business with machine learning

Or Koren
ironSource Tech Blog
4 min read · Jun 13, 2019

As machine learning becomes more popular and computing power becomes more affordable and accessible, efficient automated algorithms are replacing manual assessments. That makes machine learning one of the best tools available for predicting your business’s KPIs.

Here at ironSource, we use many different machine learning models to optimize our mobile app advertising business, for example by showing the right ad to the right user at the right time.

Let me take you on a journey that will help you succeed in your business.

Start Small

In order to succeed, you must start small. Don’t try to build a model with tons of features and complex algorithms.

The Right Project

Understand what your business needs, and choose a project that has gone through as few rounds of optimization as possible without machine learning. Such a project leaves plenty of room for the model to improve the business.

In short, make sure the first model is a quick win.

80/20

You can build a simple model in no time and achieve almost 80% of your goal. Choose a simple supervised algorithm, such as regression or random forest, and focus on comparing the outcome with and without the model.
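To make the "with and without the model" comparison concrete, here is a minimal sketch using only the Python standard library. The data is synthetic, and the feature and target names (`sessions`, `revenue`) are assumptions made up for illustration, not ironSource data. The "without" case predicts the training average for everyone; the "with" case is a one-feature linear regression.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_abs_error(predictions, actuals):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Synthetic data (an assumption): revenue grows with sessions, plus noise.
rng = random.Random(42)
sessions = [rng.uniform(1, 50) for _ in range(200)]
revenue = [0.3 * s + rng.gauss(0, 2) for s in sessions]

# Hold out the last 50 rows so both approaches are scored on unseen data.
train_x, test_x = sessions[:150], sessions[150:]
train_y, test_y = revenue[:150], revenue[150:]

# Without the model: predict the training average for every user.
baseline = sum(train_y) / len(train_y)
baseline_mae = mean_abs_error([baseline] * len(test_y), test_y)

# With the model: a simple regression on one feature.
slope, intercept = fit_line(train_x, train_y)
model_mae = mean_abs_error([slope * x + intercept for x in test_x], test_y)

print(f"MAE without model: {baseline_mae:.2f}")
print(f"MAE with model:    {model_mae:.2f}")
```

The point of the sketch is the comparison, not the algorithm: whichever simple model you pick, score it against the no-model baseline on the same held-out data.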

Set Clear Goals

After you’ve chosen the right project, you will need to set up a reasonable KPI for the first model.

Here’s a scenario to make sure this is clear, as this part is very important.

Let’s assume our KPI is to improve the CTR (click-through rate) of a certain ad, and the current CTR is around 10%. Even with the best model, you won’t be able to reach more than 3x the current CTR, so a reasonable KPI for the first model is around 18% CTR (an 80% improvement).

Why? Creating a model that predicts CTR won’t change the average CTR, which will stay at 10%.

The only benefit of using the model is to predict which users are above and below 10%.

For example, out of 100 users, only 10 clicked on the ad. If we divide those same 100 users into 20 groups, the graph will look like this:

And your first model will look something like this:

The model can identify users with a CTR of up to 24%. So by targeting only the top-scoring users, you can meet your KPI.
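The scenario above can be sketched in a few lines of standard-library Python. Everything here is a simulation under stated assumptions: the population size, the Beta-distributed click propensity (chosen so the average is about 10%), and the noise added to the model score are all invented for illustration. The sketch shows that ranking users by score leaves the overall CTR unchanged while the top-scoring group sits well above it.

```python
import random

rng = random.Random(7)
N_USERS, N_GROUPS = 10_000, 20

users = []
for _ in range(N_USERS):
    propensity = rng.betavariate(1, 9)        # assumed true click probability, mean 10%
    clicked = rng.random() < propensity       # whether this user actually clicked
    score = propensity + rng.gauss(0, 0.05)   # imperfect model prediction (assumed noise)
    users.append((score, clicked))

# The average CTR is a property of the users, not of the model.
overall_ctr = sum(clicked for _, clicked in users) / N_USERS

# Rank users by model score and split them into equal-sized groups.
users.sort(key=lambda u: u[0], reverse=True)
group_size = N_USERS // N_GROUPS
group_ctrs = [
    sum(clicked for _, clicked in users[i:i + group_size]) / group_size
    for i in range(0, N_USERS, group_size)
]

print(f"Overall CTR:          {overall_ctr:.1%}")
print(f"Top-scored group CTR: {group_ctrs[0]:.1%}")
print(f"Lowest group CTR:     {group_ctrs[-1]:.1%}")
```

Averaging the group CTRs gives back the overall CTR, which is why the KPI has to be defined on the users you choose to target rather than on the whole population.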

The Future is Here

One of the major problems in machine learning projects is not the models themselves, but the data preparation.

Prepare yourself for the future, by keeping all your RAW data accessible, compatible with all technologies, and all in one place.

Data Lake

I strongly recommend saving all RAW data in a data lake (S3) in Parquet, a column-oriented file format, since storage is cheap and Parquet works well with Presto, Spark, and the Hadoop ecosystem.

With all the data in one place, you can build thousands of ELT views, each expressing a piece of business logic in SQL that turns RAW data into aggregated data.
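As a rough illustration of an ELT view, the sketch below keeps raw events untouched and defines the business logic as a SQL view on top of them. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for a query engine such as Presto over S3; the table, column, and view names (`raw_events`, `ad_daily_ctr`) and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the data lake query engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- RAW data: one row per event, loaded as-is.
    CREATE TABLE raw_events (
        user_id TEXT, ad_id TEXT, event_type TEXT, event_date TEXT
    );
    INSERT INTO raw_events VALUES
        ('u1', 'ad_a', 'impression', '2019-06-01'),
        ('u1', 'ad_a', 'click',      '2019-06-01'),
        ('u2', 'ad_a', 'impression', '2019-06-01'),
        ('u2', 'ad_b', 'impression', '2019-06-01'),
        ('u3', 'ad_b', 'impression', '2019-06-02'),
        ('u3', 'ad_b', 'click',      '2019-06-02');

    -- ELT logic: the aggregation lives in a view, so no data is copied.
    CREATE VIEW ad_daily_ctr AS
    SELECT
        ad_id,
        event_date,
        SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END) AS impressions,
        SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) AS clicks,
        1.0 * SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END)
            / SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END) AS ctr
    FROM raw_events
    GROUP BY ad_id, event_date;
""")

rows = conn.execute(
    "SELECT ad_id, event_date, impressions, clicks, ctr "
    "FROM ad_daily_ctr ORDER BY ad_id, event_date"
).fetchall()

for row in rows:
    print(row)
```

Because the view is only a query definition, changing the business logic means editing SQL rather than re-running a pipeline that moves data.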

Why ELT?

  • In today’s big data world, we don’t need to create ETLs that move data from one place to another, because querying the RAW data directly is relatively cheap and very fast.
  • Data science teams can easily copy the ELT logic (the data developer’s vision for the business) and create a user-based dataset to work with, ultimately reducing their work time by 80%!
  • It eases deployment to production (see the next section).

Deployment

Deploying your models can be a tricky, painful, and time-consuming task.

I strongly recommend using MLflow, a great solution that works with almost all Python libraries and helps you track experiments, package models in a project, and deploy them behind a REST API.

Stay Tuned

Many companies today provide platforms for machine learning developers that can help you succeed, such as SparkBeyond, DataRobot, ParallelM & Firefly.

And of course, Google (GCP) and Amazon (AWS) are investing heavily in ML services that make it much easier to integrate ML into your products.
